MASS (Multiple Alignment by Secondary Structures) program

Contact: Oranit Dror (oranit@post.tau.ac.il)
Web Site: http://bioinfo3d.cs.tau.ac.il/MASS

References:
- O. Dror, H. Benyamini, R. Nussinov and H.J. Wolfson (2003).
  MASS: Multiple Structural Alignment by Secondary Structures,
  Bioinformatics, 19 Suppl. 1: i95-i104
- O. Dror, H. Benyamini, R. Nussinov, and H. Wolfson.
  Multiple structural alignment by secondary structures: algorithm and
  applications. Protein Science, 12:2492-2507, 2003.

---------------------------------------------------------------------
Installation:
-------------
To install, simply unpack the archive anywhere.

---------------------------------------------------------------------
Quick start:
------------
1. Input: PDB format files for the molecules you wish to align.

2. Run mass. The command line is:
     mass PDB_FILES
   The output is a file named 'mass.output'. This file outlines all the
   possible alignments found by mass (the format of the file is explained
   below).

3. Use mass2pdb to create PDB files of a specific alignment.
   The command line is:
     mass2pdb mass.output
   Executing this command creates three PDB files (in which the molecules
   are aligned) and a RASMOL script. These files are further explained
   below.

* NOTE: MASS relies on secondary structure assignment. By default, it uses
  the secondary structure assignment of the PDB file. Therefore, running
  MASS in the default mode requires the input PDB files to contain the
  secondary structure records (HELIX, SHEET).
  MASS supports two other options for secondary structure assignment:
  DSSP and DSSPC. We recommend the DSSP assignment, since we find it the
  most accurate. See the input section (1B below) for more details.
  Using an assignment other than the default means that the input consists
  of both the coordinate files (PDB format) and the secondary structure
  assignment files (.dssp or .dsspc).
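Since the default mode requires HELIX and SHEET records in the input, it can
be useful to check a PDB file before running mass. The following is a minimal
Python sketch, not part of the MASS package; the function names
has_sse_records and check_pdb_file are hypothetical. It relies only on the
fact that a PDB record name occupies the first six columns of a line.

```python
from typing import Iterable


def has_sse_records(pdb_lines: Iterable[str]) -> bool:
    """Return True if any line is a HELIX or SHEET record."""
    # In the PDB format the record name occupies columns 1-6.
    return any(line[:6].strip() in ("HELIX", "SHEET") for line in pdb_lines)


def check_pdb_file(path: str) -> bool:
    """Hypothetical helper: apply the check to a PDB file on disk."""
    with open(path) as handle:
        return has_sse_records(handle)
```

If the check returns False for a file, the default 'PDB' assignment has
nothing to work with, and a DSSP or DSSPC assignment file is the alternative
described above.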
---------------------------------------------------------------------
Configuration:
--------------
The default configuration file is named "mass.config" and is located in
the MASS folder (together with all the executable files and scripts).
In addition, mass accepts a run-specific configuration file on the
command line (via the "-c" flag, see below).

---------------------------------------------------------------------
The rest of this document explains the tools in the mass package in depth.

List of Tools:
--------------
The tools described are:
1. mass       - The main application, which runs the MASS algorithm.
2. mkdssps    - A script to create DSSP files from PDB files.
3. mass2pdb   - A utility for creating PDB files in which the molecules
                are aligned.
4. mass2csv   - A utility for creating a CSV (Comma-Separated Values) file
                that summarizes all the alignments obtained by mass in a
                spreadsheet.
5. mass2fasta - A utility for creating a sequence alignment, based on the
                MASS structural alignment, in FASTA format.

1. mass
--------
A. Command line format:
     mass PDB_FILES [options]
   or
     mass -f LIST_FILE [options]

   Where:
   - LIST_FILE is a file listing the paths of all the PDB files one wishes
     to align. Listing the paths in one file is more convenient when
     experimenting with many molecules and rerunning mass.
   - options are:
       -c, --config=configuration file
           Configuration file. If not specified, the default parameters
           are used.
       -o, --output=output file
           Output file. If not specified, the output is written to a file
           named mass.output.

B. Input:
   For each protein there are two inputs. One is the sequence of the 3D
   coordinates of its atoms in PDB format. The other is the assignment of
   SSE types to each residue in one of the following formats: PDB, DSSP
   and DSSPC.

   mass expects two files for each molecule:
   1. A PDB file.
   2. A file that assigns secondary structures to the residues of the
      molecule in one of the following formats:
      - PDB: In this case just one PDB file is needed.
      - DSSP: If the PDB file is named NAME.pdb, the DSSP file should be
        named NAME.dssp. Otherwise, its name should be the PDB file name
        followed by ".dssp". Note that a script named mkdssps is available
        for creating such files automatically for several PDB files.
      - DSSPC: If the PDB file is named NAME.pdb, the DSSPC file should be
        named NAME.dsspc. Otherwise, its name should be the PDB file name
        followed by ".dsspc".

   The default is to use the PDB file. This is the easiest way of using
   mass, but it is less accurate than the other two. DSSP is the most
   reliable, and most tests done with mass used it. However, it requires
   an external application (http://www.sander.ebi.ac.uk/dssp/).

   To configure mass to use DSSP or DSSPC, change the value of
   'sse-file-type' from 'PDB' to either 'DSSP' or 'DSSPC' in the
   configuration file. For more details, see the section that describes
   the 'mkdssps' script below.

C. Output:
   By default, mass creates an output file named mass.output in the
   directory from which it has been run. This file summarizes the
   alignments mass has found. Log entries and progress messages go to the
   standard output.

   The output file has 5 parts:
   1. The parameters that were used for this run of mass.
   2. General information about the molecules that have been loaded
      (e.g. names, average size).
   3. A table that, for each number of molecules, gives information about
      the best alignment (e.g. the core size, the average RMSD). For
      example, if mass has been run on three molecules, this table has two
      entries: one for the best alignment found for 2 molecules and the
      other for the best alignment found for 3 molecules.
   4. A table that lists all the alignments mass has found. Each alignment
      has an ID. This ID can be used to find more detailed information
      about the alignment further down the file.
   5. The rest of the file gives a detailed description of each alignment:
      - The ID of the alignment
      - The names of the molecules participating in the alignment
      - The transformations that superimpose the molecules onto the
        reference molecule (the first molecule in the list)
      - Core size, average RMSD
      - The match list that describes the core of the alignment. Each
        entry contains the atoms that mass has matched between the
        molecules.

D. Customizing
   If a configuration file is supplied on the command line (with the -c
   option), mass uses this file. Otherwise, mass uses the default
   configuration, which is defined in a file named 'mass.config' that is
   expected to be located in the same directory as mass.

2. mkdssps
-----------
Use this utility when running mass with DSSP files. It creates DSSP files
from PDB files.

A. Command line format
     mkdssps PDB_FILES
   or
     mkdssps -f LIST_FILE

B. Output
   For each PDB file, mkdssps creates a DSSP file in the same location.

   The script assumes that the dssp binary is named 'dssp' and is located
   in the same directory as the script. To change that, edit the variable
   DSSP_PROGRAM at the start of the script (more details are available
   within the script).

3. mass2pdb
------------
Use this utility when you wish to view one of the alignments obtained by
mass.

A. Command line format
     mass2pdb

B. Output
   The utility creates the following three PDB files:
   - A PDB file named 'fullAlignment.pdb': When this file is loaded into a
     viewer, the aligned molecules are shown completely.
   - A PDB file named 'referenceAlignment': When this file is loaded into
     a viewer, the reference molecule is shown completely. For the rest of
     the molecules, only their core, superimposed on the reference
     molecule, is displayed.
   - A PDB file named 'coreAlignment': When this file is loaded into a
     viewer, only the core of the alignment is shown.

   In addition, the utility creates a Rasmol script, named
   "script_fullAlignment.rsm".
   This script loads the 'full_alignment.pdb' file and defines the core of
   the alignment. To use this script, execute:
     -script full_alignment.pdb
   Then, at the prompt of the Rasmol viewer, you can select the core of
   the alignment ('select core') and manipulate it.

4. mass2csv
------------
Use this utility when you wish to analyze the alignments obtained by mass
in a spreadsheet (e.g. in Excel).

A. Command line format
     mass2csv
   or
     mass2csv [-o OUTPUT_FILE]

B. Output
   A comma-separated values (CSV) file that summarizes all the alignments
   obtained by mass. By default, the name of the CSV file is "mass.csv".
   However, the user can specify a different name with the -o option.

5. mass2fasta
--------------
Use this utility when you wish to create a sequence alignment in FASTA
format, based on a structural alignment obtained by MASS.

A. Command line format
     mass2fasta [-o OUTPUT_FILE]

B. Output
   A file that contains the sequence alignment in FASTA format.
   By default, the name of the output file is "match.fasta". However, the
   user can specify a different name with the -o option.

   Important note: currently, only residues that are structurally aligned
   appear in the output sequences. We are still considering how to output
   the complete sequences together with the information on which residues
   are structurally aligned.
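
To inspect the mass2fasta output programmatically, a small FASTA reader is
enough. The sketch below is an illustration in Python and not part of the
MASS package; the function name read_fasta is hypothetical, and it assumes
only the standard FASTA layout (a header line starting with '>' followed by
one or more sequence lines).

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a mapping from header to sequence."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            # Sequence lines are appended to the current record.
            parts[name].append(line)
    return {header: "".join(chunks) for header, chunks in parts.items()}
```

For example, the default output file could be read with
`read_fasta(open("match.fasta"))`, after which the sequences of the aligned
residues can be compared molecule by molecule.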