Examples
Contact: oranit@tau.ac.il






Non-sequential alignments

Serine proteases:

The following 'Serine Proteinase' ensemble is known to be difficult to align using sequence information alone: 1sgt, 2pkaAB, 3est, 4chaA, 1ton, 2ptn, 3rp2A, 2alp, 2sga, 3sgbE. MASS was applied to this ensemble. The conserved region it detected contains 123 residues with an RMSD of 1.48 Å. This region includes the two antiparallel beta-barrels (six strands each) that form the fold, as well as the three residues of the catalytic triad (HIS-57, ASP-102, SER-195). Furthermore, residues HIS-57 and SER-195 were found to lie on two conserved loops (residues 55-59 and 189-197, respectively).

[Figure: the serine proteinase ensemble structurally aligned with MASS]


Figure: 2ptn is shown entirely in blue, except for its catalytic triad, which is colored red. For the remaining proteins only the core is shown: the two beta-barrels of the fold are colored orange, the two conserved loops near the active site are colored green, and the rest of the core is colored yellow.
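The core RMSD values quoted on this page (for example, 123 residues within 1.48 Å) summarize how closely the matched residues superimpose. The sketch below shows one way such a number could be computed, under stated assumptions. It assumes the structures are already superimposed, and that each core is given as a list of matched C-alpha coordinates in the same order. It also assumes the ensemble value is the mean pairwise RMSD. This page does not say how MASS defines the RMSD of a multiple alignment, so `pair_rmsd` and `mean_pairwise_rmsd` are hypothetical helpers, not MASS code.

```python
import math
from itertools import combinations


def pair_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed (no fitting is done
    here) and that coords_a[i] and coords_b[i] are corresponding atoms,
    e.g. the C-alpha atoms of the i-th core residue in each protein.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def mean_pairwise_rmsd(structures):
    """Average pair_rmsd over all pairs of core coordinate lists.

    Mean pairwise RMSD is an assumed summary for a multiple alignment,
    not necessarily the definition MASS uses.
    """
    if len(structures) < 2:
        raise ValueError("need at least two structures")
    values = [pair_rmsd(a, b) for a, b in combinations(structures, 2)]
    return sum(values) / len(values)


# Toy example with hypothetical one-residue "cores".
cores = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 3.0)], [(0.0, 4.0, 0.0)]]
print(mean_pairwise_rmsd(cores))  # pairwise RMSDs 3, 4, 5 -> 4.0
```

A real pipeline would first compute an optimal superposition, for example with the Kabsch algorithm. This sketch skips that step and only shows the averaging over matched core residues.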

 



Non-topological alignments

Helix Bundle:

The following ten proteins belong to four different folds and six different superfamilies in the SCOP database: 1flx, 1aep, 1bbhA, 1bgeB, 1le2, 1rcb, 256bA, 2ccyA, 2hmzA, 3inkC. MASS aligned this ensemble in 48 seconds. Figure (a) presents the structural alignment: 29 residues were aligned within an RMSD of 2.43 Å. MASS detected four conserved helices that form a bundle. Figures (b) and (c) show that the alignment is non-topological.

                          

[Figure (a): a structural alignment of ten helix-bundle proteins, obtained by MASS]
[Figure (b): the matching residues between the ten structurally aligned helix-bundle proteins]
[Figure (c): TOPS diagrams for the structural alignment of the ten helix-bundle proteins]

 

Figure: (a) The structural alignment of all ten proteins. The core of the alignment is a bundle of four helices. (b) The alignment of the core residues. Secondary structure assignments, determined by the DSSP program, are shown in brackets next to the residue index (H stands for helix). Residues of matched secondary structure regions are colored gray and yellow alternately. Note that only a short loop separates H4 (residues 84-91) from H5 (residues 93-105) in 256bA, and H2 (53-60) from H3 (63-70) in 3inkC. (c) The schematic TOPS representation. Triangles represent strands and circles represent helices. Corresponding secondary structure regions are drawn in the same color. The solution is clearly non-topological.
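A non-topological alignment matches secondary structure elements (SSEs) whose order along the chain differs between proteins. The following is a minimal sketch of how one could test this. It assumes the core is given as a table of SSE indices, with one row per conserved element and one column per protein. `is_topological` and the index data are hypothetical illustrations, not MASS output.

```python
def is_topological(core):
    """Check whether a multiple SSE alignment preserves sequence order.

    `core` is a list of rows; row k holds, for each protein, the index of
    the SSE (numbered from the N-terminus) assigned to conserved element k.
    The alignment is topological if ordering the rows by the first protein
    also orders them strictly in every other protein.
    """
    if not core:
        return True
    rows = sorted(core, key=lambda row: row[0])
    for p in range(len(rows[0])):
        column = [row[p] for row in rows]
        # Any non-increasing step means two elements swap order in protein p.
        if any(a >= b for a, b in zip(column, column[1:])):
            return False
    return True


# Hypothetical four-helix cores for three proteins (not real SSE numbering).
same_order = [(1, 2, 1), (2, 3, 2), (3, 5, 4), (4, 6, 5)]
swapped = [(1, 2, 1), (2, 3, 2), (3, 6, 4), (4, 5, 5)]
print(is_topological(same_order))  # True
print(is_topological(swapped))     # False: elements 3 and 4 swap order in the second protein
```

The check only needs to compare every protein against the ordering induced by the first one. If every column increases in that order, any two proteins are also consistently ordered with each other.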

EF-Hand:

MASS was applied to an ensemble of six proteins taken from three different families of the 'EF-hand' SCOP superfamily: 4cpv, 2scpA, 2sas, 1top, 1scmB, 3icb. The proteins of this superfamily contain two EF-hand motifs, each made of two helices connected by a calcium-binding loop. For these six proteins, MASS detected the four conserved helices of the two EF-hand motifs (Figure (a)). Figure (b) shows that the alignment is non-topological. The running time was 1 second.

 

[Figure (a): the structural alignment between the EF-hand proteins, obtained by MASS]
[Figure (b): the matched SSEs of this structural alignment]

Figure: (a) The structural alignment of all six proteins. The backbone of every protein is colored gray, while their common core is shown in a different color for each protein. The four helices of the two motifs are conserved. (b) The match between the four conserved helices, showing that the alignment is non-topological.

 



Subset alignments

PLP-dependent transferases:

We applied MASS to an ensemble of eleven proteins belonging to five different families of the 'PLP-dependent transferases' superfamily. The running time was 1 min 21 sec. All eleven proteins were aligned with a core of 89 residues within an RMSD of 2.14 Å. Aligning all seven strands of the sheet is considered very hard. This is especially true of the two outer strands, because the twist of the sheet varies slightly among the proteins. Nevertheless, MASS detected a core containing the complete seven-stranded beta-sheet of the fold and three alpha-helices (Figure (a)).

Moreover, MASS detected a subset alignment of ten proteins. Its core contains 100 residues with an RMSD of 2.12 Å and includes an additional alpha-helix (Figure (b)). Protein 1bt4 is the exception: it lacks this extra helix and is therefore not part of the subset.

 

[Figure: structural alignments between the PLP-dependent transferases, obtained by MASS]

Figure: (a) The structural alignment between all the eleven proteins of the ensemble. The backbone of all proteins is colored in gray while their common core is colored according to its secondary structure. (b) A subset alignment between only ten proteins (1bt4 is missing). This core contains an additional alpha-helix.
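A subset alignment trades ensemble coverage for a larger core. In this example, dropping one protein (1bt4) lets an additional helix join the core. The sketch below illustrates that trade-off at the level of whole core elements, using hypothetical protein names and presence data. MASS measures cores in residues and does not necessarily search subsets by brute force. `shared_core` and `best_subset` are therefore illustrative assumptions only.

```python
from itertools import combinations


def shared_core(subset, presence):
    """Core elements present in every protein of `subset`.

    `presence` maps a core element name to the set of proteins in which
    it was found (hypothetical input format).
    """
    members = set(subset)
    return {elem for elem, proteins in presence.items() if members <= proteins}


def best_subset(proteins, presence, size):
    """Brute-force search for the `size`-protein subset with the largest shared core."""
    return max(
        combinations(sorted(proteins), size),
        key=lambda subset: len(shared_core(subset, presence)),
    )


# Hypothetical data: protC lacks helix H4, so H4 is not in the full-ensemble core.
proteins = {"protA", "protB", "protC"}
presence = {
    "sheet": {"protA", "protB", "protC"},
    "H1": {"protA", "protB", "protC"},
    "H4": {"protA", "protB"},
}
print(sorted(shared_core(proteins, presence)))  # ['H1', 'sheet']
print(best_subset(proteins, presence, 2))        # ('protA', 'protB')
```

Enumerating all subsets grows combinatorially with ensemble size. This version is only meant to make the idea of "a larger core shared by fewer proteins" concrete.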



Large scale alignments

MASS can align ensembles of tens of proteins in practical running times on a standard PC. To demonstrate its efficiency on large sets of proteins, we applied it to three ensembles:

(i) serine proteases: all structures from the 'Prokaryotic trypsin-like serine protease' SCOP family (68 molecules);
(ii) PK beta-barrel: all structures from the 'Pyruvate kinase beta-barrel domain' SCOP family (66 molecules);
(iii) unrelated proteins: a compiled set of 80 proteins, each taken from a different SCOP fold. This set contains proteins from the four major SCOP classes: all-alpha, all-beta, alpha+beta and alpha/beta.


The details of the ensembles and the running times are summarized in the table below. The results show that the running time is influenced by three factors:

(i) the number of molecules;
(ii) the average molecular size, together with the average number of SSEs per molecule;
(iii) the structural variance among the molecules.

As expected, the running time grows with the first two parameters. For example, the average molecular size is 243 residues in the serine protease ensemble and 98 in the PK beta-barrel ensemble. The running times were 1 h 25 min and 1 min 28 sec, respectively. The two ensembles contain almost the same number of molecules (68 and 66), so we attribute the difference in running time to the difference in molecular size.

Structural variance also influences the running time: the more structurally variable the ensemble, the shorter the running time. This can be seen by comparing the serine protease ensemble with the unrelated set, which is far more structurally variable. The running times were 1 h 25 min and 22 min, respectively. The unrelated set has both a larger average molecular size and more molecules than the serine proteases (300 residues and 80 molecules vs. 243 and 68). We therefore attribute this difference to the difference in structural variance.

Ensemble name        No. of molecules   Avg. molecule size (residues)   Avg. no. of SSEs   Run time (h:mm:ss)
serine proteases     68                 243                             13                 1:25:03
PK beta-barrel       66                 98                              4                  0:01:28
unrelated proteins   80                 300                             14                 0:22:01
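The comparisons in the discussion above can be checked with a little arithmetic on the table. The sketch converts the run times to seconds and computes the ratios. The serine protease ensemble has about 2.5 times the average molecular size of the PK beta-barrel ensemble, yet took about 58 times longer. This is a back-of-the-envelope check on the published numbers, not a complexity analysis of MASS. `to_seconds` is a hypothetical helper.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string from the table into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


# (no. of molecules, avg. molecule size in residues, run time) from the table above
ensembles = {
    "serine proteases": (68, 243, "1:25:03"),
    "PK beta-barrel": (66, 98, "0:01:28"),
    "unrelated proteins": (80, 300, "0:22:01"),
}

times = {name: to_seconds(row[2]) for name, row in ensembles.items()}
size_ratio = ensembles["serine proteases"][1] / ensembles["PK beta-barrel"][1]
time_ratio = times["serine proteases"] / times["PK beta-barrel"]
variance_ratio = times["serine proteases"] / times["unrelated proteins"]

print(times)  # {'serine proteases': 5103, 'PK beta-barrel': 88, 'unrelated proteins': 1321}
print(round(size_ratio, 2), round(time_ratio, 1))  # 2.48 58.0
print(round(variance_ratio, 1))  # 3.9
```

The last line shows that the serine protease ensemble ran roughly four times longer than the larger but more structurally variable unrelated set.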

