Methods of resolving protein 3D structures

Most of the three-dimensional macromolecular structure data in the Protein Data Bank were obtained by one of three methods: X-ray crystallography (over 80%), solution nuclear magnetic resonance (NMR) (about 16%) or theoretical modeling (2%). A few structures were determined by other methods. The first two are experimental methods. The empirical results of these experimental methods accurately describe the 3D structure of the molecule in the state in which measurements were made.

The results of NMR analysis are an ensemble of alternative models, in contrast to the unique model obtained by crystallography. Structures obtained by theoretical modeling tend to be less accurate than those obtained by experimental methods. One kind of modeling, called homology modeling, involves fitting a known sequence to the experimentally determined 3D structure of a sequence-similar molecule. Results of homology modeling are more likely to be reliable than are results derived purely from theory (ab initio modeling).

X-Ray Crystallography

The 3D structure of a macromolecule can be determined by X-ray diffraction from crystals. First, the molecule must be crystallized, and the crystals must be single (not two or more stuck together) and of high quality. Countless attempts to determine molecular structures have failed at this stage. Once a crystal is obtained, a diffraction pattern is produced by X-irradiation. This pattern consists of thousands of spots, which are the raw data. The position and intensity of each spot are relatively easily determined, but the phases of the waves that formed each spot must also be determined in order to produce an electron density map.
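The reason the phases are needed can be seen from the standard Fourier synthesis used to compute an electron density map. The symbols below are notation introduced here for illustration and do not appear in the original text: V is the unit-cell volume, (h, k, l) index the diffraction spots, |F(hkl)| is the amplitude derived from the measured spot intensity, and \alpha(hkl) is the phase that cannot be measured directly.

```latex
\rho(x, y, z) = \frac{1}{V} \sum_{h,k,l} \lvert F(hkl) \rvert \, e^{i\alpha(hkl)} \, e^{-2\pi i (hx + ky + lz)}
```

Each measured spot contributes its amplitude to the sum, but without an estimate of \alpha(hkl) for every spot the density \rho cannot be evaluated.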

X-rays are diffracted by the electrons of the molecules in the crystal, so the result of successful crystallization and solution of the phase problem is a 3D image of the electron clouds of the molecule (an electron density map). One then interprets this image by building a molecular model to fit the map. The model is built from the sequence of amino acids or nucleotides, which must be known independently, and is improved through a series of refinements. The result is a set of X, Y, Z Cartesian coordinates for every non-hydrogen atom in the molecule.

Solution Nuclear Magnetic Resonance

Solution nuclear magnetic resonance is performed on an aqueous solution of macromolecules, while the molecules tumble and vibrate with thermal motion. NMR detects chemical shifts of atomic nuclei with nonzero spin. The shifts depend on the electronic environments of the nuclei, that is, on the identities and distances of nearby atoms. 1H is the only naturally abundant nucleus in proteins that is readily observed by NMR; 13C and 15N are also NMR-active but occur at low natural abundance, so proteins are usually isotopically enriched in order to observe them.

To obtain NMR resonances sharp enough for adequate resolution, the molecule must tumble rapidly. This typically limits the size of the molecule to about 30 kDa. Also, the protein must be soluble at high concentration (0.2-1 mM, which is 6-30 mg/ml for a 30 kDa protein) and stable for days without aggregation under the experimental conditions.
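The two concentration ranges quoted above are the same quantity in different units for a protein near the 30 kD size limit. A minimal sketch of the conversion follows; the function name mg_per_ml is a hypothetical helper chosen for this illustration.

```python
def mg_per_ml(conc_mM: float, mass_kDa: float) -> float:
    """Convert a molar concentration (mM) to a mass concentration (mg/ml)."""
    mol_per_litre = conc_mM * 1e-3        # 1 mM = 1e-3 mol/l
    gram_per_mol = mass_kDa * 1000.0      # 1 kDa corresponds to 1000 g/mol
    # g/l = (mol/l) * (g/mol), and 1 g/l is the same as 1 mg/ml.
    return mol_per_litre * gram_per_mol


for conc in (0.2, 1.0):
    print(f"{conc} mM of a 30 kDa protein is about {mg_per_ml(conc, 30.0):.0f} mg/ml")
```

For a smaller protein the same molar range corresponds to proportionally lower mass concentrations.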

The result of NMR analysis is a set of estimates of distances between specific pairs of atoms, called "constraints". Such constraints are obtained for both bonded and non-bonded atom pairs (through-bond or through-space distances). With a sufficient number of such constraints, the number of configurations consistent with the data becomes finite. The result is an ensemble of models, rather than a single structure. Often the positions of atoms in the different models are averaged, and the average model is then adjusted to obey normal bond distances and angles ("restrained minimization").

Most commonly, the ensemble of structures, with perhaps 10-50 members, all of which fit the NMR data and retain good stereochemistry, is deposited with the Protein Data Bank. Comparisons between the models in this ensemble provide some information on how well the protein conformation was determined by the NMR constraints.

The result of an NMR study is less detailed and accurate than that obtained crystallographically. The RMSD (root mean square deviation) between the models in an ensemble is used to assess how well the structure calculations have converged. The best structures have backbone RMSD values of less than 1 Angstrom, provided there are not large motions of the backbone or substantially different conformations coexisting in solution.
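The RMSD between models can be illustrated with a short sketch. It assumes the models have already been superimposed and list the same atoms in the same order, and it averages over all pairs of models, which is only one of several conventions in use (RMSD to the mean structure is another). The function names and the three-atom toy ensemble are hypothetical and serve only as an illustration.

```python
import math
from itertools import combinations


def rmsd(model_a, model_b):
    """RMSD between two models given as equal-length lists of (x, y, z) tuples."""
    if len(model_a) != len(model_b) or len(model_a) == 0:
        raise ValueError("models must be non-empty and contain the same atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(model_a))


def mean_pairwise_rmsd(ensemble):
    """Average RMSD over every pair of models in the ensemble."""
    pairs = list(combinations(ensemble, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy ensemble (coordinates in Angstroms): only the last atom differs between models.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, -1.0, 0.0)],
]
print(f"mean pairwise RMSD: {mean_pairwise_rmsd(toy_ensemble):.2f} Angstrom")
```

In practice the calculation is usually restricted to backbone atoms, and the models are optimally superimposed first.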

Regions where there is a large spread between NMR models (often observed in loops or at termini) may signify lack of sufficient information (larger uncertainty), or actual motion or disorder of the molecule in solution. One powerful aspect of NMR is the ability to measure the dynamics of each residue, which helps to distinguish between these possibilities.
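The spread between models can be located along the chain by computing, for each atom, its root mean square deviation from the ensemble-average position. The sketch below assumes the models are already superimposed and share the same atom order; per_atom_spread and the toy coordinates are hypothetical names and data used only for illustration.

```python
import math


def per_atom_spread(ensemble):
    """For each atom, RMS distance of its positions from the ensemble mean position."""
    n_models = len(ensemble)
    if n_models == 0:
        raise ValueError("ensemble is empty")
    n_atoms = len(ensemble[0])
    if any(len(model) != n_atoms for model in ensemble):
        raise ValueError("all models must contain the same atoms")
    spreads = []
    for i in range(n_atoms):
        positions = [model[i] for model in ensemble]
        # Mean x, y and z of atom i across all models.
        mean = [sum(p[axis] for p in positions) / n_models for axis in range(3)]
        sq = sum(
            sum((p[axis] - mean[axis]) ** 2 for axis in range(3)) for p in positions
        )
        spreads.append(math.sqrt(sq / n_models))
    return spreads


# Toy ensemble (Angstroms): the last atom, standing in for a terminus, varies between models.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, -1.0, 0.0)],
]
for index, value in enumerate(per_atom_spread(toy_ensemble)):
    print(f"atom {index}: spread {value:.2f} Angstrom")
```

A large value flags a poorly defined region, but the number alone cannot say whether the cause is missing constraints or real motion; that distinction requires the dynamics measurements mentioned above.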