Most of the three-dimensional macromolecular structure data in the Protein Data Bank were obtained by one of three methods: X-ray crystallography (over 80%), solution nuclear magnetic resonance (NMR; about 16%), or theoretical modeling (2%). A few structures were determined by other methods. The first two are experimental methods, and their results accurately describe the 3D structure of the molecule in the state in which the measurements were made.
The results of NMR analysis are an ensemble of alternative models, in
contrast to the unique model obtained by
crystallography. Structures obtained by theoretical modeling tend to
be less accurate than those obtained by experimental methods. One kind
of modeling, called homology modeling, involves fitting a known sequence
to the experimentally determined 3D structure of a sequence-similar molecule.
Results of homology modeling are more likely to be reliable than are results
derived purely from theory (ab initio modeling).
X-rays are diffracted by the electrons of the molecules in the crystal,
so the result of successful crystallization and solution of the phase problem
is a 3D image of the electron clouds of the molecule (an electron density
map). One then interprets this map by building a model into it: the sequence
of amino acids or nucleotides, which must be known independently, is fitted
into the electron density, and the model is improved through several rounds
of refinement. The result is a set of X, Y, Z Cartesian coordinates for every
non-hydrogen atom in the molecule.
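Those coordinates are commonly distributed in the legacy PDB file format, where each atom is one fixed-width ATOM or HETATM record. The sketch below is a minimal standard-library reader, assuming the standard column layout (x, y and z in columns 31-38, 39-46 and 47-54; element symbol in columns 77-78). It relies on the element column to skip hydrogens, ignores every other record type, and concatenates atoms from all models if a file contains several; it is an illustration, not a replacement for a full-featured structure parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "ALA"
    res_seq: int   # residue sequence number
    element: str   # element symbol, e.g. "C"
    x: float       # Cartesian coordinates in Angstrom
    y: float
    z: float


def parse_atoms(pdb_text: str, skip_hydrogens: bool = True) -> List[Atom]:
    """Read ATOM/HETATM records from legacy PDB-format text.

    Column slices follow the fixed-width PDB layout (0-based here):
    name 12:16, residue name 17:20, residue number 22:26,
    x 30:38, y 38:46, z 46:54, element 76:78.
    All other record types (HEADER, REMARK, MODEL, TER, END, ...) are skipped.
    """
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        element = line[76:78].strip()
        if skip_hydrogens and element in ("H", "D"):
            continue  # drop hydrogen and deuterium records when requested
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            res_seq=int(line[22:26]),
            element=element,
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms
```

Calling `parse_atoms(text)` on the contents of a crystallographic entry returns one `Atom` per non-hydrogen atom record, in file order.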
To obtain NMR resonances sharp enough for adequate resolution, the molecule must tumble rapidly in solution. This typically limits the size of the molecule to about 30 kDa. The protein must also be soluble at high concentration (0.2-1 mM, or 6-30 mg/ml for a 30 kDa protein) and stable for days without aggregation under the experimental conditions.
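The two concentration ranges above are linked by the molecular mass: 1 mM of a 30 kDa protein is 10^-3 mol/L × 30,000 g/mol = 30 g/L, which is 30 mg/ml. A one-line helper makes the arithmetic explicit; it assumes the mass is given in kDa (equivalently kg/mol), so that the unit prefixes cancel.

```python
def mm_to_mg_per_ml(conc_mm: float, mass_kda: float) -> float:
    """Convert a millimolar concentration to mg/ml for a given molecular mass.

    mM is 1e-3 mol/L and kDa is 1e3 g/mol, so their product is in g/L,
    and 1 g/L equals 1 mg/ml: the prefactors cancel exactly.
    """
    return conc_mm * mass_kda


# The concentration range quoted for a protein near the ~30 kDa size limit.
for c in (0.2, 1.0):
    print(f"{c} mM of a 30 kDa protein = {mm_to_mg_per_ml(c, 30.0):g} mg/ml")
```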
The result of NMR analysis is a set of estimates of distances between
specific pairs of atoms, called "constraints". Such constraints are obtained
for both bonded and non-bonded atom pairs (through-bond and through-space
distances). With a sufficient number of such constraints, the range of
conformations consistent with the data becomes narrow enough to define the
structure, but it does not shrink to a single conformation. The result is
therefore an ensemble of models rather than a single structure. Often the
positions of atoms in the different models are averaged, and the average
model is then adjusted to obey normal bond distances and angles ("restrained
minimization").
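The adjustment step is needed because an average of well-formed models is not necessarily well formed itself. The toy sketch below uses two made-up, two-atom "models" in which the same 1.5 Angstrom bond points in different directions; the coordinate-wise average shortens that bond to about 1.06 Angstrom. It assumes the models are already superimposed in a common frame and list their atoms in the same order.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def average_models(models: List[List[Coord]]) -> List[Coord]:
    """Coordinate-wise mean over models; atom i must be the same atom in every model."""
    n_models = len(models)
    mean = []
    for i in range(len(models[0])):
        xs, ys, zs = zip(*(model[i] for model in models))
        mean.append((sum(xs) / n_models, sum(ys) / n_models, sum(zs) / n_models))
    return mean


# Two hypothetical, pre-superimposed models of a single 1.5 Angstrom bond.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]

mean_model = average_models([model_a, model_b])
bond = math.dist(mean_model[0], mean_model[1])  # math.dist needs Python 3.8+
print(f"bond in each model: 1.50 Angstrom; in the average: {bond:.2f} Angstrom")
```

The shortened bond is the kind of stereochemical distortion that restrained minimization is meant to remove.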
Usually, the ensemble of structures, with perhaps 10-50 members, all of
which fit the NMR data and retain good stereochemistry, is deposited
with the Protein Data Bank. Comparisons between the models in this ensemble
indicate how well the protein conformation was determined by the NMR
constraints.
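To make "fit the NMR data" concrete, the sketch below checks every model of an ensemble against a list of upper-bound distance constraints and reports the violations. The constraint format (two atom indices plus an upper bound in Angstrom) and the 0.5 Angstrom tolerance are hypothetical choices for illustration, not a convention of the PDB or of any particular structure-calculation program.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
# Hypothetical constraint format: (atom index i, atom index j, upper bound in Angstrom)
Constraint = Tuple[int, int, float]
Violation = Tuple[int, int, float]  # (i, j, amount by which the bound is exceeded)


def violations(model: List[Coord], constraints: List[Constraint],
               tolerance: float = 0.5) -> List[Violation]:
    """Return every constraint exceeded by more than `tolerance` Angstrom."""
    found = []
    for i, j, upper in constraints:
        excess = math.dist(model[i], model[j]) - upper
        if excess > tolerance:
            found.append((i, j, excess))
    return found


def check_ensemble(models: List[List[Coord]], constraints: List[Constraint],
                   tolerance: float = 0.5) -> Dict[int, List[Violation]]:
    """Map each model number (1-based) to its list of violations."""
    return {n: violations(m, constraints, tolerance)
            for n, m in enumerate(models, start=1)}


constraints = [(0, 1, 5.0), (0, 2, 4.0)]
models = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.5, 0.0)],  # satisfies both bounds
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 3.5, 0.0)],  # pair (0, 1) is 1.0 too long
]
for number, found in check_ensemble(models, constraints).items():
    print(number, "OK" if not found else found)
```

In a deposited ensemble every member is expected to pass a check of this kind, which is what makes the comparison between members meaningful.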
The result of an NMR study is generally less detailed and less accurate than that obtained by crystallography. The RMSD (root mean square deviation) between the models in the ensemble is used to assess how well the structure calculations have converged. The best structures have backbone RMSD values of less than 1 Angstrom, provided there are no large motions of the backbone and no substantially different conformations coexisting in solution.
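As a minimal sketch of how convergence might be quantified, the code below computes RMSD = sqrt((1/N) Σ |r_i(A) - r_i(B)|²) between two models and averages it over all pairs in an ensemble. It assumes the caller passes only backbone atom coordinates, in the same order for every model, and that the models are already superimposed; a real analysis would first fit each pair optimally (for example with the Kabsch algorithm), which is deliberately omitted here.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    """RMSD between two equally ordered, already superimposed coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def mean_pairwise_rmsd(models: List[List[Coord]]) -> float:
    """Average RMSD over all pairs of models in an ensemble."""
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(m, n) for m, n in pairs) / len(pairs)


# Three hypothetical backbone fragments; model_2 has one atom displaced by 1 Angstrom.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
model_3 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(round(rmsd(model_1, model_2), 3))
print(round(mean_pairwise_rmsd([model_1, model_2, model_3]), 3))
```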
Regions where there is a large spread between NMR models (often observed
in loops or at termini) may signify either a lack of sufficient information
(larger uncertainty) or actual motion or disorder of the molecule in solution.
One powerful aspect of NMR is its ability to measure the dynamics of each
residue, which helps to distinguish between these possibilities.
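The spread between models can be localized by computing, for each position i, the root mean square fluctuation RMSF_i = sqrt((1/M) Σ_m |r_i(m) - <r_i>|²) of that atom about its ensemble mean over M models. The sketch below does this for one representative atom per residue (for example CA) and flags residues above a cutoff. It assumes pre-superimposed models with identical atom order; the 1.5 Angstrom cutoff and the toy coordinates are arbitrary. As the text notes, a large value alone cannot tell missing information apart from genuine motion.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsf(models: List[List[Coord]]) -> List[float]:
    """Per-position RMSF about the ensemble mean (pre-superimposed, same atom order)."""
    n_models = len(models)
    values = []
    for i in range(len(models[0])):
        positions = [model[i] for model in models]
        mean = tuple(sum(c) / n_models for c in zip(*positions))
        msd = sum(math.dist(p, mean) ** 2 for p in positions) / n_models
        values.append(math.sqrt(msd))
    return values


def flag_spread(models: List[List[Coord]], cutoff: float = 1.5) -> List[int]:
    """Return 1-based positions whose RMSF exceeds a hypothetical cutoff in Angstrom."""
    return [i + 1 for i, v in enumerate(rmsf(models)) if v > cutoff]


# Three hypothetical models of a 3-residue fragment (one CA per residue);
# the terminal residue 3 differs strongly between models.
models = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (3.8, 0.1, 0.0), (7.6, 3.0, 0.0)],
    [(0.0, -0.1, 0.0), (3.8, -0.1, 0.0), (7.6, -3.0, 0.0)],
]
print([round(v, 2) for v in rmsf(models)])
print("high-spread residues:", flag_spread(models))
```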