Protein and peptide analysis employs many different tools to identify and characterize the immense variety of proteins expressed in the cells of any organism. Traditionally, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) is one of the first tools used to analyze and characterize proteins or peptides. As a one-dimensional gel electrophoresis, however, it is limited to the analysis of refined or purified protein mixtures with fewer than 100 different proteins.

To analyze the whole protein complement that can be expressed by a particular cell, two-dimensional (2D) gel electrophoresis should be employed, which consists of an isoelectric focusing step in one direction of the gel and an SDS-PAGE step in the second direction. Only one sample can be analyzed on a single gel, but a higher resolution is achieved this way. Still, some protein mixtures may be too complex to be analyzed without prior fractionation; therefore, most protein analyses should include some form of separation and purification before more complex analysis methods are employed. Usually some type of column chromatography, such as size exclusion, ion-exchange, hydrophobic interaction or affinity chromatography, is used to separate the protein mixture into smaller fractions that can then be analyzed and characterized. While the techniques mentioned above are useful for separating and characterizing a protein or peptide based on charge or molecular size, they cannot identify the protein or peptide.

A traditional technique to determine the sequence of a protein or peptide is the Edman degradation procedure, in which a peptide is degraded stepwise from its N-terminus: the N-terminal amino acid is labeled with phenylisothiocyanate, cleaved off by trifluoroacetic acid and converted in aqueous HCl to its phenylthiohydantoin derivative, which is then identified. The cycle is repeated until the entire protein is sequenced. Since this process is slow and has a low throughput, it is no longer used in most proteomics research.

The major advancement in proteomics research came with the reliable adaptation of mass spectrometry (MS) for peptide analysis. Since most proteins are too large and complex for direct MS analysis, a pre-digestion into smaller peptides by specific proteases, for example trypsin, is usually performed before analysis. In mass spectrometry the resulting peptides are characterized by their mass-to-charge ratio (m/z). To be analyzed, the molecules have to be ionized and transferred into the gas phase; the ions are then separated by a mass analyzer, which is coupled with an ion detector to determine the mass-to-charge ratio of each ion. Mass spectrometers are differentiated by their ionization source, their mass analyzer and their ion detector. The most common ionization sources are MALDI (matrix-assisted laser desorption/ionization) and ESI (electrospray ionization). Commonly used mass analyzers are the TOF (time-of-flight) analyzer, the quadrupole mass filter and ion traps, such as the three-dimensional quadrupole ion trap, the linear quadrupole ion trap and the orbitrap. Another common approach is tandem mass spectrometry (MS/MS), where the ionized peptides are first resolved in the first mass analyzer, which isolates one species at a time and sends it to a collision cell. In the collision cell the peptide is further fragmented, and the fragments are sent to the second mass analyzer. There the masses of the fragments are precisely determined, and a partial amino acid sequence of the peptide can be derived. These partial sequences, called peptide sequence tags, can then be used in database searches to identify the protein or peptide.
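The digestion and m/z steps described above can be illustrated with a minimal Python sketch. It assumes the commonly used trypsin rule (cleavage C-terminal to lysine or arginine, but not before proline), no missed cleavages, no modifications, and standard monoisotopic residue masses rounded to five decimals. The function names are hypothetical and chosen only for this illustration.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of a free peptide
PROTON = 1.00728   # mass of one proton


def trypsin_digest(sequence):
    """Split a protein sequence after K or R, unless the next residue is P.

    Assumption: complete digestion, i.e. no missed cleavages.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass, charge):
    """m/z of a peptide carrying `charge` additional protons."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    for pep in trypsin_digest("MKWVTFRPLLK"):
        m = peptide_mass(pep)
        print(pep, round(m, 4), round(mz(m, 1), 4), round(mz(m, 2), 4))
```

Here `trypsin_digest("MKWVTFRPLLK")` yields the peptides MK and WVTFRPLLK, because the arginine followed by proline is not cleaved under the assumed rule.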

The final step after mass spectrometry analysis is to identify the protein or peptide sequence by database search or de novo sequencing. The large number of peptide sequence tags generated by the mass spectrometer can be compared to a database of known protein and peptide sequences to identify the actual proteins and peptides present in the sample. Databases like UniProt, including its curated Swiss-Prot section, contain large amounts of information on protein and peptide sequences for a large variety of species. Search engines like SEQUEST or PEAKS are employed to compare the mass spectrometry results to the databases and identify the proteins with a certain confidence, defined by probability parameters. To estimate the rate of false identifications, a decoy database search can be employed, in which the mass spectra are compared to the original database and to a decoy database in which the amino acid sequences are reversed. Because the reversed sequences are not expected to occur in the sample, a match to the decoy database is considered false, and the number of such matches is used to estimate the False Discovery Rate (FDR) of the protein or peptide identifications.
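The decoy search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made-up numbers rather than output of SEQUEST or PEAKS, every match against a reversed sequence is counted as a false hit, and the FDR is estimated as decoy matches divided by target matches above a score threshold, which is one common convention among several. All names are hypothetical.

```python
def reverse_decoys(targets):
    """Build a decoy database by reversing every target sequence.

    `targets` maps an entry name to its amino acid sequence.
    """
    return {"DECOY_" + name: seq[::-1] for name, seq in targets.items()}


def estimate_fdr(matches, threshold):
    """Estimate the FDR among matches scoring at or above `threshold`.

    `matches` is a list of (score, is_decoy) tuples, one per spectrum match.
    Assumed estimator: number of decoy matches / number of target matches.
    """
    accepted = [is_decoy for score, is_decoy in matches if score >= threshold]
    n_decoy = sum(accepted)
    n_target = len(accepted) - n_decoy
    return n_decoy / n_target if n_target else 0.0


if __name__ == "__main__":
    targets = {"protein_a": "MKWVTFISLLK", "protein_b": "GASPVTCLNDK"}
    print(reverse_decoys(targets))

    # Illustrative scores only; a real search engine would supply these.
    matches = [(9.1, False), (8.4, False), (7.9, True), (7.5, False), (3.0, True)]
    print(estimate_fdr(matches, threshold=7.0))
```

Raising the threshold typically lowers the estimated FDR at the cost of fewer accepted identifications, which is why the threshold is usually chosen to meet a target FDR.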

When the protein or peptide sequence is not present in any database, de novo sequencing can be employed to determine the amino acid sequence of the peptide. De novo means “anew” or “all over” and describes the fact that the peptide has to be characterized without additional knowledge. In this regard, Edman degradation can also be characterized as de novo sequencing. In mass spectrometry, a labeling technique for the C-terminal amino acid can be used to identify that end of the peptide by the additional mass of the label, so that the correct sequence of the peptide can be determined by tandem mass spectrometry. In general, the more information is available about the original protein or peptide, the more accurate the results of the mass spectrometry analysis are.
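How a sequence can be read from tandem mass spectra without a database can be shown with a strongly idealized Python sketch. It assumes a complete, noise-free ladder of singly charged fragments that all contain the C-terminus (commonly called y ions), no labels or modifications, and a fixed mass tolerance. Neighboring fragments then differ by exactly one residue mass. Leucine and isoleucine have the same mass and are both reported as L. The function names and the tolerance are assumptions made for this illustration.

```python
# Monoisotopic residue masses in Da; I is omitted because it is isobaric with L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def y_ions(peptide):
    """Theoretical singly charged C-terminal fragment m/z values (test helper)."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide):
        total += RESIDUE_MASS["L" if aa == "I" else aa]
        ions.append(total)
    return ions


def match_residue(delta, tol=0.01):
    """Return the residue whose mass matches `delta`, or '?' if none does."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(delta - mass) <= tol:
            return aa
    return "?"


def sequence_from_y_ladder(y_mz, tol=0.01):
    """Read a peptide sequence from a complete ladder of y-ion m/z values."""
    # The smallest fragment is one residue plus water plus a proton, so the
    # ladder is anchored at the mass of water plus a proton.
    ladder = [WATER + PROTON] + sorted(y_mz)
    residues = [match_residue(b - a, tol) for a, b in zip(ladder, ladder[1:])]
    # The ladder grows from the C-terminus, so reverse to N-to-C order.
    return "".join(reversed(residues))


if __name__ == "__main__":
    print(sequence_from_y_ladder(y_ions("GASK")))
```

Real spectra contain gaps, noise and several fragment types, so practical de novo tools score many candidate sequences instead of reading a single ladder directly.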


Crynen, S. (2011). Bioactive properties of peptides derived from enzymatic hydrolysis of cod muscle myosin with trypsin, chymotrypsin and elastase. University of Florida.