Protein engineering

Protein engineering is the process of developing useful or valuable proteins through the design and production of unnatural polypeptides, often by altering amino acid sequences found in nature. It is a young discipline, and much research is still directed at understanding protein folding and molecular recognition in order to establish protein design principles. It has been used to improve the function of many enzymes for industrial catalysis. It is also a product and services market, whose value was estimated to reach $168 billion by 2017.

There are two general strategies for protein engineering: rational protein design and directed evolution. These methods are not mutually exclusive; researchers often apply both. In the future, more detailed knowledge of protein structure and function, together with advances in high-throughput screening, may greatly expand the abilities of protein engineering. Eventually, even unnatural amino acids may be included, via newer methods such as expanded genetic code, which allow novel amino acids to be encoded in the genetic code.

Applications span numerous fields, including medicine and industrial bioprocessing.

Approaches

Rational design

In rational protein design, a scientist uses detailed knowledge of the structure and function of a protein to make desired changes. In general, this has the advantage of being inexpensive and technically easy, since site-directed mutagenesis methods are well developed. Its major drawback is that detailed structural knowledge of a protein is often unavailable and, even when it is available, the effects of various mutations can be very difficult to predict, since structural information most often provides only a static picture of a protein structure. Programs such as Folding@home and Foldit have nevertheless used crowdsourcing techniques to gain insight into the folding motifs of proteins.

Computational protein design algorithms seek to identify novel amino acid sequences that are low in energy when folded to the pre-specified target structure. While the sequence-conformation space that needs to be searched is large, the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones.
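The search described above can be illustrated with a deliberately simplified sketch. It assumes a hypothetical toy model in which the target structure is reduced to a string of buried ("B") and exposed ("E") positions, and a made-up energy function that rewards hydrophobic residues at buried sites and other residues at exposed sites. Real design software uses far more detailed energy functions over three-dimensional coordinates; the names toy_energy and design_sequence are illustrative only.

```python
import math
import random

# Hypothetical toy model: each position of the target structure is either
# buried ("B") or exposed ("E"). This is a stand-in for a real energy function.
HYDROPHOBIC = set("AVILMFW")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(sequence, burial_pattern):
    """Lower is better: -1 if a residue suits its environment, +1 otherwise."""
    energy = 0
    for residue, site in zip(sequence, burial_pattern):
        suits = (residue in HYDROPHOBIC) == (site == "B")
        energy += -1 if suits else 1
    return energy


def design_sequence(burial_pattern, steps=5000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo search over sequences for a fixed target pattern."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in burial_pattern]
    current_e = toy_energy(current, burial_pattern)
    best, best_e = current[:], current_e
    for _ in range(steps):
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)  # propose a point substitution
        new_e = toy_energy(current, burial_pattern)
        delta = new_e - current_e
        # Always accept downhill or neutral moves; accept uphill moves rarely.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e
            if current_e < best_e:
                best, best_e = current[:], current_e
        else:
            current[pos] = old  # reject the move and restore the residue
    return "".join(best), best_e


if __name__ == "__main__":
    sequence, energy = design_sequence("BBEEBEBE")
    print(sequence, energy)
```

In this toy setting many sequences share the same lowest energy, which hints at why a real energy function must be accurate enough to separate optimal sequences from near-optimal ones.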

Multiple sequence alignment

When structural information about a protein is unavailable, sequence analysis is often useful for elucidating information about the protein. These techniques involve aligning the target protein sequence with other related protein sequences. The alignment can show which amino acids are conserved between species and are therefore likely to be important for the function of the protein. These analyses can help to identify hot spot amino acids that can serve as target sites for mutations. Multiple sequence alignment utilizes databases such as PREFAB, SABMARK, OXBENCH, IRMBASE, and BALIBASE in order to cross-reference target protein sequences with known sequences. One multiple sequence alignment technique is described below.

A progressive alignment method begins by performing pairwise alignment of the sequences using k-tuple or Needleman–Wunsch methods. These methods produce a matrix of pairwise similarity scores for the sequence pairs. The similarity scores are then transformed into distance scores, which are used to build a guide tree by the neighbor joining method. The guide tree then determines the order in which sequences are combined to yield a multiple sequence alignment.
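The first two steps described above, pairwise scoring and conversion to distances, can be sketched as follows. This is a minimal illustration under stated assumptions: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the normalisation used to turn scores into distances are chosen here for simplicity and are not taken from any particular alignment program. Guide tree construction by neighbor joining is omitted.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score by Needleman-Wunsch dynamic programming.

    Only the optimal score is computed (no traceback), keeping two rows
    of the dynamic programming table in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def distance_matrix(sequences):
    """Convert pairwise similarity scores into distances in [0, 1].

    Assumption for illustration: distance = 1 - max(score, 0) / best, where
    best is the length of the shorter sequence (the highest score possible
    with match=+1). Sequences are assumed to be non-empty.
    """
    n = len(sequences)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = needleman_wunsch_score(sequences[i], sequences[j])
            best = min(len(sequences[i]), len(sequences[j]))
            dist[i][j] = dist[j][i] = 1.0 - max(score, 0) / best
    return dist


if __name__ == "__main__":
    for row in distance_matrix(["ACGT", "AGT", "TTTT"]):
        print(row)
```

Identical sequences receive a distance of 0, and sequences whose best alignment scores zero or below receive a distance of 1. A guide tree built from such a matrix would join the closest sequences first.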

Multivalent proteins

Multivalent proteins are relatively easy to produce by post-translational modifications or by multiplying the protein-coding DNA sequence. The main advantage of multivalent and multispecific proteins is that they can increase the effective affinity of a known protein for its target. When the target is inhomogeneous, combining proteins to achieve multispecific binding can increase specificity, which is highly applicable to protein therapeutics.

The most common examples of multivalent binding are antibodies, and there is extensive research on bispecific antibodies. Applications of bispecific antibodies cover a broad spectrum that includes diagnosis, imaging, prophylaxis, and therapy.

Directed evolution

In directed evolution, random mutagenesis, e.g. by error-prone PCR or sequence saturation mutagenesis, is applied to a protein, and a selection regime is used to pick out variants with desired traits. Further rounds of mutation and selection are then applied. This method mimics natural evolution and in many cases produces results superior to those of rational design. An added process, termed DNA shuffling, mixes and matches pieces of successful variants to produce better results. Such processes mimic the recombination that occurs naturally during sexual reproduction. The advantages of directed evolution are that it requires no prior structural knowledge of a protein and that it is not necessary to be able to predict what effect a given mutation will have. Indeed, the results of directed evolution experiments are often surprising, in that desired changes are frequently caused by mutations that were not expected to have any effect.

The drawback is that directed evolution requires high-throughput screening, which is not feasible for all proteins. Large amounts of recombinant DNA must be mutated and the products screened for desired traits. The large number of variants often requires expensive robotic equipment to automate the process. Further, not all desired activities can be screened for easily.

Laboratory methods can be used to imitate aspects of Darwinian evolution for the development of protein properties in applications such as catalysis. Various experimental techniques are available to generate large and diverse protein libraries and to screen or select variants with desired folding and functionality. Folded proteins can arise in random sequence space, which can be utilized in the development of binding proteins and catalysts. An alternative approach involves modifying existing proteins through random mutagenesis followed by selection or screening to optimize or alter their properties. This approach can serve as a basis for more advanced protein engineering efforts. Combining experimental evolution with computational methods may support the development of functional macromolecules not found in nature.

Significant progress has recently been made on the main challenges of designing high-quality mutant libraries. Part of this progress comes from better descriptions of the effects of mutational load on protein traits. Computational approaches have also advanced considerably in reducing the innumerably large sequence space to more manageable, screenable sizes, thus creating smart libraries of mutants. Library size has been further reduced by identifying key beneficial residues using algorithms for systematic recombination. Finally, a significant step toward efficient reengineering of enzymes has been made with the development of more accurate statistical models and algorithms that quantify and predict coupled mutational effects on protein function.
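A short calculation illustrates why sequence space must be narrowed before screening. The sketch below assumes the 20 standard amino acids and a hypothetical 100-residue protein; the function names are illustrative. It compares the full sequence space with the number of variants carrying exactly one or two substitutions, and with a focused library in which only a few chosen positions are fully randomised.

```python
from math import comb


def sequence_space(length, alphabet=20):
    """Number of possible sequences of a given length."""
    return alphabet ** length


def variants_with_k_substitutions(length, k, alphabet=20):
    """Number of variants differing from a parent at exactly k positions."""
    return comb(length, k) * (alphabet - 1) ** k


def focused_library(n_positions, alphabet=20):
    """Library size when n_positions chosen sites are fully randomised."""
    return alphabet ** n_positions


if __name__ == "__main__":
    length = 100  # hypothetical protein length
    print(f"full sequence space: {sequence_space(length):.3e}")
    print("single mutants:", variants_with_k_substitutions(length, 1))  # 1900
    print("double mutants:", variants_with_k_substitutions(length, 2))  # 1786950
    print("4-site focused library:", focused_library(4))  # 160000
```

Even the set of all double mutants of a small protein approaches the limit of many screening methods, whereas a library focused on four positions stays within reach of high-throughput screening.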

Generally, directed evolution may be summarized as an iterative two-step process: generation of protein mutant libraries, followed by high-throughput screening to select variants with improved traits. This technique does not require prior knowledge of the relationship between protein structure and function. Directed evolution utilizes random or focused mutagenesis to generate libraries of mutant proteins. Random mutations can be introduced using either error-prone PCR or site saturation mutagenesis. Mutants may also be generated by recombination of multiple homologous genes. Nature has evolved only a limited number of beneficial sequences, and directed evolution makes it possible to identify undiscovered protein sequences that have novel functions. This ability is contingent on the protein's ability to tolerate amino acid substitutions without compromising folding or stability.
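The iterative two-step process can be sketched as a small simulation. This is a toy model under stated assumptions, not a laboratory protocol: the fitness function is a hypothetical stand-in for a screening assay (it counts positions that match an arbitrary target sequence), and the mutation rate, library size, and number of variants kept per round are illustrative values.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQR"  # hypothetical sequence standing in for an ideal variant


def toy_fitness(sequence):
    """Stand-in for a screening assay: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(sequence, TARGET))


def mutate(sequence, rate, rng):
    """Random mutagenesis: each position is redrawn with probability `rate`.

    The redrawn residue may by chance equal the original one.
    """
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else residue
        for residue in sequence
    )


def directed_evolution(parent, fitness, rounds=10, library_size=200,
                       keep=5, rate=0.05, seed=0):
    """Alternate library generation and screening for a number of rounds."""
    rng = random.Random(seed)
    parents = [parent]
    history = [fitness(parent)]
    for _ in range(rounds):
        # Step 1: generate a mutant library from the current parents.
        library = [mutate(rng.choice(parents), rate, rng)
                   for _ in range(library_size)]
        # Step 2: screen the library and keep the best variants. Parents stay
        # in the pool, so the best fitness can never decrease between rounds.
        pool = list(dict.fromkeys(library + parents))  # order-preserving dedupe
        pool.sort(key=fitness, reverse=True)
        parents = pool[:keep]
        history.append(fitness(parents[0]))
    return parents[0], history


if __name__ == "__main__":
    best, history = directed_evolution("AAAAAAAAAA", toy_fitness)
    print(best, history)
```

Note that the loop never inspects the structure of the protein or predicts the effect of a mutation; it relies only on the screen, which mirrors the point that no prior structural knowledge is required.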

Advances in semi-rational enzyme engineering and de novo enzyme design offer researchers efficient frameworks for manipulating biocatalysts. Integrating sequence- and structure-based approaches in enzyme redesign has been shown to significantly reduce the number of variant iterations, reducing or even eliminating the need for high-throughput analysis. However, enzymes produced by current computational de novo and redesign methods remain inferior to evolved variants in catalytic performance. To close this gap, refinements to design algorithms, such as integrating protein dynamics into simulations, are expected to improve the accuracy of structure predictions and enhance catalytic ability.

Engineered enzymes can also be used to convert certain compounds into others (biotransformation). The resulting products are useful as chemicals, pharmaceuticals, fuel, food, or agricultural additives.

An enzyme reactor consists of a vessel containing a reaction medium that is used to perform a desired conversion by enzymatic means. Enzymes used in this process are free in the solution.

Examples of engineered proteins

Computational methods have been used to design proteins with novel folds, such as Top7, and sensors for unnatural molecules. The engineering of fusion proteins has yielded rilonacept, a pharmaceutical that has secured Food and Drug Administration (FDA) approval for treating cryopyrin-associated periodic syndrome.

Another computational method, Iterative Protein Redesign and Optimization (IPRO), successfully switched the cofactor specificity of Candida boidinii xylose reductase. IPRO redesigns proteins to increase or confer specificity toward native or novel substrates and cofactors. It does so by repeatedly perturbing, at random, the structure of the protein around specified design positions, identifying the lowest-energy combination of rotamers, and determining whether the new design has a lower binding energy than prior ones. The iterative nature of this process allows IPRO to make additive mutations to a protein sequence that collectively improve the specificity toward desired substrates and/or cofactors.

Computational design has also been applied to a protein cage, E. coli bacterioferritin (EcBfr), which naturally shows structural instability and incomplete self-assembly, populating two oligomerization states. Computational analysis and comparison with its homologs showed that this protein has a smaller-than-average dimeric interface on its two-fold symmetry axis, due mainly to an interfacial water pocket centered on two water-bridged asparagine residues. To investigate whether EcBfr could be engineered for modified structural stability, a semi-empirical computational method was used to explore virtually the energy differences of the 480 possible mutants at the dimeric interface relative to wild-type EcBfr. This computational study also converged on the water-bridged asparagines. Replacing these two asparagines with hydrophobic amino acids results in proteins that fold into alpha-helical monomers and assemble into cages, as evidenced by circular dichroism and transmission electron microscopy. Both thermal and chemical denaturation confirm that all redesigned proteins, in agreement with the calculations, possess increased stability. One of the three mutations shifts the population in favor of the higher-order oligomerization state in solution, as shown by both size exclusion chromatography and native gel electrophoresis.

A computational procedure was also developed to redesign the bacterial channel protein OmpF, reducing its 1 nm pore size to any desired sub-nm dimension. Transport experiments on the narrowest designed pores revealed complete salt rejection when assembled in biomimetic block-polymer matrices.

External links

Servers for protein engineering and related topics based on the WHAT IF software
Enzymes Built from Scratch – Researchers engineer never-before-seen catalysts using a new computational technique, Technology Review, March 10, 2008