Figure: Site saturation mutagenesis is a type of site-directed mutagenesis. This image shows the saturation mutagenesis of a single position in a theoretical 10-residue protein. The wild-type version of the protein is shown at the top, with M representing the first amino acid, methionine, and * representing the termination of translation. All 19 mutants of the isoleucine at position 5 are shown below.

Figure: How DNA libraries generated by random mutagenesis sample sequence space. The amino acid substituted into a given position is shown. Each dot or set of connected dots is one member of the library. Error-prone PCR randomly mutates some residues to other amino acids. Alanine scanning replaces each residue of the protein with alanine, one by one. Site saturation substitutes each of the 20 possible amino acids (or some subset of them) at a single position, one by one.

In molecular biology, a library is a collection of fragments of genetic material that are stored and propagated in a population of microbes through the process of molecular cloning. There are different types of DNA libraries, including cDNA libraries (formed from reverse-transcribed RNA), genomic libraries (formed from genomic DNA) and randomized mutant libraries (formed by de novo gene synthesis in which alternative nucleotides or codons are incorporated). DNA library technology is a mainstay of current molecular biology, genetic engineering, and protein engineering, and the applications of these libraries depend on the source of the original DNA fragments. The cloning vectors and techniques used in library preparation differ, but in general each DNA fragment is inserted into a separate cloning vector molecule, and the pool of recombinant DNA molecules is then transferred into a population of bacteria (for example, a bacterial artificial chromosome, or BAC, library) or yeast such that each organism contains on average one construct (vector + insert). As the population of organisms is grown in culture, the DNA molecules contained within them are copied and propagated (thus, "cloned").

Terminology

The term "library" can refer to a population of organisms, each of which carries a DNA molecule inserted into a cloning vector, or alternatively to the collection of all of the cloned vector molecules.

cDNA libraries

A cDNA library represents a sample of the mRNA purified from a particular source (a collection of cells, a particular tissue, or an entire organism), which has been converted back to a DNA template by the enzyme reverse transcriptase. It thus represents the genes that were being actively transcribed in that source under the physiological, developmental, or environmental conditions that existed when the mRNA was purified. cDNA libraries can be generated using techniques that promote "full-length" clones or under conditions that generate shorter fragments used for the identification of "expressed sequence tags".

cDNA libraries are useful in reverse genetics, but they only represent a very small (less than 1%) portion of the overall genome in a given organism.

Applications of cDNA libraries include:

  • Discovery of novel genes
  • Cloning of full-length cDNA molecules for in vitro study of gene function
  • Study of the repertoire of mRNAs expressed in different cells or tissues
  • Study of alternative splicing in different cells or tissues

Genomic libraries

A genomic library is a set of clones that together represents the entire genome of a given organism. The number of clones that constitute a genomic library depends on (1) the size of the genome in question and (2) the insert size tolerated by the particular cloning vector system. For most practical purposes, the tissue source of the genomic DNA is unimportant because each cell of the body contains virtually identical DNA (with some exceptions).
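The dependence on genome size and insert size can be illustrated with the commonly used Clarke-Carbon estimate, N = ln(1 − P) / ln(1 − f), where f is the ratio of insert size to genome size and P is the desired probability that any given sequence is present in the library. The following is a minimal sketch rather than a definitive method: the function name and the example figures (a 4.6 Mb genome and 20 kb inserts) are assumptions chosen for illustration, and the estimate assumes that fragments are generated and cloned at random.

```python
import math


def clones_needed(genome_size_bp, insert_size_bp, probability=0.99):
    """Clarke-Carbon estimate of library size: N = ln(1 - P) / ln(1 - f).

    f is the fraction of the genome carried by one insert; P is the
    probability that a given sequence is present in at least one clone.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_size_bp < genome_size_bp:
        raise ValueError("insert must be positive and smaller than the genome")
    f = insert_size_bp / genome_size_bp
    # log1p(-x) computes ln(1 - x) accurately for small x.
    return math.ceil(math.log1p(-probability) / math.log1p(-f))


if __name__ == "__main__":
    # Hypothetical example: 4.6 Mb genome, 20 kb inserts, 99% coverage.
    print(clones_needed(4_600_000, 20_000, probability=0.99))
```

Under this estimate, a vector that tolerates larger inserts requires fewer clones for the same genome, whereas demanding a higher probability of coverage requires more.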

Applications of genomic libraries include:

  • Determining the complete genome sequence of a given organism (see genome project)
  • Serving as a source of genomic sequence for generation of transgenic animals through genetic engineering
  • Study of the function of regulatory sequences in vitro
  • Study of genetic mutations in cancer tissues

Synthetic mutant libraries

Figure: Depiction of one common way to clone a site-directed mutagenesis library (i.e., using degenerate oligos). The gene of interest is amplified by PCR with oligos that contain a region that is perfectly complementary to the template (blue), and one that differs from the template by one or more nucleotides (red). Many such primers containing degeneracy in the non-complementary region are pooled into the same PCR, resulting in many different PCR products with different mutations in that region (individual mutants shown with different colors below).

In contrast to the library types described above, a variety of artificial methods exist for making libraries of variant genes. Variation can be introduced randomly throughout the gene by error-prone PCR, by DNA shuffling to recombine parts of similar genes, or by transposon-based methods that introduce insertions and deletions (indels).
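As a rough illustration of random mutagenesis, the toy model below substitutes each base of a template independently with a fixed probability. It is a sketch under stated assumptions, not a model of any particular protocol: the function names, the template sequence, and the uniform choice among the three alternative bases are all hypothetical simplifications, and the model covers substitutions only (no indels or recombination).

```python
import random

BASES = "ACGT"


def mutate_randomly(sequence, rate, rng):
    """Return a copy of `sequence` with each base substituted with probability `rate`."""
    mutated = []
    for base in sequence:
        if rng.random() < rate:
            # Assumption: the replacement is drawn uniformly from the other bases.
            base = rng.choice([b for b in BASES if b != base])
        mutated.append(base)
    return "".join(mutated)


def make_random_library(template, size, rate, seed=0):
    """Build a list of `size` independently mutated variants of `template`."""
    rng = random.Random(seed)  # seeded so that the toy library is reproducible
    return [mutate_randomly(template, rate, rng) for _ in range(size)]


if __name__ == "__main__":
    # Hypothetical 12-nucleotide template and a 10% per-base substitution rate.
    for variant in make_random_library("ATGGCTATCGAA", size=5, rate=0.1):
        print(variant)
```

Each variant may carry zero, one, or several substitutions at unpredictable positions, which is what distinguishes this kind of library from the targeted approaches described next.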

Alternatively, mutations can be targeted to specific codons during de novo synthesis or saturation mutagenesis to construct one or more point mutants of a gene in a controlled way. This results in a mixture of double-stranded DNA molecules that represent variants of the original gene.
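Saturation of a single codon with a degenerate oligo can be sketched in a few lines. The example below expands the widely used degenerate codon NNK (N = any base, K = G or T) at one position of a gene and translates the resulting variants using the standard genetic code. The gene sequence, the function names, and the choice of NNK are assumptions made for illustration only.

```python
from itertools import product

# IUPAC nucleotide codes (only the subset needed for this sketch).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}

# Standard genetic code, with codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def expand_degenerate_codon(codon):
    """List every concrete codon matched by a degenerate codon such as 'NNK'."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in codon))]


def saturate_position(gene, codon_index, degenerate_codon="NNK"):
    """Return gene variants with the codon at `codon_index` (0-based) replaced."""
    start = 3 * codon_index
    return [
        gene[:start] + codon + gene[start + 3:]
        for codon in expand_degenerate_codon(degenerate_codon)
    ]


def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Hypothetical gene encoding a 10-residue protein with isoleucine at position 5.
    gene = "ATGGCTAAAGGTATCCTGGAAACCTGGCGTTAA"
    for protein in sorted({translate(v) for v in saturate_position(gene, 4)}):
        print(protein)
```

NNK expands to 32 codons that together encode all 20 amino acids plus a single stop codon (TAG), so the translated library contains the wild-type sequence, the 19 point mutants at the chosen position, and one truncated product.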

The proteins expressed from these libraries can then be screened for variants that exhibit favorable properties (e.g., stability, binding affinity, or enzyme activity). This process can be repeated in cycles of creating gene variants and screening the expression products, an approach known as directed evolution.