thumb|upright=1.5|In metagenomics, the genetic materials (DNA, C) are extracted directly from samples taken from the environment (e.g. soil, sea water, human gut, A) after filtering (B), and are sequenced (E) after multiplication by cloning (D) in an approach called shotgun sequencing. These short sequences can then be put together again using assembly methods (F) to deduce the individual genomes or parts of genomes that constitute the original environmental sample. This information can then be used to study the species diversity and functional potential of the microbial community of the environment.

Metagenomic studies most commonly employ shotgun sequencing. The field is also referred to as environmental genomics, ecogenomics, community genomics, or microbiomics, and has significantly expanded the understanding of microbial life beyond what traditional cultivation-based methods can reveal.

Metagenomics is distinct from amplicon sequencing, also referred to as metabarcoding or PCR-based sequencing. The main difference is the underlying methodology: metagenomics targets all DNA in a sample, while amplicon sequencing amplifies and sequences one or more specific genes. Data utilisation also differs between the two approaches. Amplicon sequencing mainly provides community profiles detailing which taxa are present in a sample, whereas metagenomics also recovers encoded enzymes and pathways. Amplicon sequencing was frequently used in early environmental gene sequencing, which focused on assessing specific highly conserved marker genes, such as the 16S rRNA gene, to profile microbial diversity. These studies demonstrated that the vast majority of microbial biodiversity had been missed by cultivation-based methods. This led to the first report of isolating and cloning bulk DNA from an environmental sample, published by Pace and colleagues in 1991.

Sequencing

thumb|200px|Flow diagram of a typical metagenome project

Recovery of DNA sequences longer than a few thousand base pairs from environmental samples was very difficult until recent advances in molecular biological techniques allowed the construction of libraries in bacterial artificial chromosomes (BACs), which provided better vectors for molecular cloning. One approach combines shotgun sequencing and chromosome conformation capture (Hi-C), which measures the proximity of any two DNA sequences within the same cell, to guide microbial genome assembly. Another technique is single-cell metagenomic sequencing, which resolves the heterogeneity present within the community.

Sequencing depth

An important consideration when sequencing for metagenomics is sequencing depth, the number of times each base is read by the sequencer; it can be thought of as resolution. The higher the sequencing depth, the larger the resultant file and the number of contigs, and the higher the number of microbial genomes recovered. Higher-depth metagenomes have been shown to have exceptionally high genome recovery, with tremendous novelty being reported. Low-depth metagenomes have lower resolution at every taxonomic level than high-depth samples.
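The relationship between read count and depth can be illustrated with a short calculation. The following is a minimal sketch, assuming the common approximation that mean depth equals the total number of sequenced bases divided by the genome size, and assuming that reads are distributed among community members in proportion to their share of the sampled DNA. The function names and the example figures are hypothetical and chosen only for illustration.

```python
import math


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read:
    (number of reads x read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def organism_depth(num_reads: int, read_length: int,
                   genome_size: int, abundance_fraction: float) -> float:
    """Expected depth for one community member.

    Assumption: the organism receives a share of the reads equal to
    abundance_fraction (its fraction of the DNA in the sample).
    """
    if not 0.0 <= abundance_fraction <= 1.0:
        raise ValueError("abundance_fraction must be between 0 and 1")
    return mean_depth(num_reads, read_length, genome_size) * abundance_fraction


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean depth, rounded up."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)
```

Under these assumptions, one million 150-base reads give a mean depth of 30 for a 5-megabase genome sequenced alone, but a depth of only about 0.3 for the same genome when it makes up 1% of a community, which is why rare community members are recovered only in deeply sequenced metagenomes.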

Bioinformatics

[[File:WGS metagenomics analysis steps.gif|thumb|450px|Schematic representation of the main steps necessary for the analysis of whole metagenome shotgun sequencing-derived data.]]

Sequence pre-filtering

The first step of metagenomic data analysis is pre-filtering, which includes the removal of redundant sequences, low-quality sequences, and sequences of probable eukaryotic origin (especially in metagenomes of human origin). The methods available for the removal of contaminating eukaryotic genomic DNA sequences include Eu-Detect and DeconSeq. Metagenomic samples can also be affected by cross-sample contamination (or well-to-well leakage), in which microbial content is inadvertently exchanged between samples processed concurrently. Solutions include negative controls or control-free detection tools such as CroCoDeEL.
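Two of the pre-filtering steps described above, the removal of redundant and of low-quality sequences, can be sketched in a few lines. This is an illustrative sketch only, not the method used by any of the named tools: it assumes reads carry FASTQ-style Phred+33 quality strings, treats only exact duplicate sequences as redundant, and uses a hypothetical mean-quality threshold of 20. The `Read` type and `prefilter` function are names introduced here for illustration. Removing eukaryotic sequences requires comparison against reference genomes and is not shown.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical minimal record)."""
    name: str
    sequence: str
    quality: str  # Phred+33 encoded quality string, as in FASTQ


def mean_phred(quality: str) -> float:
    """Mean Phred score of a Phred+33 encoded quality string."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


def prefilter(reads: Iterable[Read],
              min_mean_quality: float = 20.0) -> Iterator[Read]:
    """Yield reads that pass a mean-quality threshold, dropping exact duplicates.

    The threshold of 20 is an illustrative assumption, not a standard.
    """
    seen_sequences = set()
    for read in reads:
        # Skip malformed records (empty, or sequence/quality length mismatch).
        if not read.sequence or len(read.sequence) != len(read.quality):
            continue
        # Skip low-quality reads.
        if mean_phred(read.quality) < min_mean_quality:
            continue
        # Skip redundant reads (same sequence already kept).
        if read.sequence in seen_sequences:
            continue
        seen_sequences.add(read.sequence)
        yield read
```

Because `prefilter` is a generator, it can process a large read set in a single pass, although the set of retained sequences still grows with the number of unique reads.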

Assembly

DNA sequence data from genomic and metagenomic projects are essentially the same, but genomic sequence data offer higher coverage while metagenomic data are usually highly non-redundant. The use of reference genomes allows researchers to improve the assembly of the most abundant microbial species, but this approach is limited by the small subset of microbial phyla for which sequenced genomes are available.

Gene prediction

Metagenomic analysis pipelines use two approaches in the annotation of coding regions in the assembled contigs.

Comparative analyses of metagenomes provide insights into how microbial communities vary across environments or hosts, helping to link community structure and function to ecological or health-related outcomes. Metadata on the environmental context of the metagenomic sample is important in comparative analyses, as it allows researchers to study the effect of habitat on community structure and function. Functional comparisons can be made with tools such as gutSMASH, which map reads to reference databases (e.g. KEGG, COG) or detect biosynthetic/metabolic gene clusters, enabling statistical comparison of functional potential across samples. This gene-centric approach emphasizes the functional complement of the community as a whole rather than taxonomic groups, and shows that the functional complements are analogous under similar environmental conditions.
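A gene-centric comparison of two samples can be sketched as follows. The sketch assumes that reads have already been assigned to functional categories and counted; it converts the counts to relative abundances and compares the two profiles with the Bray–Curtis dissimilarity, which is one commonly used measure chosen here for illustration rather than the method of any particular tool. The function names and category labels are hypothetical.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-category read counts to fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {category: n / total for category, n in counts.items()}


def bray_curtis(profile_a: Dict[str, float],
                profile_b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b).

    Returns 0 for identical profiles and 1 for profiles sharing no category.
    """
    categories = set(profile_a) | set(profile_b)
    numerator = sum(abs(profile_a.get(c, 0.0) - profile_b.get(c, 0.0))
                    for c in categories)
    denominator = sum(profile_a.get(c, 0.0) + profile_b.get(c, 0.0)
                      for c in categories)
    return numerator / denominator if denominator else 0.0


# Hypothetical counts of reads assigned to two functional categories.
sample_one = relative_abundance({"func_A": 30, "func_B": 70})
sample_two = relative_abundance({"func_A": 70, "func_B": 30})
dissimilarity = bray_curtis(sample_one, sample_two)
```

Normalising to relative abundance before comparison keeps samples sequenced to different depths comparable; a low dissimilarity between samples from similar environments would be consistent with the analogous functional complements described above.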

Agriculture

The soils in which plants grow are inhabited by microbial communities, with one gram of soil containing around 10<sup>9</sup>-10<sup>10</sup> microbial cells, which comprise about one gigabase of sequence information. By allowing insights into the role of previously uncultivated or rare community members in nutrient cycling and the promotion of plant growth, metagenomic approaches can contribute to improved disease detection in crops and livestock and to the adaptation of enhanced farming practices that improve crop health by harnessing the relationship between microbes and plants.

Metagenomic sequencing can also identify species from debris filtered from the air, a sample of dirt, or an animal's faeces, and can even detect diet items from blood meals. This can establish the range of invasive species and endangered species, and track seasonal populations.

Furthermore, metagenomics is utilized to assess the ecological impacts of anthropogenic pollution on environmental microbiomes. For instance, long-read whole-metagenome sequencing of soils in industrial technogenic zones has revealed that chronic heavy metal contamination fundamentally restructures the soil microbial community. Rather than significantly reducing overall biodiversity, intense environmental pressures drive strain-level adaptations, selecting for metal-resistant taxa and suppressing vulnerable phyla, which provides high-resolution insights into the natural bioremediation potential of polluted ecosystems.

Environmental remediation

Metagenomics can improve strategies for monitoring the impact of pollutants on ecosystems and for cleaning up contaminated environments. Increased understanding of how microbial communities cope with pollutants improves assessments of the potential of contaminated sites to recover from pollution and increases the chances that bioaugmentation or biostimulation trials will succeed.

In the Human Microbiome Project (HMP), gut microbial communities were assayed using high-throughput DNA sequencing. The HMP showed that, unlike individual microbial species, many metabolic processes were present among all body habitats with varying frequencies. As part of the project, microbial communities from 649 metagenomes drawn from seven primary body sites on 102 individuals were studied. The metagenomic analysis revealed variations in niche-specific abundance among 168 functional modules and 196 metabolic pathways within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. The HMP has brought to light the utility of metagenomics in diagnostics and evidence-based medicine. Thus metagenomics is a powerful tool to address many of the pressing issues in the field of personalized medicine. This can have implications in monitoring the spread of diseases from wildlife to farmed animals and humans.

Infectious disease diagnosis

Differentiating between infectious and non-infectious illness, and identifying the underlying etiology of infection, can be challenging. For example, more than half of cases of encephalitis remain undiagnosed, despite extensive testing using state-of-the-art clinical laboratory methods. Clinical metagenomic sequencing shows promise as a sensitive and rapid method to diagnose infection by comparing genetic material found in a patient's sample to databases of all known microscopic human pathogens and thousands of other bacterial, viral, fungal, and parasitic organisms, and to databases of antimicrobial resistance gene sequences with associated clinical phenotypes.
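The idea of comparing sequences from a sample against a reference database can be illustrated with a toy example. The sketch below assumes a simple exact k-mer matching scheme in which each read is assigned to the reference that shares the most k-mers with it; this is a simplification for illustration, not a description of any clinical pipeline, which would also need curated databases, host-sequence removal, and statistical thresholds. The reference sequences, organism names, and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in the sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the references that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Assign a read to the reference sharing the most k-mers with it.

    Returns None when no k-mer matches or when the top two references tie.
    """
    votes: Counter = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous assignment
    return ranked[0][0]


# Hypothetical miniature "database" of two reference sequences.
references = {
    "organism_A": "ACGTACGTTTGACCA",
    "organism_B": "GGGCCCTTTAAAGGG",
}
index = build_index(references, k=5)
assignment = classify("CGTTTGACC", index, k=5)
```

Returning no assignment for unmatched or ambiguous reads reflects, in a simplified way, the fact that many reads in a clinical sample will not match any organism in the database with confidence.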

Arbovirus surveillance

Metagenomics helps characterize the diversity and ecology of arboviruses, the viruses spread by hematophagous (blood-feeding) arthropods such as mosquitoes and ticks. It can also be used by public health officials and organizations to surveil arboviruses circulating in wild arthropod populations.

Dietary estimation

Metagenomic Estimation of Dietary Intake (MEDI) enables reconstruction of individual dietary profiles by detecting food-derived DNA in human stool metagenomes. MEDI has shown concordance with food frequency questionnaires, tracked dietary shifts in infants, and identified diet–health associations in large cohorts without dietary records.

See also

  • Binning
  • Epidemiology and sewage
  • Metaproteomics
  • Microbial ecology
  • Pathogenomics
  • Virome analysis

