thumb|Home page of a biological database that characterises functional links between proteins|350 px|right

Biological databases are libraries of biological science information, collected from scientific experiments, published literature, high-throughput experimental technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. The information they hold includes gene function, structure, localization (both cellular and chromosomal), and the clinical effects of mutations, as well as similarities between biological sequences and structures.

Biological databases can be classified by the kind of data they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), image and other media databases, and specimen databases (for museum collections, etc.).

Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in fighting diseases, developing medications, predicting certain genetic diseases, and discovering basic relationships among species in the history of life.

Major biological databases

These tables cover a variety of notable biological databases across a wide range of fields, specialties, data types, and use cases. Many of these databases are collated in the ELIXIR Core Data Resource list, which collects European data resources critical to life science research.

{| class="wikitable sortable"
|+ELIXIR Core Data Resources
!Resource
!Category
!Host institution
!Description
|-
|
|Transcriptomics
|EMBL-EBI / European Nucleotide Archive
|
|-
|
|Microbiology
|Leibniz Institute DSMZ
|Taxonomy, morphology, physiology, and ecology of bacterial and archaeal strains
|-
|Bgee
|Transcriptomics
|Swiss Institute of Bioinformatics / University of Lausanne
|
|-
|
|Imaging
|EMBL-EBI
|
|-
|
|Metadata
|EMBL-EBI
|
|-
|
|Enzymology
|Leibniz Institute DSMZ
|
|-
|
|Protein Structure
|University College London
|
|-
|
|Cells
|Swiss Institute of Bioinformatics
|
|-
|
|Biochemistry
|EMBL-EBI
|
|-
|
|Biochemistry
|EMBL-EBI
|
|-
|
|Genome
|EMBL-EBI / CRG
|
|-
|
|Biochemical Structure
|EMBL-EBI
|
|-
|
|Sequence
|EMBL-EBI
|
|-
|
|Genome
|EMBL-EBI
|
|-
|
|Genome
|EMBL-EBI
|
|-
|
|Literature
|EMBL-EBI
|
|-
|
|Variation
|EMBL-EBI / NHGRI
|Curated collection of human genome-wide association studies
|-
|HGNC
|Nomenclature
|University of Cambridge / EMBL-EBI
|
|-
|
|Proteomics
|KTH Royal Institute of Technology / Karolinska Institute / Uppsala Universitet
|
|-
|MINT (IMEx)
|Interactions
|EMBL-EBI
|
|-
|
|Protein
|EMBL-EBI
|
|-
|
|Regulation
|University of Oslo
|
|-
|
|Metagenomics
|EMBL-EBI
|
|-
|
|Lipidomics
|Cardiff University / UCSD / Babraham Institute / Swansea University / University of Edinburgh
|
|-
|
|Nomenclature
|Leibniz Institute DSMZ
|
|-
|
|Disease
|Inserm / French Ministry of Health
|
|-
|
|Orthology
|Swiss Institute of Bioinformatics / University of Lausanne
|
|-
|
|Orthology
|Swiss Institute of Bioinformatics / University of Geneva
|
|-
|
|Structure
|EMBL-EBI
|
|-
|
|Genome
|University of Cambridge / University College London / Babraham Institute
|
|-
|
|Proteomics
|EMBL-EBI
|
|-
|
|Pathways
|EMBL-EBI / OICR / NYU
|
|-
|
|Chemistry
|Swiss Institute of Bioinformatics
|
|-
|
|Sequence
|Leibniz Institute DSMZ
|
|-
|
|Interactions
|Swiss Institute of Bioinformatics / Novo Nordisk Foundation Center Protein Research / EMBL-EBI
|
|-
|
|Structure
|Swiss Institute of Bioinformatics / University of Basel
|
|-
|
|Protein
|EMBL-EBI / SIB / PIR
|
|-
|
|Pathogens
|University of Pennsylvania / University of Georgia / University of Liverpool
|
|}

Model organism databases cover, among others, the fission yeast Schizosaccharomyces pombe, Drosophila (FlyBase), the nematodes Caenorhabditis elegans and Caenorhabditis briggsae (WormBase), and the frogs Xenopus tropicalis and Xenopus laevis (Xenbase).

Biodiversity and species databases

thumb|Animal groups and their number of species from the [[Catalogue of Life]]

Numerous databases attempt to document the diversity of life on Earth. A prominent example is the Catalogue of Life, first created in 2001 by Species 2000 and the Integrated Taxonomic Information System. It is a collaborative project that aims to document the taxonomic categorization of all currently accepted species in the world, providing a consolidated and consistent database for researchers and policymakers to reference. The Catalogue of Life curates up-to-date datasets from other sources such as the Conifer Database, ICTV MSL (for viruses), and LepIndex (for butterflies and moths). In total, it drew from 165 databases as of May 2022. Its operational costs are paid for by the Global Biodiversity Information Facility, the Illinois Natural History Survey, the Naturalis Biodiversity Center, and the Smithsonian Institution.

Some biological databases also document the geographical distribution of species. Shuang Dai et al. created a multi-source database documenting the spatial distribution of 1,371 bird species in China, because existing databases severely lacked spatial distribution data for many species. Sources for this database included books, literature, GPS tracking, and online webpage data. For each species, the database displays taxonomy, distribution, species information, and data sources. Once the database was complete, 61% of known species in China were found to be distributed in regions beyond where they were previously known.

Medical databases

thumb|Foot wounds from WoundsDB

Medical databases are a special case of biomedical data resource and can range from bibliographies, such as PubMed, to image databases for the development of AI-based diagnostic software. For instance, one such image database was developed to aid the development of wound monitoring algorithms. Over 188 multi-modal image sets were curated from 79 patient visits, consisting of photographs, thermal images, and 3D mesh depth maps. Wound outlines were manually drawn and added to the photo datasets. The database was made publicly available in the form of a program called WoundsDB, downloadable from the Chronic Wound Database website.

Publications

Biological databases are commonly described and updated through peer-reviewed publications, which serve both as documentation and as a means of community dissemination.

A major venue for such publications is the annual Nucleic Acids Research (NAR) Database Issue, typically published in January. This special issue presents articles describing new biological databases as well as updates to existing resources, and is accompanied by the NAR online Molecular Biology Database Collection.

Dedicated journals focusing on biological data resources include Database: The Journal of Biological Databases and Curation and GigaScience, which publish articles describing databases, curated resources, and large-scale datasets, often alongside associated computational tools and workflows.

In addition, general data-focused journals such as Scientific Data publish descriptions of datasets across a wide range of scientific disciplines, including but not limited to the life sciences.

See also

  • Biobank
  • Biological data
  • Chemical database
  • Death Domain database
  • European Bioinformatics Institute
  • Gene Disease Database
  • Integrative bioinformatics
  • List of biological databases
  • Model organism databases
  • NCBI
  • PubMed (a database of biomedical literature)
  • Database: The Journal of Biological Databases and Curation
