Figure: Phylogenetic tree of the Mup gene family

A gene family is a set of several similar genes, formed by duplication of a single original gene, and generally with similar biochemical functions. One example is the family of genes encoding the subunits of human hemoglobin; its ten genes lie in two clusters on different chromosomes, called the α-globin and β-globin loci. These two gene clusters are thought to have arisen from the duplication of a precursor gene approximately 500 million years ago.

Genes are categorized into families based on shared nucleotide or protein sequences. Phylogenetic techniques can provide a more rigorous test of relatedness. The positions of exons within the coding sequence can also be used to infer common ancestry. When the sequence of the protein encoded by a gene is known, researchers can compare protein sequences directly; similarities among protein sequences often provide more information about relatedness than similarities or differences among the corresponding DNA sequences.
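As a rough illustration of grouping genes by shared sequence, the sketch below clusters a few toy sequences into families using single-linkage clustering on percent identity: any two genes whose identity reaches a threshold are linked, and linked genes fall into the same family. It assumes the sequences are already aligned to the same length; the gene names, sequences, and the 0.85 threshold are hypothetical. Real analyses typically rely on proper alignment and similarity search (for example BLAST) and on more careful clustering, so this is a sketch of the idea rather than a practical method.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def group_into_families(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: genes connected by any chain of pairs at or
    above `threshold` identity end up in the same family."""
    parent = {name: name for name in seqs}  # union-find forest, one tree per family

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two families

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


# Hypothetical toy protein fragments, already aligned to equal length.
seqs = {
    "geneA": "MKTAYIAKQR",
    "geneB": "MKTAYIAKQK",
    "geneC": "MKSAYIAKQK",
    "geneD": "GLWDEFHPNV",
}
families = group_into_families(seqs, threshold=0.85)
print(sorted(sorted(f) for f in families))
```

Note that geneA and geneC are only 80% identical to each other, below the threshold, yet they land in the same family because both are linked to geneB. This chaining effect is characteristic of single-linkage clustering and is one reason phylogenetic methods are used as a more rigorous test.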

If the genes of a gene family encode proteins, the term protein family is often used in an analogous manner to gene family.

The expansion or contraction of gene families along a specific lineage can be due to chance or can be the result of natural selection. Distinguishing between these two cases is often difficult in practice. Recent work uses a combination of statistical models and algorithmic techniques to detect gene families that are under the effect of natural selection.
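To see why chance alone can expand or shrink a family, one common null model treats family size as a linear birth-death process, in which each gene copy independently duplicates at one rate and is lost at another. The sketch below simulates that process with the Gillespie algorithm, assuming equal duplication and loss rates (so the expected family size stays constant). The function name and parameter values are arbitrary choices for illustration, and this is not presented as the specific model used in the work mentioned above.

```python
import random


def simulate_family_size(n0: int, birth: float, death: float,
                         t_max: float, rng: random.Random) -> int:
    """Gillespie simulation of a linear birth-death process for one lineage.

    Each gene copy duplicates at rate `birth` and is lost at rate `death`
    (both assumed non-negative with a positive sum). Returns the family size
    at time `t_max`, or 0 if the family is lost before then.
    """
    n, t = n0, 0.0
    while n > 0:
        total_rate = n * (birth + death)   # combined rate of the next event
        t += rng.expovariate(total_rate)   # exponential waiting time
        if t > t_max:
            break
        if rng.random() < birth / (birth + death):
            n += 1                          # duplication
        else:
            n -= 1                          # loss
    return n


rng = random.Random(42)  # fixed seed so the run is reproducible
sizes = [simulate_family_size(n0=5, birth=0.01, death=0.01, t_max=50.0, rng=rng)
         for _ in range(1000)]
mean_size = sum(sizes) / len(sizes)
print(min(sizes), max(sizes), round(mean_size, 2))
```

Even with no selection at all, lineages that start with five copies end up with a wide spread of sizes, and some lose the family entirely, while the average stays near five. Judging whether an observed expansion is unusual therefore means asking how likely it is under a null model of this kind.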

The HUGO Gene Nomenclature Committee (HGNC) creates nomenclature schemes using a "stem" (or "root") symbol for members of a gene family (by homology or function), with a hierarchical numbering system to distinguish the individual members. For example, for the peroxiredoxin family, PRDX is the root symbol, and the family members are PRDX1, PRDX2, PRDX3, PRDX4, PRDX5, and PRDX6.
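The stem-plus-number pattern can be illustrated by grouping symbols on their alphabetic stem. The sketch below is a heuristic that assumes a symbol consists simply of uppercase letters followed by a member number; the function name and the regular expression are hypothetical and not part of any HGNC tool.

```python
import re
from collections import defaultdict

# Hypothetical pattern: stem = leading uppercase letters, member = trailing number.
STEM_PATTERN = re.compile(r"^([A-Z]+)(\d+)$")


def group_by_stem(symbols: list[str]) -> dict[str, list[str]]:
    """Group symbols by stem, ordering members by their numeric suffix."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for symbol in symbols:
        match = STEM_PATTERN.match(symbol)
        if match is None:
            continue  # symbol does not follow the simple stem+number pattern
        stem, number = match.group(1), int(match.group(2))
        groups[stem].append((number, symbol))
    return {stem: [s for _, s in sorted(members)] for stem, members in groups.items()}


families = group_by_stem(["PRDX3", "PRDX1", "PRDX6", "HBB", "PRDX2", "PRDX5", "PRDX4"])
print(families)
```

The heuristic breaks on many real symbols: TP53, for instance, would be parsed as stem "TP" with member 53, and HBB, which has no number, is skipped. Family membership is better taken from HGNC's own curated data than inferred from symbol strings.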

Basic structure

Figure: Gene phylogeny shown as lines within the grey species phylogeny. Top: an ancestral gene duplication produces two paralogs (histone H1.1 and H1.2). A speciation event produces orthologs in the two daughter species (human and chimpanzee). Bottom: in a separate species (E. coli), a gene has a similar function (histone-like nucleoid-structuring protein) but has a separate evolutionary origin and so is an analog.

One level of genome organization is the grouping of genes into gene families. Gene families are groups of related genes that share a common ancestor. Members of gene families may be paralogs or orthologs. Paralogs are related genes that arose by gene duplication and are typically found within the same species, whereas orthologs are related genes in different species that descend from a single ancestral gene separated by a speciation event. Gene families are highly variable in size, sequence diversity, and arrangement. Depending on the diversity and functions of the genes within the family, families can be classified as multigene families or superfamilies.
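The distinction between paralogs and orthologs can be read off a gene tree. The sketch below uses the species-overlap rule, one simple heuristic: find the last common ancestor of two genes, and if the same species appears on both sides of that node, treat it as a duplication (the genes are paralogs); otherwise treat it as a speciation (the genes are orthologs). The tree mirrors the histone example in the figure caption, with H1.1 and H1.2 in human and chimpanzee; the "gene|species" label format and the function names are assumptions made for illustration.

```python
from typing import Union

# A gene tree node is either a leaf label "gene|species" or a (left, right) pair.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> set[str]:
    """All leaf labels below a node."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def species_of(labels: set[str]) -> set[str]:
    """Species names taken from the part of each label after the '|'."""
    return {label.split("|")[1] for label in labels}


def lca(tree: tuple, a: str, b: str) -> tuple:
    """Smallest subtree that contains both leaves a and b (assumes a != b)."""
    for child in tree:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca(child, a, b)
    return tree


def relationship(tree: tuple, a: str, b: str) -> str:
    left, right = lca(tree, a, b)
    # Species-overlap rule: if the same species occurs on both sides of the
    # node, the node is inferred to be a duplication, so a and b are paralogs.
    if species_of(leaves(left)) & species_of(leaves(right)):
        return "paralogs"
    return "orthologs"


# Duplication first (H1.1 vs H1.2), then speciation (human vs chimpanzee).
gene_tree = (("H1.1|human", "H1.1|chimp"), ("H1.2|human", "H1.2|chimp"))
print(relationship(gene_tree, "H1.1|human", "H1.1|chimp"))
print(relationship(gene_tree, "H1.1|human", "H1.2|human"))
```

The rule can mislabel nodes when genes have been lost or the gene tree is poorly resolved; reconciling the gene tree against a species tree is a more rigorous alternative.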

Multigene families typically consist of members with similar sequences and functions, though a high degree of divergence (at the sequence and/or functional level) does not lead to the removal of a gene from a gene family. Individual genes in the family may be arranged close together on the same chromosome or dispersed throughout the genome on different chromosomes. Because of their similar sequences and overlapping functions, individual genes in the family often share regulatory control elements. Gene families can also contain pseudogenes, of which there are different types. Non-processed pseudogenes are genes that have become non-functional through mutations accumulated over time. Processed pseudogenes are non-functional copies of genes that were inserted elsewhere in the genome by retrotransposition.

Evolution

Gene families, part of a hierarchy of information storage in a genome, play a large role in the evolution and diversity of multicellular organisms. They are large units of information and a major source of genetic variability.

An adaptive expansion of a single gene into many initially identical copies can occur when natural selection favours additional gene copies, for example when an environmental stressor acts on a species. Such gene amplification is more common in bacteria and is a reversible process. Contraction of gene families commonly results from the accumulation of loss-of-function mutations. For example, a nonsense mutation, which introduces a premature stop codon and so halts translation early, may become fixed in the population, leading to the loss of the gene. This can happen when changes in the environment render a gene redundant.

In HGNC nomenclature, a stem can also refer to genes that share a function rather than a common ancestor, often as parts of the same protein complex. For example, BRCA1 and BRCA2 are unrelated genes that are both named for their role in breast cancer, and RPS2 and RPS3 are unrelated ribosomal proteins found in the same small ribosomal subunit.

The HGNC also maintains a "gene group" (formerly "gene family") classification. A gene can be a member of multiple groups, and all groups form a hierarchy. As with the stem classification, both structural and functional groups exist.
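Because a gene can belong to several groups and groups are nested, finding every group that contains a gene amounts to walking up the hierarchy from each of its direct groups. The sketch below assumes each group has at most one parent group; the function name and all gene and group names are hypothetical placeholders, not real HGNC entries.

```python
def all_groups(gene: str,
               gene_to_groups: dict[str, list[str]],
               group_parent: dict[str, str]) -> set[str]:
    """Every group containing `gene`, directly or through an ancestor group."""
    result: set[str] = set()
    for group in gene_to_groups.get(gene, []):
        # Walk up the (assumed single-parent) chain; stop early if this
        # chain has already been visited via another direct group.
        while group is not None and group not in result:
            result.add(group)
            group = group_parent.get(group)
    return result


# Hypothetical data: one gene in a structural subgroup and a functional group.
gene_to_groups = {"GENE1": ["subgroup_x", "functional_group_y"]}
group_parent = {"subgroup_x": "structural_group_a", "structural_group_a": "top_level"}
print(sorted(all_groups("GENE1", gene_to_groups, group_parent)))
```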