Quantitative genetics

Quantitative genetics is the study of quantitative traits, which are phenotypes that vary continuously—such as height or mass—as opposed to phenotypes and gene-products that are discretely identifiable—such as eye-colour, or the presence of a particular biochemical.

Both quantitative genetics and population genetics use the frequencies of different alleles of a gene in breeding populations (gamodemes), and combine them with concepts from simple Mendelian inheritance to analyze inheritance patterns across generations and descendant lines. While population genetics can focus on particular genes and their subsequent metabolic products, quantitative genetics focuses more on the outward phenotypes, and makes only summaries of the underlying genetics.

Due to the continuous distribution of phenotypic values, quantitative genetics must employ many other statistical methods (such as the effect size, the mean and the variance) to link phenotypes (attributes) to genotypes. Some phenotypes may be analyzed either as discrete categories or as continuous phenotypes, depending on the definition of cut-off points, or on the metric used to quantify them. Mendel himself had to discuss this matter in his famous paper, especially with respect to his peas' attribute tall/dwarf, which actually was derived by adding a cut-off point to "length of stem". Analysis of quantitative trait loci, or QTLs, is a more recent addition to quantitative genetics, linking it more directly to molecular genetics.

Gene effects

In diploid organisms, the average genotypic "value" (locus value) may be defined by the allele "effect" together with a dominance effect, and also by how genes interact with genes at other loci (epistasis). The founder of quantitative genetics, Sir Ronald Fisher, perceived much of this when he proposed the first mathematics of this branch of genetics.

[Figure: Gene effects and phenotype values]

Being a statistician, he defined the gene effects as deviations from a central value—enabling the use of statistical concepts such as mean and variance, which use this idea. The central value he chose for the gene was the midpoint between the two opposing homozygotes at the one locus. The deviation from there to the "greater" homozygous genotype can be named "+a"; and therefore it is "-a" from that same midpoint to the "lesser" homozygote genotype. This is the "allele" effect mentioned above. The heterozygote deviation from the same midpoint can be named "d", this being the "dominance" effect referred to above. The diagram depicts the idea. However, in reality we measure phenotypes, and the figure also shows how observed phenotypes relate to the gene effects. Formal definitions of these effects recognize this phenotypic focus. Epistasis has been approached statistically as interaction (i.e., inconsistencies), but epigenetics suggests a new approach may be needed.

If 0<d<a, the dominance is regarded as partial or incomplete—while d=a indicates full or classical dominance. Previously, d>a was known as "over-dominance". This historical example illustrates clearly how phenotype values and gene effects are linked.

Allele and genotype frequencies

[Figure: Analysis of sexual reproduction]

To obtain means, variances and other statistical values, both quantities and their occurrences are required. The gene effects (above) provide the framework for quantities, and the frequencies of the contrasting alleles in the fertilization gamete-pool provide the information on occurrences.

Commonly, the frequency of the allele causing "more" in the phenotype (including dominance) is given the symbol p, while the frequency of the contrasting allele is q. An initial assumption made when establishing the algebra was that the parental population was infinite and mated at random, which was made simply to facilitate the derivation. The subsequent mathematical development also implied that the frequency distribution within the effective gamete-pool was uniform: there were no local perturbations where p and q varied. Looking at the diagrammatic analysis of sexual reproduction, this is the same as declaring that <math display="inline">p_P = p_g = p</math>; and similarly for q. The mating system that depends on these assumptions is known as "panmixia". Panmixia rarely occurs in nature, as gamete distribution may be limited, for example by dispersal restrictions or by behaviour, or by chance sampling (those local perturbations mentioned above). It is well known that there is a huge wastage of gametes in nature, which is why the diagram depicts a potential gamete-pool separately from the actual gamete-pool. Only the latter sets the definitive frequencies for the zygotes: this is the true "gamodeme" ("gamo" refers to the gametes, and "deme" derives from Greek for "population"). But, under Fisher's assumptions, the gamodeme can be effectively extended back to the potential gamete-pool, and even back to the parental base-population (the "source" population). The random sampling arising when small "actual" gamete-pools are sampled from a large "potential" gamete-pool is known as genetic drift, and is considered subsequently.

While panmixia may not be widely extant, the potential for it does occur, although it may be only ephemeral because of those local perturbations. It has been shown, for example, that the F2 derived from random fertilization of F1 individuals (an allogamous F2), following hybridization, is an origin of a new potentially panmictic population. It has also been shown that if panmictic random fertilization occurred continually, it would maintain the same allele and genotype frequencies across each successive panmictic sexual generation—this being the Hardy–Weinberg equilibrium. However, as soon as genetic drift was initiated by local random sampling of gametes, the equilibrium would cease.

Random fertilization

Male and female gametes within the actual fertilizing pool are usually considered to have the same frequencies for their corresponding alleles. (Exceptions have been considered.) This means that when p male gametes carrying the A allele randomly fertilize p female gametes carrying that same allele, the resulting zygote has genotype AA, and, under random fertilization, the combination occurs with a frequency of <math display="inline">p \times p = p^2</math>. Similarly, the zygote aa occurs with a frequency of <math display="inline">q^2</math>. Heterozygotes (Aa) can arise in two ways: when p male (A allele) gametes randomly fertilize q female (a allele) gametes, and vice versa. The resulting frequency for the heterozygous zygotes is thus 2pq.
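The quadratic expansion described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and p = 0.6 is an arbitrary example value.

```python
def random_fertilization(p):
    """Zygote genotype frequencies from the quadratic expansion (p + q)^2."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def allele_frequency_A(freqs):
    """Frequency of allele A among the gametes these zygotes would produce."""
    # Each AA zygote yields only A gametes; each Aa zygote yields A half the time.
    return freqs["AA"] + 0.5 * freqs["Aa"]


if __name__ == "__main__":
    freqs = random_fertilization(0.6)
    print(freqs)
    print(allele_frequency_A(freqs))
```

Feeding the returned allele frequency back into `random_fertilization` gives the same genotype frequencies again, which is the constancy across generations that the Hardy–Weinberg equilibrium describes.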

Mendel's research cross – a contrast

Mendel's pea experiments were constructed by establishing true-breeding parents with "opposite" phenotypes for each attribute. The F2 generation was produced by natural self-pollination of the F1 (with monitoring against insect contamination), resulting in p = q = 0.5 being maintained. Such an F2 is said to be "autogamous". However, the genotype frequencies (0.25 TT, 0.5 Tt, 0.25 tt) have arisen through a mating system very different from random fertilization, and therefore the use of the quadratic expansion has been avoided. The numerical values obtained were the same as those for random fertilization only because this is the special case of having originally crossed homozygous opposite parents. We can notice that, because of the dominance of T- [frequency (0.25 + 0.5)] over tt [frequency 0.25], the 3:1 ratio is still obtained.

A cross such as Mendel's, where true-breeding (largely homozygous) opposite parents are crossed in a controlled way to produce an F1, is a special case of hybrid structure. The F1 is often regarded as "entirely heterozygous" for the gene under consideration. However, this is an over-simplification and does not apply generally—for example when individual parents are not homozygous, or when populations inter-hybridise to form hybrid swarms. Arising from this background, the inbreeding coefficient (often symbolized as F or f) quantifies the effect of inbreeding from whatever cause. There are several formal definitions of f, and some of these are considered in later sections. For the present, note that for a long-term self-fertilized species f = 1.

Natural self-fertilized populations are not single "pure lines", however, but mixtures of such lines. This becomes particularly obvious when considering more than one gene at a time. Therefore, allele frequencies (p and q) other than 1 or 0 are still relevant in these cases (refer back to the Mendel Cross section). The genotype frequencies take a different form, however.

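In general, the genotype frequencies become <math display="inline">[p^2(1-f)+pf]</math> for AA and <math display="inline">2pq(1-f)</math> for Aa and <math display="inline">[q^2(1-f)+qf]</math> for aa.

Under random fertilization (f = 0), the population mean is <math display="inline"> G = a(p-q) + 2pqd </math>, expressed as a deviation from the homozygote midpoint (mp), so that the phenotypic mean is <math display="inline"> P = G + mp </math>. As an example, assume for now that one gene only is represented, with a = 5.45 cm, d = 0.12 cm [virtually "0", really] and mp = 12.05 cm. Further assuming that p = 0.6 and q = 0.4 in this example population, then: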

G = 5.45 (0.6 − 0.4) + (0.48)0.12 = 1.15 cm (rounded); and

P = 1.15 + 12.05 = 13.20 cm (rounded).
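The worked example above can be reproduced by weighting each gene effect (+a, d, −a) by its genotype frequency. The sketch below is illustrative only: the function names are hypothetical, and the inbreeding coefficient f is included as an optional argument so that the same code covers both f = 0 and f = 1.

```python
def genotype_frequencies(p, f=0.0):
    """Genotype frequencies for allele frequency p and inbreeding coefficient f."""
    q = 1.0 - p
    return {
        "AA": p * p * (1.0 - f) + p * f,
        "Aa": 2.0 * p * q * (1.0 - f),
        "aa": q * q * (1.0 - f) + q * f,
    }


def population_mean(p, a, d, mp, f=0.0):
    """Return (G, P): the mean as a deviation from mp, and the phenotypic mean."""
    freq = genotype_frequencies(p, f)
    # Gene effects relative to the homozygote midpoint: AA = +a, Aa = d, aa = -a.
    G = freq["AA"] * a + freq["Aa"] * d + freq["aa"] * (-a)
    return G, G + mp


if __name__ == "__main__":
    G, P = population_mean(p=0.6, a=5.45, d=0.12, mp=12.05)
    print(round(G, 2), round(P, 2))  # 1.15 13.2
```

With `f=1.0` the heterozygote term vanishes and G reduces to a(p − q), whatever the value of d.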

The mean after long-term self-fertilization

With f = 1, the genotype frequencies given above reduce to p for AA, zero for Aa and q for aa. The contribution of AA is therefore <math display="inline"> p (+a)</math>, while that of aa is <math display="inline"> q (-a)</math>, and there is no heterozygote contribution. Gathering the two a terms together leads to a very simple final result:

<math display="inline"> G_{(f=1)} = a(p-q)</math>. As before, <math display="inline"> P = G + mp</math>.

Often, "G(f=1)" is abbreviated to "G1".

Mendel's peas can provide us with the allele effects and midpoint (see previously); and a mixed self-pollinated population with p = 0.6 and q = 0.4 provides example frequencies. Thus:

G(f=1) = 82 (0.6 − 0.4) = 16.4 cm; and

P(f=1) = 16.4 + 116 = 132.4 cm.

The mean – generalized fertilization

A general formula incorporates the inbreeding coefficient f, and can then accommodate any situation. The procedure is exactly the same as before, using the weighted genotype frequencies given earlier. Weighting each gene effect by its genotype frequency and simplifying gives <math display="inline"> G_f = a(p-q) + (1-f)2pqd </math>, which reduces to the random fertilization mean when f = 0 and to <math display="inline"> a(p-q) </math> when f = 1.

Each sampling "packet" involves 2N alleles, and produces N zygotes (a "progeny" or a "line") as a result. During the course of the reproductive period, this sampling is repeated over and over, so that the final result is a mixture of sample progenies. The result is dispersed random fertilization <math> \left( \bigodot \right) </math>. These events, and the overall end-result, are examined here with an illustrative example.

The "base" allele frequencies of the example are those of the potential gamodeme: the frequency of A is pg = 0.75, while the frequency of a is qg = 0.25. [White label "1" in the diagram.] Five example actual gamodemes are binomially sampled out of this base (s = the number of samples = 5), and each sample is designated with an "index" k: with k = 1 .... s sequentially. (These are the sampling "packets" referred to in the previous paragraph.) The number of gametes involved in fertilization varies from sample to sample, and is given as 2Nk [at white label "2" in the diagram]. The total (Σ) number of gametes sampled overall is 52 [white label "3" in the diagram]. Because each sample has its own size, weights are needed to obtain averages (and other statistics) when obtaining the overall results. These are <math display="inline"> \omega_k = 2N_k / (\sum_{k}^s 2N_k) </math>, and are given at white label "4" in the diagram.

[Figure: Genetic drift example analysis]

The sample gamodemes – genetic drift

Following completion of these five binomial sampling events, the resultant actual gamodemes each contained different allele frequencies—(pk and qk). [These are given at white label "5" in the diagram.] This outcome is actually the genetic drift itself. Notice that two samples (k = 1 and 5) happen to have the same frequencies as the base (potential) gamodeme. Another (k = 3) happens to have the p and q "reversed". Sample (k = 2) happens to be an "extreme" case, with pk = 0.9 and qk = 0.1; while the remaining sample (k = 4) is "middle of the range" in its allele frequencies. All of these results have arisen only by "chance", through binomial sampling. Having occurred, however, they set in place all the downstream properties of the progenies.

Because sampling involves chance, the probabilities of obtaining each of these samples become of interest. These binomial probabilities depend on the starting frequencies (pg and qg) and the sample size (2Nk). They are tedious to obtain. [See to the right of black label "5" in the diagram.] [Further discussion on this variance occurs in the section below on Extensive genetic drift.]
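Although tedious by hand, these binomial probabilities are straightforward to compute. The sketch below is illustrative only: the function name is hypothetical, and the sample size of 8 gametes is an assumed value, since the individual sample sizes appear only in the diagram. It gives the probability of drawing a stated number of A-bearing gametes from the potential gamodeme with pg = 0.75.

```python
from math import comb


def sample_probability(n_A, two_N, p_g):
    """Binomial probability of exactly n_A A-bearing gametes among two_N sampled."""
    q_g = 1.0 - p_g
    return comb(two_N, n_A) * p_g ** n_A * q_g ** (two_N - n_A)


if __name__ == "__main__":
    two_N = 8  # hypothetical sample size, not taken from the diagram
    for n_A in range(two_N + 1):
        p_k = n_A / two_N  # allele frequency in this possible sample
        print(n_A, p_k, sample_probability(n_A, two_N, 0.75))
```

Samples whose allele frequencies lie close to the base frequency are the most probable, while "reversed" or extreme samples are possible but less likely.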

The progeny lines – dispersion

The genotype frequencies of the five sample progenies are obtained from the usual quadratic expansion of their respective allele frequencies (random fertilization). The results are given at the diagram's white label "7" for the homozygotes, and at white label "8" for the heterozygotes. Re-arrangement in this manner prepares the way for monitoring inbreeding levels. This can be done either by examining the level of total homozygosis [<math display="inline"> (p_k^2 + q_k^2) = (1 - 2p_k q_k) </math>], or by examining the level of heterozygosis (<math display="inline"> 2p_k q_k </math>), as they are complementary. Notice that samples k = 1, 3, 5 all had the same level of heterozygosis, despite one being the "mirror image" of the others with respect to allele frequencies. The "extreme" allele-frequency case (k = 2) had the most homozygosis (least heterozygosis) of any sample. The "middle of the range" case (k = 4) had the least homozygosity (most heterozygosity): they were each equal at 0.50, in fact.
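The homozygosis and heterozygosis levels of the five samples can be checked directly from the allele frequencies described in the text (0.75, 0.9, 0.25, 0.5 and 0.75 for k = 1 to 5). The function name below is hypothetical, and the sketch covers only the per-sample values, not the weighted averages, because the individual sample sizes are given only in the diagram.

```python
def zygosity(p):
    """Return (homozygosis, heterozygosis) under random fertilization."""
    q = 1.0 - p
    heterozygosis = 2.0 * p * q
    homozygosis = p * p + q * q  # equals 1 - 2pq
    return homozygosis, heterozygosis


# Allele frequencies p_k of the five sample gamodemes, as described in the text.
p_k = {1: 0.75, 2: 0.90, 3: 0.25, 4: 0.50, 5: 0.75}

if __name__ == "__main__":
    for k, p in p_k.items():
        hom, het = zygosity(p)
        print(k, round(hom, 4), round(het, 4))
```

Samples 1, 3 and 5 share the same values because the expressions are symmetric in p and q.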

The overall summary can continue by obtaining the weighted average of the respective genotype frequencies for the progeny bulk. Thus, for AA, it is <math display="inline"> p^2_\centerdot = \sum_k^s \omega_k \ p_k^2 </math>, for Aa, it is <math display="inline"> 2p_\centerdot q_\centerdot = \sum_k^s \omega_k \ 2 p_k q_k </math> and for aa, it is <math display="inline"> q_\centerdot^2 = \sum_k^s \omega_k \ q_k^2 </math>. The example results are given at black label "7" for the homozygotes, and at black label "8" for the heterozygote. Note that the heterozygosity mean is 0.3588, which the next section uses to examine inbreeding resulting from this genetic drift.

The next focus of interest is the dispersion itself, which refers to the "spreading apart" of the progenies' population means. These are obtained as <math display="inline"> G_k = a (p_k - q_k) + 2p_k q_k d </math> [see section on the Population mean], for each sample progeny in turn, using the example gene effects given at white label "9" in the diagram. Then, each <math display="inline"> P_k = G_k + mp </math> is obtained also [at white label "10" in the diagram]. Notice that the "best" line (k = 2) had the highest allele frequency for the "more" allele (A) (it also had the highest level of homozygosity). The worst progeny (k = 3) had the highest frequency for the "less" allele (a), which accounted for its poor performance. This "poor" line was less homozygous than the "best" line; and it shared the same level of homozygosity, in fact, as the two second-best lines (k = 1, 5). The progeny line with both the "more" and the "less" alleles present in equal frequency (k = 4) had a mean below the overall average (see next paragraph), and had the lowest level of homozygosity. These results reveal the fact that the alleles most prevalent in the "gene-pool" (also called the "germplasm") determine performance, not the level of homozygosity per se. Binomial sampling alone effects this dispersion.

The overall summary can now be concluded by obtaining <math display="inline"> G_{\centerdot} = \sum_k^s \omega_k \ G_k </math> and <math display="inline"> P_{\centerdot} = \sum_k^s \omega_k \ P_k </math>. The example result for P• is 36.94 (black label "10" in the diagram). This later is used to quantify inbreeding depression overall, from the gamete sampling. [See the next section.] However, recall that some "non-depressed" progeny means have been identified already (k = 1, 2, 5). This is an enigma of inbreeding—while there may be "depression" overall, there are usually superior lines among the gamodeme samplings.

The equivalent post-dispersion panmictic – inbreeding

Included in the overall summary were the average allele frequencies in the mixture of progeny lines (p• and q•). These can now be used to construct a hypothetical panmictic equivalent. For the example, these frequency changes are 0.1069 and 0.1070, respectively. This result is different to the above, indicating that bias with respect to the full underlying distribution is present in the example. For the example itself, these latter values are the better ones to use, namely f• = 0.10695.

The population mean of the equivalent panmictic is found as [a (p•-q•) + 2 p•q• d] + mp. Using the example gene effects (white label "9" in the diagram), this mean is 37.87. The equivalent mean in the dispersed bulk is 36.94 (black label "10"), which is depressed by the amount 0.93. This is the inbreeding depression from this genetic drift. However, as noted previously, three progenies were not depressed (k = 1, 2, 5), and had means even greater than that of the panmictic equivalent. These are the lines a plant breeder looks for in a line selection programme.

Extensive binomial sampling – is panmixia restored?

If the number of binomial samples is large (s → ∞), then p• → pg and q• → qg. It might be queried whether panmixia would effectively re-appear under these circumstances. However, the sampling of allele frequencies has still occurred, with the result that <math display="inline"> \sigma^2_{p,\ q} \ne 0 </math>. In fact, as s → ∞, <math display="inline"> \sigma^2_{p,\ q} \to \tfrac{p_g q_g} {2N} </math>, which is the variance of the whole binomial distribution.

Upon further rearrangement, the earlier results from the binomial sampling were confirmed, along with some new arrangements. Two of these were potentially very useful, namely: (A) <math display="inline"> f_t = \Delta f \left[ 1 + f_{t-1} \left( 2N-1 \right) \right] </math>; and (B) <math display="inline"> f_t = \Delta f \left( 1 - f_{t-1} \right) + f_{t-1} </math>.
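The two recurrences (A) and (B) can be iterated side by side to see how inbreeding accumulates over generations. The sketch below assumes that Δf = 1/(2N), the usual per-generation increment for random fertilization in a gamodeme of N individuals. That value is an assumption here, as it is not stated in this passage, and the function name is hypothetical.

```python
def inbreeding_series(N, generations, f0=0.0):
    """Iterate recurrences (A) and (B); returns a list of (f_A, f_B) per generation."""
    delta_f = 1.0 / (2 * N)  # assumed increment per generation
    f_a = f_b = f0
    series = []
    for _ in range(generations):
        f_a = delta_f * (1.0 + f_a * (2 * N - 1))  # form (A)
        f_b = delta_f * (1.0 - f_b) + f_b          # form (B)
        series.append((f_a, f_b))
    return series


if __name__ == "__main__":
    for t, (f_a, f_b) in enumerate(inbreeding_series(N=10, generations=5), start=1):
        print(t, round(f_a, 5), round(f_b, 5))
```

Under that assumption the two forms are algebraically identical, so the two columns agree in every generation and rise steadily towards 1.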

The recognition that selfing may intrinsically be a part of random fertilization leads to some issues about the use of the previous random fertilization 'inbreeding coefficient'. Clearly, then, it is inappropriate for any species incapable of self fertilization, which includes plants with self-incompatibility mechanisms, dioecious plants, and bisexual animals. The equation of Wright was modified later to provide a version of random fertilization that involved only cross fertilization with no self fertilization. The proportion 1/N formerly due to selfing now defined the carry-over gene-drift inbreeding arising from the previous cycle. The new version is:

Homozygosity and heterozygosity

In the sub-section on "The sample gamodemes – Genetic drift", a series of gamete samplings was followed, an outcome of which was an increase in homozygosity at the expense of heterozygosity. From this viewpoint, the rise in homozygosity was due to the gamete samplings. Levels of homozygosity can be viewed also according to whether homozygotes arose allozygously or autozygously. Recall that autozygous alleles have the same allelic origin, the likelihood (frequency) of which is the inbreeding coefficient (f) by definition. The proportion arising allozygously is therefore (1-f). For the A-bearing gametes, which are present with a general frequency of p, the overall frequency of those that are autozygous is therefore (f p). Similarly, for a-bearing gametes, the autozygous frequency is (f q). These two viewpoints regarding genotype frequencies must be connected to establish consistency.

Following firstly the auto/allo viewpoint, consider the allozygous component. This occurs with the frequency of (1-f), and the alleles unite according to the random fertilization quadratic expansion. Thus:

<math display="block"> \left( 1-f \right) \left[ p_0 + q_0 \right] ^2 = \left( 1-f \right) \left[ p_0^2 + q_0^2 \right] + \left( 1-f \right) \left[ 2 p_0 q_0 \right] </math>

Consider next the autozygous component. As these alleles are autozygous, they are effectively selfings, and produce either AA or aa genotypes, but no heterozygotes. They therefore produce <math display="inline"> f p_0 </math> "AA" homozygotes plus <math display="inline"> f q_0 </math> "aa" homozygotes. Adding these two components together results in:

<math display="inline"> \left[ \left( 1-f \right) p_0^2 + f p_0 \right] </math> for the AA homozygote; <math display="inline"> \left[ \left( 1-f \right) q_0^2 + f q_0 \right] </math> for the aa homozygote; and <math display="inline"> \left( 1-f \right) 2 p_0 q_0 </math> for the Aa heterozygote.
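The allozygous and autozygous components can be kept separate in a short sketch, which makes it easy to confirm that they add up to a complete set of genotype frequencies. The function name and the example values (p0 = 0.6, f = 0.25) are hypothetical.

```python
def partitioned_frequencies(p0, f):
    """Split genotype frequencies into allozygous and autozygous components."""
    q0 = 1.0 - p0
    # Allozygous fraction (1 - f): alleles unite by the quadratic expansion.
    allo = {
        "AA": (1.0 - f) * p0 ** 2,
        "Aa": (1.0 - f) * 2.0 * p0 * q0,
        "aa": (1.0 - f) * q0 ** 2,
    }
    # Autozygous fraction f: homozygotes only, in proportion to allele frequency.
    auto = {"AA": f * p0, "Aa": 0.0, "aa": f * q0}
    total = {g: allo[g] + auto[g] for g in allo}
    return allo, auto, total


if __name__ == "__main__":
    allo, auto, total = partitioned_frequencies(p0=0.6, f=0.25)
    print(total)
    print(sum(total.values()))               # genotype frequencies sum to 1
    print(total["AA"] + 0.5 * total["Aa"])   # allele frequency of A is still p0
```

The allele frequency recovered from the totals equals p0 for any f, so inbreeding in this model redistributes genotypes without changing allele frequencies.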

In subsequent sections, these substitution effects help define the gene-model genotypes as consisting of a partition predicted by these new effects (substitution expectations), and a residual (substitution deviations) between these expectations and the previous gene-model effects. The expectations are also called the breeding values and the deviations are also called dominance deviations.

Ultimately, the variance arising from the substitution expectations becomes the so-called additive genetic variance (<math display="inline">\sigma^2_A</math>), while that arising from the substitution deviations becomes the so-called dominance variance (<math display="inline">\sigma^2_D</math>).

There are two approaches to defining and partitioning the genotypic variance: one is based on the gene-model effects, while the other is based on the genotype substitution effects. They are algebraically inter-convertible with each other. In this section, the basic random fertilization derivation is considered, with the effects of inbreeding and dispersion set aside. This is dealt with later to arrive at a more general solution. Until this mono-genic treatment is replaced by a multi-genic one, and until epistasis is resolved in the light of the findings of epigenetics, the genotypic variance has only the components considered here.

Gene-model approach – Mather Jinks Hayman

[Figure: Components of genotypic variance using the gene-model effects]

It is convenient to follow the biometrical approach, which is based on correcting the unadjusted sum of squares (USS) by subtracting the correction factor (CF). Because all effects have been examined through frequencies, the USS can be obtained as the sum of the products of each genotype's frequency and the square of its gene-effect. The CF in this case is the mean squared. The result is the SS, which, again because of the use of frequencies, is also immediately the variance.

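Here, <math display="inline">\sigma^2_a</math> is the homozygote or allelic variance, and <math display="inline">\sigma^2_d</math> is the heterozygote or dominance variance. The substitution deviations variance (<math display="inline">\sigma^2_D</math>) is also present. The weighted covariance of a and d is abbreviated hereafter to <math display="inline">\mathsf{cov}_{ad}</math>.

These components are plotted across all values of p in the accompanying figure. Notice that <math display="inline">\mathsf{cov}_{ad}</math> is negative for p > 0.5.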

Most of these components are affected by the change of central focus from homozygote mid-point (mp) to population mean (G), the latter being the basis of the correction factor. The <math display="inline">\mathsf{cov}_{ad}</math> and substitution deviation variances are simply artifacts of this shift. The allelic and dominance variances are genuine genetical partitions of the original gene-model, and are the only eu-genetical components. Even then, the algebraic formula for the allelic variance is affected by the presence of G: it is only the dominance variance (i.e. <math display="inline">\sigma^2_d</math>) which is unaffected by the shift from mp to G.

If, following the last-given rearrangements, the first three terms are amalgamated together, rearranged further and simplified, the result is the variance of the Fisherian substitution expectation.

That is: <math>\sigma^2_A = \sigma^2_a + \mathsf{cov}_{ad} + \sigma^2_d</math>

Notice particularly that <math display="inline">\sigma^2_A</math> is not <math display="inline">\sigma^2_a</math>. The first is the substitution expectations variance, while the second is the allelic variance. Notice also that <math display="inline">\sigma^2_D</math> (the substitution-deviations variance) is not <math display="inline">\sigma^2_d</math> (the dominance variance), and recall that it is an artifact arising from the use of G for the correction factor. [See the paragraph above on the change of central focus from mp to G.] It now will be referred to as the "quasi-dominance" variance.

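Also note that <math display="inline">\sigma^2_D</math> is smaller than <math display="inline">\sigma^2_d</math> ("2pq" being always a fraction); and note that (1) <math display="inline">\sigma^2_D = 2pq \ \sigma^2_d</math>, and that (2) <math display="inline">\sigma^2_d = \sigma^2_D / (2pq)</math>. That is: it is confirmed that <math display="inline">\sigma^2_D</math> does not quantify the dominance variance in the model. It is <math display="inline">\sigma^2_d</math> which does that. However, the dominance variance (<math display="inline">\sigma^2_d</math>) can be estimated readily from <math display="inline">\sigma^2_D</math> if 2pq is available.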

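From the Figure, these results can be visualized as accumulating <math display="inline">\sigma^2_a</math>, <math display="inline">\sigma^2_d</math> and <math display="inline">\mathsf{cov}_{ad}</math> to obtain <math display="inline">\sigma^2_A</math>, while leaving <math display="inline">\sigma^2_D</math> still separated. It is clear also in the Figure that <math display="inline">\sigma^2_D</math> is smaller than <math display="inline">\sigma^2_d</math>, as expected from the equations.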

The overall result (in Fisher's format) is

<math display="block"> \begin{align} \sigma^2_G & = 2pq \left[ a+(q-p)d \right]^2 + \left( 2pq \right)^2 d^2 \\ & = \sigma^2_A + \sigma^2_D \\ & = \left[ \left( \sigma^2_a + \mathsf{cov}_{ad} + \sigma^2_d \right) \right] + \left[ 2pq \ \sigma^2_d \right] \end{align} </math>
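As a numerical check, the biometrical route described earlier (unadjusted sum of squares minus the squared mean) can be compared with the first line of Fisher's format. The sketch below is illustrative only: the function names are hypothetical, and the gene effects a = 5.45 and d = 0.12 are borrowed from the earlier worked example.

```python
def genotypic_variance_direct(p, a, d):
    """Variance as USS - CF, using genotype frequencies under random fertilization."""
    q = 1.0 - p
    G = a * (p - q) + 2.0 * p * q * d                    # population mean (from mp)
    uss = p * p * a * a + 2.0 * p * q * d * d + q * q * a * a
    return uss - G * G                                   # CF is the mean squared


def genotypic_variance_fisher(p, a, d):
    """Return the two terms of Fisher's format: (var_A, var_D)."""
    q = 1.0 - p
    var_A = 2.0 * p * q * (a + (q - p) * d) ** 2
    var_D = (2.0 * p * q) ** 2 * d ** 2
    return var_A, var_D


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.6, 0.9):
        var_A, var_D = genotypic_variance_fisher(p, a=5.45, d=0.12)
        print(p, genotypic_variance_direct(p, 5.45, 0.12), var_A + var_D)
```

The two printed totals agree at every allele frequency, apart from floating-point rounding.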

The Fisherian components have just been derived, but their derivation via the substitution effects themselves is given also, in the next section.

Allele-substitution approach – Fisher

[Figure: Components of genotypic variance using the allele-substitution effects]

Reference to the several earlier sections on allele substitution reveals that the two ultimate effects are genotype substitution expectations and genotype substitution deviations. Notice that these are each already defined as deviations from the random fertilization population mean (G). For each genotype in turn therefore, the product of the frequency and the square of the relevant effect is obtained, and these are accumulated to obtain directly a SS and σ2. Details follow.

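<math display="inline"> \sigma^2_A = p^2 \beta_{AA}^2 + 2pq \beta_{Aa}^2 + q^2 \beta_{aa}^2 </math>, which simplifies to <math display="inline"> \sigma^2_A = 2pq \beta^2 </math>, the Genic variance.

<math display="inline"> \sigma^2_D = p^2 d_{AA}^2 + 2pq d_{Aa}^2 + q^2 d_{aa}^2 </math>, which simplifies to <math display="inline"> \sigma^2_D = (2pq)^2 d^2 </math>, the quasi-Dominance variance.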

Upon accumulating these results, <math display="inline"> \sigma^2_G = \sigma^2_A + \sigma^2_D </math>. These components are visualized in the graphs to the right. The average allele substitution effect is graphed also, but the symbol is "α" (as is common in the citations) rather than "β" (as is used herein).
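The accumulation over genotypes can be sketched as follows. The sketch assumes that the average allele substitution effect is β = a + (q − p)d, which is what equating 2pqβ² with the first term of Fisher's format implies. It takes the substitution expectations as 2qβ, (q − p)β and −2pβ for AA, Aa and aa, and obtains each substitution deviation as the genotype's deviation from G minus its expectation. Function and variable names are hypothetical.

```python
def substitution_variances(p, a, d):
    """Return (var_A, var_D) accumulated genotype by genotype."""
    q = 1.0 - p
    G = a * (p - q) + 2.0 * p * q * d      # random fertilization mean
    beta = a + (q - p) * d                 # average allele substitution effect
    freqs = (p * p, 2.0 * p * q, q * q)    # AA, Aa, aa
    values = (a, d, -a)                    # gene-model effects from the midpoint
    expectations = (2.0 * q * beta, (q - p) * beta, -2.0 * p * beta)
    # Deviation = (genotype value measured from G) - substitution expectation.
    deviations = tuple(v - G - e for v, e in zip(values, expectations))
    var_A = sum(fr * e * e for fr, e in zip(freqs, expectations))
    var_D = sum(fr * dv * dv for fr, dv in zip(freqs, deviations))
    return var_A, var_D


if __name__ == "__main__":
    p, a, d = 0.6, 5.45, 0.12
    q = 1.0 - p
    var_A, var_D = substitution_variances(p, a, d)
    print(var_A, 2.0 * p * q * (a + (q - p) * d) ** 2)  # should agree
    print(var_D, (2.0 * p * q) ** 2 * d ** 2)           # should agree
```

Both accumulated sums match their simplified forms, 2pqβ² and (2pq)²d².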

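Once again, however, refer to the earlier discussions about the true meanings and identities of these components. Fisher himself did not use these modern terms for his components. The substitution expectations variance he named the "genetic" variance; and the substitution deviations variance he regarded simply as the unnamed residual between the "genotypic" variance (his name for it) and his "genetic" variance. [The terminology and derivation used in this article are completely in accord with Fisher's own.] Mather's term for the expectations variance is "genic".

Among fully inbred lines (f = 1), each line mean is <math display="inline"> G_1 = a(p-q) = a - 2aq </math>. Therefore, <math display="inline"> \sigma^2_{G(1)} = \sigma^2_a + \sigma^2_{2aq} - 2 \ \mathsf{cov}(a, 2aq) </math>, where <math display="inline">\sigma^2_a</math> here denotes the variance of a among lines. But a (an allele effect) and q (an allele frequency) are independent, so this covariance is zero. Furthermore, a is a constant from one line to the next, so <math display="inline">\sigma^2_a</math> is also zero. Further, 2a is another constant (k), so <math display="inline">\sigma^2_{2aq}</math> is of the type <math display="inline">\sigma^2_{kX}</math>. In general, the variance <math display="inline">\sigma^2_{kX}</math> is equal to <math display="inline">k^2 \sigma^2_X</math>, so that here <math display="inline"> \sigma^2_{G(1)} = 4a^2 \sigma^2_q </math>.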

The environmental variance will appear in other sections, such as "Heritability" and "Correlated attributes".

Heritability and repeatability

The heritability of a trait is the proportion of the total (phenotypic) variance (<math display="inline">\sigma^2_P</math>) that is attributable to genetic variance, whether it be the full genotypic variance, or some component of it. It quantifies the degree to which phenotypic variability is due to genetics: but the precise meaning depends upon which genetical variance partition is used in the numerator of the proportion. Research estimates of heritability have standard errors, just as have all estimated statistics.

Where the numerator variance is the whole genotypic variance (<math display="inline">\sigma^2_G</math>), the heritability is known as the "broadsense" heritability (<math display="inline">H^2</math>). It quantifies the degree to which variability in an attribute is determined by genetics as a whole.

<math display="block"> \begin{align} H^2 & = \frac {\sigma ^2_G} {\sigma ^2_P} \\ & = \frac {\sigma ^2_A + \sigma ^2_D}{\sigma ^2_P} \\ & = \frac {\left[ \sigma ^2_a + \sigma ^2_d + cov_{ad} \right] + \sigma ^2_D}{\sigma ^2_P} \end{align} </math>

[See section on the Genotypic variance.]

If only genic variance (<math display="inline">\sigma^2_A</math>) is used in the numerator, the heritability may be called "narrow sense" (<math display="inline">h^2</math>). It quantifies the extent to which phenotypic variance is determined by Fisher's substitution expectations variance.

<math display="block"> \begin{align} h^2 & = \frac {\sigma ^2_A}{\sigma ^2_P} \\ & = \frac {\sigma ^2_a + \sigma ^2_d + cov_{ad}}{\sigma ^2_P} \end{align} </math>

Fisher proposed that this narrow-sense heritability might be appropriate in considering the results of natural selection, focusing as it does on change-ability, that is upon "adaptation". He proposed it with regard to quantifying Darwinian evolution.
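Both ratios are simple to compute once the variance components are in hand. The sketch below assumes that the phenotypic variance is the sum of the genotypic and environmental variances, with no genotype-environment covariance or interaction. That simplification is an assumption here, and the function name and numerical values are hypothetical.

```python
def heritabilities(var_A, var_D, var_E):
    """Return (H2, h2): broad-sense and narrow-sense heritability."""
    var_G = var_A + var_D   # genotypic variance
    var_P = var_G + var_E   # assumed: phenotypic = genotypic + environmental
    return var_G / var_P, var_A / var_P


if __name__ == "__main__":
    H2, h2 = heritabilities(var_A=4.0, var_D=1.0, var_E=5.0)
    print(H2, h2)  # 0.5 0.4
```

Because the genic variance is only part of the genotypic variance, the narrow-sense value can never exceed the broad-sense value computed from the same components.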

Recalling that the allelic variance (<math display="inline">\sigma^2_a</math>) and the dominance variance (<math display="inline">\sigma^2_d</math>) are eu-genetic components of the gene-model [see section on the Genotypic variance], and that <math display="inline">\sigma^2_D</math> (the substitution deviations or "quasi-dominance" variance) and <math display="inline">cov_{ad}</math> are due to changing from the homozygote midpoint (mp) to the population mean (G), it can be seen that the real meanings of these heritabilities are obscure. The heritabilities <math display="inline"> H^2_{eu} = \tfrac {\sigma ^2_a + \sigma ^2_d}{\sigma ^2_P} </math> and <math display="inline"> h^2_{eu} = \tfrac {\sigma ^2_a}{\sigma ^2_P} </math> have unambiguous meaning.

Narrow-sense heritability has been used also for predicting generally the results of artificial selection. In the latter case, however, the broadsense heritability may be more appropriate, as the whole attribute is being altered: not just adaptive capacity. Generally, advance from selection is more rapid the higher the heritability. [See section on "Selection".] In animals, heritability of reproductive traits is typically low, while heritability of disease resistance and production are moderately low to moderate, and heritability of body conformation is high.

Repeatability (<math display="inline">r^2</math>) is the proportion of phenotypic variance attributable to differences in repeated measures of the same subject, arising from later records. It is used particularly for long-lived species. This value can only be determined for traits that manifest multiple times in the organism's lifetime, such as adult body mass, metabolic rate or litter size. Individual birth mass, for example, would not have a repeatability value: but it would have a heritability value. Generally, but not always, repeatability indicates the upper level of the heritability.

<math display="block"> r^2 = \frac {s^2_G + s^2_{PE}}{s^2_P} </math>

where <math display="inline">s^2_{PE}</math> is the variance due to phenotype-environment interaction.

The above concept of repeatability is, however, problematic for traits that necessarily change greatly between measurements. For example, body mass increases greatly in many organisms between birth and adult-hood. Nonetheless, within a given age range (or life-cycle stage), repeated measures could be done, and repeatability would be meaningful within that stage.

Relationship

[Figure: Connection between the inbreeding and co-ancestry coefficients]

From the heredity perspective, relations are individuals that inherited genes from one or more common ancestors. Therefore, their "relationship" can be quantified on the basis of the probability that they each have inherited a copy of an allele from the common ancestor. In earlier sections, the inbreeding coefficient has been defined as "the probability that two same alleles (A and A, or a and a) have a common origin" or, more formally, "the probability that two homologous alleles are autozygous". Previously, the emphasis was on an individual's likelihood of having two such alleles, and the coefficient was framed accordingly. It is obvious, however, that this probability of autozygosity for an individual must also be the probability that each of its two parents had this autozygous allele. In this re-focused form, the probability is called the co-ancestry coefficient for the two individuals i and j (<math display="inline">f_{ij}</math>). In this form, it can be used to quantify the relationship between two individuals, and may also be known as the coefficient of kinship or the consanguinity coefficient.

This average is their Genepool Relationship Coefficient, the "GRC".

For the first example (two full first-cousins), their GRC = 0.5; for the second case (a full first and second cousin), their GRC = 0.3536.

All of these relationships (GRC) are applications of path-analysis. A summary of some levels of relationship (GRC) follows.

{| class="wikitable"
|-
! GRC !! Relationship examples
|-
| 1.00 || full Sibs
|-
| 0.7071 || Parent ↔ Offspring; Uncle/Aunt ↔ Nephew/Niece
|-
| 0.5 || full First Cousins; half Sibs; grand Parent ↔ grand Offspring
|-
| 0.3536 || full Cousins First ↔ Second; full First Cousins {1 remove}
|-
| 0.25 || full Second Cousins; half First Cousins; full First Cousins {2 removes}
|-
| 0.1768 || full First Cousin {3 removes}; full Second Cousins {1 remove}
|-
| 0.125 || full Third Cousins; half Second Cousins; full 1st Cousins {4 removes}
|-
| 0.0884 || full First Cousins {5 removes}; half Second Cousins {1 remove}
|-
| 0.0625 || full Fourth Cousins; half Third Cousins
|}
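The tabulated values appear to step down by a constant factor of 1/√2 (about 0.7071) from one row to the next. The sketch below only reproduces that numerical pattern. It is not a derivation of the GRC, and the function name is hypothetical.

```python
def grc_ladder(steps):
    """Return 2**(-k/2) for k = 0 .. steps-1, rounded to four decimal places."""
    return [round(2 ** (-k / 2), 4) for k in range(steps)]


# Nine successive levels, matching the nine rows of the summary table.
print(grc_ladder(9))
```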

Resemblances between relatives

Like the genotypic variances, these can be derived through either the gene-model ("Mather") approach or the allele-substitution ("Fisher") approach. Here, each method is demonstrated for alternate cases.

Parent-offspring covariance

These can be viewed either as the covariance between any offspring and any one of its parents (PO), or as the covariance between any offspring and the "mid-parent" value of both its parents (MPO).

One-parent and offspring (PO)

This can be derived as the sum of cross-products between parent gene-effects and one-half of the progeny expectations using the allele-substitution approach. The one-half of the progeny expectation accounts for the fact that only one of the two parents is being considered. The appropriate parental gene-effects are therefore the second-stage redefined gene effects used to define the genotypic variances earlier, that is: the redefined a = 2q(a − qd), the redefined d = (q − p)a + 2pqd, and the redefined (−a) = −2p(a + pd) [see section "Gene effects redefined"]. Similarly, the appropriate progeny effects for allele-substitution expectations are one-half of the earlier breeding values, the latter being: aAA = 2qa, aAa = (q − p)a, and aaa = −2pa [see section on "Genotype substitution – Expectations and Deviations"].

Because all of these effects are defined already as deviates from the genotypic mean, the cross-product sum using {genotype-frequency * parental gene-effect * half-breeding-value} immediately provides the allele-substitution-expectation covariance between any one parent and its offspring. After careful gathering of terms and simplification, this becomes cov(PO)A = pqa2 = ½ s2A. Next, utilizing cov(PO) = [ ½ s2A + ½ s2D ] as cov(XY), and s2P as s2X, it is seen that 2 βPO = [ 2 ( ½ s2A + ½ s2D )] / s2P = H2.
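The cross-product sum described above can be checked numerically. The sketch below is a minimal illustration, not the article's own derivation: it takes the symbol a exactly as it appears in the quoted gene-effect and breeding-value expressions, and the function name and input values are hypothetical.

```python
def cov_po_additive(p, a, d):
    """Sum of genotype-frequency * parental gene-effect * half-breeding-value."""
    q = 1.0 - p
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    # Redefined parental gene effects, as quoted in the text.
    parent_effect = {
        "AA": 2 * q * (a - q * d),
        "Aa": (q - p) * a + 2 * p * q * d,
        "aa": -2 * p * (a + p * d),
    }
    # Breeding values, as quoted in the text; half of each is used below.
    breeding_value = {
        "AA": 2 * q * a,
        "Aa": (q - p) * a,
        "aa": -2 * p * a,
    }
    return sum(
        freqs[g] * parent_effect[g] * 0.5 * breeding_value[g] for g in freqs
    )


# Illustrative values only.
p, a, d = 0.3, 2.0, 0.5
print(cov_po_additive(p, a, d), p * (1 - p) * a ** 2)
```

For these inputs both printed numbers agree (to rounding), and the terms in d cancel, which matches the simplification to pqa2 stated in the text.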

Analysis of epistasis has previously been attempted via an interaction variance approach of the type s2AA , and s2AD and also s2DD. This has been integrated with these present covariances in an effort to provide estimators for the epistasis variances. However, the findings of epigenetics suggest that this may not be an appropriate way to define epistasis.

Siblings covariances

Covariance between half-sibs (HS) is defined easily using allele-substitution methods; but, once again, the dominance contribution has historically been omitted. However, as with the mid-parent/offspring covariance, the covariance between full-sibs (FS) requires a "parent-combination" approach, thereby necessitating the use of the gene-model corrected-cross-product method; and the dominance contribution has not historically been overlooked. The superiority of the gene-model derivations is as evident here as it was for the Genotypic variances.

Half-sibs of the same common-parent (HS)

The sum of the cross-products { common-parent frequency * half-breeding-value of one half-sib * half-breeding-value of any other half-sib in that same common-parent-group } immediately provides one of the required covariances, because the effects used [breeding values, representing the allele-substitution expectations] are already defined as deviates from the genotypic mean [see section on "Allele substitution – Expectations and deviations"]. After simplification, this becomes: cov(HS)A = ½ pq a2 = ¼ s2A. The correlation between full-sibs is of little utility, being rFS = cov(FS) / s2all FS together = [ ½ s2A + ¼ s2D ] / s2P. The suggestion that it "approximates" (½ h2) is poor advice.

Of course, the correlations between siblings are of intrinsic interest in their own right, quite apart from any utility they may have for estimating heritabilities or genotypic variances.

It may be worth noting that [ cov(FS) − cov(HS)] = ¼ s2A. Experiments consisting of FS and HS families could utilize this by using intra-class correlation to equate experiment variance components to these covariances [see section on "Coefficient of relationship as an intra-class correlation" for the rationale behind this].

The earlier comments regarding epistasis apply again here [see section on "Applications (Parent-offspring)"].

Selection

Basic principles

Figure: Genetic advance and selection pressure repeated

Selection operates on the attribute (phenotype), such that individuals that equal or exceed a selection threshold (zP) become effective parents for the next generation. The proportion they represent of the base population is the selection pressure. The smaller the proportion, the stronger the pressure. The mean of the selected group (Ps) is superior to the base-population mean (P0) by the difference called the selection differential (S). All these quantities are phenotypic. To "link" to the underlying genes, a heritability (h2) is used, fulfilling the role of a coefficient of determination in the biometrical sense. The expected genetical change—still expressed in phenotypic units of measurement—is called the genetic advance (ΔG), and is obtained by the product of the selection differential (S) and its coefficient of determination (h2). The expected mean of the progeny (P1) is found by adding the genetic advance (ΔG) to the base mean (P0). The graphs to the right show how the (initial) genetic advance is greater with stronger selection pressure (smaller probability). They also show how progress from successive cycles of selection (even at the same selection pressure) steadily declines, because the Phenotypic variance and the Heritability are being diminished by the selection itself. This is discussed further shortly.

Thus <math> \Delta G = S h^2 </math>, a relation known as the breeder's equation and also used within the field of population genetics. Recent studies have shown that traits such as height have evolved in humans during the past few thousand years as a result of small allele frequency shifts at thousands of variants that affect height.
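The quantities defined in the basic principles can be put together in a short sketch. The function names and the example numbers are hypothetical, and the heritability is assumed to be known in advance.

```python
def genetic_advance(selection_differential, h2):
    """Return the expected genetic advance: delta G = S * h2."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return selection_differential * h2


def expected_progeny_mean(base_mean, selected_mean, h2):
    """Return P1 = P0 + delta G, where S = Ps - P0."""
    s = selected_mean - base_mean
    return base_mean + genetic_advance(s, h2)


# Illustrative values: base mean 100, selected-group mean 110, h2 = 0.4.
print(genetic_advance(110.0 - 100.0, 0.4))
print(expected_progeny_mean(100.0, 110.0, 0.4))
```

Only part of the selection differential is expected to carry over to the progeny; with these numbers the selected parents exceed the base mean by 10 units, but the progeny are expected to exceed it by about 4.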

Background

Standardized selection – the normal distribution

The entire base population is outlined by the normal curve. The mean of the selected group is μs, and the difference between it and the base mean (μ) represents the selection differential (S). By taking partial integrations over curve-sections of interest, and some rearranging of the algebra, it can be shown that the "selection differential" is S = [ y (σ / Prob.)], where y is the frequency of the value at the "selection threshold" z (the ordinate of z). The latter reference also gives values of i (the standardized selection differential, S / σ) adjusted for small populations (400 and less).

With non-dispersed random fertilization, f(t-1) = 0, giving the meiosis determination b2 = ½, as used in the selection section above. However, being aware of its background, other fertilization patterns can be used as required. Another determination also involves inbreeding: the fertilization determination (a2) equals 1 / [ 2 ( 1 + ft ) ]. Also, another correlation is an inbreeding indicator: rA = 2 ft / ( 1 + f(t-1) ), also known as the coefficient of relationship. [Do not confuse this with the coefficient of kinship, an alternative name for the co-ancestry coefficient. See introduction to "Relationship" section.] This rA re-occurs in the sub-section on dispersion and selection.
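The selection differential formula given earlier in this subsection, S = y (σ / Prob.), can be evaluated directly from the normal distribution. The sketch below assumes truncation selection on the upper tail of a normally distributed phenotype; the function name and arguments are hypothetical.

```python
from statistics import NormalDist


def selection_differential(prob_selected, sigma=1.0):
    """Return S = y * sigma / Prob for upper-tail truncation selection."""
    if not 0.0 < prob_selected < 1.0:
        raise ValueError("selected proportion must lie strictly between 0 and 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1.0 - prob_selected)  # standardized threshold
    y = std_normal.pdf(z)  # ordinate of the curve at the threshold
    return y * sigma / prob_selected


# Smaller selected proportions (stronger pressure) give larger differentials.
for prob in (0.5, 0.2, 0.05):
    print(prob, round(selection_differential(prob), 3))
```

With sigma left at 1 the result is in standard-deviation units; passing the phenotypic standard deviation as sigma returns S in the units of measurement.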

These links with inbreeding reveal interesting facets about sexual reproduction that are not immediately apparent. The graphs to the right plot the meiosis and syngamy (fertilization) coefficients of determination against the inbreeding coefficient. There it is revealed that as inbreeding increases, meiosis becomes more important (the coefficient increases), while syngamy becomes less important. The overall role of reproduction [the product of the previous two coefficients—r2] remains the same. This increase in b2 is particularly relevant for selection because it means that the selection truncation of the Phenotypic variance is offset to a lesser extent during a sequence of selections when accompanied by inbreeding (which is frequently the case).

Genetic drift and selection

The previous sections treated dispersion as an "assistant" to selection, and it became apparent that the two work well together. In quantitative genetics, selection is usually examined in this "biometrical" fashion, but the changes in the means (as monitored by ΔG) reflect the changes in allele and genotype frequencies beneath this surface. Referral to the section on "Genetic drift" brings to mind that it also effects changes in allele and genotype frequencies, and associated means; and that this is the companion aspect to the dispersion considered here ("the other side of the same coin"). However, these two forces of frequency change are seldom in concert, and may often act contrary to each other. One (selection) is "directional", being driven by selection pressure acting on the phenotype; the other (genetic drift) is driven by "chance" at fertilization (binomial probabilities of gamete samples). If the two tend towards the same allele frequency, their "coincidence" is the probability of obtaining that frequency sample in the genetic drift; the likelihood of their being "in conflict", however, is the sum of probabilities of all the alternative frequency samples. In extreme cases, a single syngamy sampling can undo what selection has achieved, and the probabilities of it happening are available. It is important to keep this in mind. However, genetic drift resulting in sample frequencies similar to those of the selection target does not lead to so drastic an outcome, instead slowing progress towards selection goals.
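The "coincidence" and "conflict" probabilities described above can be sketched with plain binomial sampling. This is a simplified illustration: it assumes a fixed number of gametes sampled independently at allele frequency p, and the function names and example numbers are hypothetical.

```python
from math import comb


def drift_sample_probability(n_gametes, k_copies, p):
    """Binomial probability that k of n sampled gametes carry the allele."""
    return comb(n_gametes, k_copies) * p ** k_copies * (1 - p) ** (n_gametes - k_copies)


def coincidence_and_conflict(n_gametes, k_target, p):
    """Return (P[sample equals the target count], P[any other count])."""
    coincide = drift_sample_probability(n_gametes, k_target, p)
    conflict = sum(
        drift_sample_probability(n_gametes, k, p)
        for k in range(n_gametes + 1)
        if k != k_target
    )
    return coincide, conflict


# Illustrative case: 20 gametes, current frequency 0.5, target of 12 copies (0.6).
coincide, conflict = coincidence_and_conflict(20, 12, 0.5)
print(round(coincide, 4), round(conflict, 4))
```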

Correlated attributes

Upon jointly observing two (or more) attributes (e.g. height and mass), it may be noticed that they vary together as genes or environments alter. This co-variation is measured by the covariance, which can be represented by "cov" or by θ. The corresponding correlation coefficient is the ratio of the covariance to the geometric mean of the two variances of the attributes. Observations usually occur at the phenotype, but in research they may also occur at the "effective haplotype" (effective gene product) [see Figure to the right]. Covariance and correlation could therefore be "phenotypic" or "molecular", or any other designation which an analysis model permits. The phenotypic covariance is the "outermost" layer, and corresponds to the "usual" covariance in Biometrics/Statistics. However, it can be partitioned by any appropriate research model in the same way as was the phenotypic variance. For every partition of the covariance, there is a corresponding partition of the correlation. Some of these partitions are given below. The first subscript (G, A, etc.) indicates the partition. The second-level subscripts (X, Y) are "place-keepers" for any two attributes.

Figure: Sources of phenotypic correlation

The first example is the un-partitioned phenotype.

:<math> r_{P_{XY}} = \frac{\mathrm{cov}_{P_{XY}}}{\sqrt{\sigma^2_{P_X} \, \sigma^2_{P_Y}}} </math>

The genetical partitions (a) "genotypic" (overall genotype), (b) "genic" (substitution expectations) and (c) "allelic" (homozygote) follow.

(a) <math> r_{G_{XY}} = \frac{\mathrm{cov}_{G_{XY}}}{\sqrt{\sigma^2_{G_X} \, \sigma^2_{G_Y}}} </math>

(b) <math> r_{A_{XY}} = \frac{\mathrm{cov}_{A_{XY}}}{\sqrt{\sigma^2_{A_X} \, \sigma^2_{A_Y}}} </math>

With an appropriately designed experiment, a non-genetical (environment) partition could be obtained also.

:<math> r_{E_{XY}} = \frac{\mathrm{cov}_{E_{XY}}}{\sqrt{\sigma^2_{E_X} \, \sigma^2_{E_Y}}} </math>
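Whatever the partition, the correlation is the partition's covariance divided by the square root of the product of the two matching variances. A minimal sketch follows, assuming the covariance and variance components have already been estimated; the function name and the numbers are hypothetical.

```python
from math import sqrt


def partition_correlation(cov_xy, var_x, var_y):
    """Return cov_XY / sqrt(var_X * var_Y) for one partition (P, G, A or E)."""
    if var_x <= 0 or var_y <= 0:
        raise ValueError("variances must be positive")
    return cov_xy / sqrt(var_x * var_y)


# Illustrative components for two attributes X and Y, one entry per partition.
components = {
    "P": (3.0, 4.0, 9.0),  # (covariance, variance of X, variance of Y)
    "G": (1.2, 1.6, 3.6),
    "E": (1.8, 2.4, 5.4),
}
for name, (cov_xy, var_x, var_y) in components.items():
    print(name, round(partition_correlation(cov_xy, var_x, var_y), 3))
```

The same function serves every partition because only the inputs change; the subscripts P, G, A and E simply indicate which components are supplied.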

Underlying causes of correlation

There are several different ways that phenotypic correlation can arise. Study design, sample size, sample statistics, and other factors can influence the ability to distinguish between them with more or less statistical confidence. Each of these has different scientific significance, and is relevant to different fields of work.

Direct causation

One phenotype may directly affect another phenotype, by influencing development, metabolism, or behavior.

Genetic pathways

A common gene or transcription factor in the biological pathways for the two phenotypes can result in correlation.

Metabolic pathways

The metabolic pathways from gene to phenotype are complex and varied, but the causes of correlation amongst attributes lie within them.

Developmental and environmental factors

Multiple phenotypes may be affected by the same factors. For example, many phenotypic attributes are correlated with age, so height, weight, caloric intake, endocrine function, and more are all correlated with one another. A study looking for other common factors must rule these out first.

Correlated genotypes and selective pressures

Differences between subgroups in a population, between populations, or selective biases can mean that some combinations of genes are overrepresented compared with what would be expected. While the genes may not have a significant influence on each other, there may still be a correlation between them, especially when certain genotypes are not allowed to mix. Populations in the process of genetic divergence or having already undergone it can have different characteristic phenotypes, which means that when considered together, a correlation appears. Phenotypic qualities in humans that predominantly depend on ancestry also produce correlations of this type. This can also be observed in dog breeds where several physical features make up the distinctness of a given breed, and are therefore correlated. Assortative mating, which is the sexually selective pressure to mate with a similar phenotype, can result in genotypes remaining correlated more than would be expected.

Limitations

Some authors have discussed various aspects of quantitative genetics that they see as limiting. These include that many quantitative genetic models assume normal (Gaussian) distributions for phenotypic traits, a stable variance-covariance matrix over evolutionary time, and linearity of quantitative genetic parameters.

External links

The Breeder's Equation
Quantitative Genetics Resources by Michael Lynch and Bruce Walsh, including the two volumes of their textbook, Genetics and Analysis of Quantitative Traits and Evolution and Selection of Quantitative Traits.
Resources by Nick Barton et al., from the textbook, Evolution.
The G-Matrix Online