Researchers have investigated the relationship between race and genetics as part of efforts to understand how biology may or may not contribute to human racial categorization. Today, the consensus among scientists is that race is a social construct, and that using it as a proxy for genetic differences among populations is misleading.
Many constructions of race are associated with phenotypical traits and geographic ancestry, and scholars like Carl Linnaeus have proposed scientific models for the organization of race since at least the 18th century. Following the discovery of Mendelian genetics and the mapping of the human genome, questions about the biology of race have often been framed in terms of genetics. A wide range of research methods have been employed to examine patterns of human variation and their relations to ancestry and racial groups, including studies of individual traits, studies of large populations and genetic clusters, and studies of genetic risk factors for disease.
Research into race and genetics has also been criticized as emerging from, or contributing to, scientific racism. Genetic studies of traits and populations have been used to justify social inequalities associated with race, despite the fact that patterns of human variation have been shown to be mostly clinal, with human genetic code being approximately 99.6% – 99.9% identical between individuals and without clear boundaries between groups.
Some researchers have argued that race can act as a proxy for genetic ancestry because individuals of the same racial category may share a common ancestry, but this view has fallen increasingly out of favor among experts. The mainstream view is that it is necessary to distinguish between biology and the social, political, cultural, and economic factors that contribute to conceptions of race.
Phenotype may have a tangential connection to DNA, but it is still only a rough proxy that would omit various other genetic information. Today, somewhat as "gender" is differentiated from the more clearly defined "biological sex", some scientists suggest that "race" (or phenotype) can be differentiated from the more clearly defined "ancestry". However, this system has also come under scrutiny, as it may suffer from the same problem: large, vague groupings with little genetic value.
Overview
The concept of race
The concept of "race" as a classification system of humans based on visible physical characteristics emerged over the last five centuries, influenced by European colonialism. However, there is widespread evidence of what would be described in modern terms as racial consciousness throughout the entirety of recorded history. For example, in Ancient Egypt there were four broad racial divisions of human beings: Egyptians, Asiatics, Libyans, and Nubians. The concept has manifested in different forms based on social conditions of a particular group, often used to justify unequal treatment. Early influential attempts to classify humans into discrete races include 4 races in Carl Linnaeus's Systema Naturae (Homo europaeus, asiaticus, americanus, and afer) and 5 races in Johann Friedrich Blumenbach's On the Natural Variety of Mankind. Notably, over the next centuries, scholars argued for anywhere from 3 to more than 60 race categories. Race concepts have changed within a society over time; for example, in the United States social and legal designations of "White" have been inconsistently applied to Native Americans, Arab Americans, and Asian Americans, among other groups (See main article: Definitions of whiteness in the United States). Race categories also vary worldwide; for example, the same person might be perceived as belonging to a different category in the United States versus Brazil. Because of the arbitrariness inherent in the concept of race, it is difficult to relate it to biology in a straightforward way.
Race and human genetic variation
There is broad consensus across the biological and social sciences that race is a social construct, not an accurate representation of human genetic variation. Estimates of how much human genomes differ should be understood as averages: any two specific individuals can have genomes that differ by more or less than 0.65%. Additionally, this average is an estimate, subject to change as additional sequences are discovered and populations sampled. In 2010, the genome of Craig Venter was found to differ by an estimated 1.59% from a reference genome created by the National Center for Biotechnology Information.
We nonetheless see wide individual variation in phenotype, which arises from both genetic differences and complex gene-environment interactions. The vast majority of this genetic variation occurs within groups; very little genetic variation differentiates between groups.
Sources of human genetic variation
Genetic variation arises from mutation, natural selection, migration between populations (gene flow), and the reshuffling of genes through sexual reproduction. Mutations change the DNA sequence, for example by altering the order of the bases, so that different polypeptides (proteins) may be encoded. Some mutations are beneficial and can help the individual survive more effectively in its environment. Mutation is counteracted by natural selection and by genetic drift; note too the founder effect, in which a small number of founders establish a population that therefore starts with a correspondingly small degree of genetic variation. Epigenetic inheritance involves heritable changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the DNA sequence.
Human phenotypes are highly polygenic (dependent on interaction by many genes) and are influenced by environment as well as by genetics.
Nucleotide diversity is based on single mutations, single nucleotide polymorphisms (SNPs). The nucleotide diversity between humans is about 0.1 percent (one difference per one thousand nucleotides between two humans chosen at random). This amounts to approximately three million SNPs (since the human genome has about three billion nucleotides). There are an estimated ten million SNPs in the human population.
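The arithmetic behind these figures can be checked with a short sketch. The constants are the rounded values quoted above, and the function name is illustrative rather than taken from any cited source.

```python
def expected_pairwise_differences(genome_length: int,
                                  nucleotides_per_difference: int) -> int:
    """Expected number of single-nucleotide differences between two genomes."""
    return genome_length // nucleotides_per_difference


# Rounded values from the text: about three billion nucleotides,
# and about one difference per one thousand nucleotides (0.1 percent).
GENOME_LENGTH = 3_000_000_000
NUCLEOTIDES_PER_DIFFERENCE = 1_000

differences = expected_pairwise_differences(GENOME_LENGTH,
                                            NUCLEOTIDES_PER_DIFFERENCE)
print(f"{differences:,}")  # 3,000,000
```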
Research has shown that non-SNP (structural) variation accounts for more human genetic variation than single nucleotide diversity. Structural variation includes copy-number variation and results from deletions, inversions, insertions and duplications. It is estimated that approximately 0.4 to 0.6 percent of the genomes of unrelated people differ.
Genetic basis for race
Much scientific research has been organized around the question of whether there is a genetic basis for race. In "The History and Geography of Human Genes" (1994), Luigi Luca Cavalli-Sforza writes, "From a scientific point of view, the concept of race has failed to obtain any consensus; none is likely, given the gradual variation in existence. It may be objected that the racial stereotypes have a consistency that allows even the layman to classify individuals. However, the major stereotypes, all based on skin color, hair color and form, and facial traits, reflect superficial differences that are not confirmed by deeper analysis with more reliable genetic traits and whose origin dates from recent evolution mostly under the effect of climate and perhaps sexual selection".
A 2015 review article concludes that with advances in detecting and dating admixture (mixing of groups), there is more evidence that admixture between populations has occurred throughout human history. Genetic differences between races are generally correlated with geographic distance.
In 2018 geneticist David Reich reaffirmed the conclusion that the traditional views which assert a biological basis for race are wrong:
In 1956, some scientists proposed that human races may be analogous to dog breeds. However, this idea has since been discarded, one of the main reasons being that purebred dogs are the product of deliberate artificial breeding, whereas human races developed organically. Furthermore, the genetic variation between purebred dog breeds is far greater than that between human populations: variation between dog breeds is roughly 27.5%, whereas variation between human populations is estimated to be only between 5.4% and 15.6%.
Research methods
Scientists investigating human variation have used a series of methods to characterize how different populations vary.
Early studies of traits, proteins, and genes
Early racial classification attempts measured surface traits, particularly skin color, hair color and texture, eye color, and head size and shape. (Measurements of the latter through craniometry were repeatedly discredited in the late 19th and mid-20th centuries due to a lack of correlation of phenotypic traits with racial categorization.) In actuality, biological adaptation plays the biggest role in these bodily features and skin type. A relative handful of genes accounts for the inherited factors shaping a person's appearance. Humans have an estimated 19,000–20,000 protein-coding genes. Richard Sturm and David Duffy describe 11 genes that affect skin pigmentation and explain most variations in human skin color, the most significant of which are MC1R, ASIP, OCA2, and TYR. There is evidence that as many as 16 different genes could be responsible for eye color in humans; however, the two main genes associated with eye color variation are OCA2 and HERC2, and both are located on chromosome 15.
Analysis of blood proteins and between-group genetics
Figure: Geographic distribution of blood group A
Figure: Geographic distribution of blood group B
Before the discovery of DNA, scientists used blood proteins (the human blood group systems) to study human genetic variation. Research by Ludwik and Hanka Herschfeld during World War I found that the incidence of blood groups A and B differed by region; for example, among Europeans 15 percent were group B and 40 percent group A. Eastern Europeans and Russians had a higher incidence of group B; people from India had the greatest incidence. The Herschfelds concluded that humans comprised two "biochemical races", originating separately. It was hypothesized that these two races later mixed, resulting in the patterns of groups A and B. This was one of the first theories of racial differences to include the idea that human variation did not correlate with genetic variation. It was expected that groups with similar proportions of blood groups would be more closely related, but instead it was often found that groups separated by great distances (such as those from Madagascar and Russia) had similar incidences. It was later discovered that the ABO blood group system is not unique to humans but shared with other primates, and likely predates all human groups.
In 1972, Richard Lewontin performed an F<sub>ST</sub> statistical analysis using 17 markers (including blood-group proteins). He found that the majority of genetic differences between humans (85.4 percent) were found within a population, 8.3 percent were found between populations within a race and 6.3 percent were found to differentiate races (Caucasian, African, Mongoloid, South Asian Aborigines, Amerinds, Oceanians, and Australian Aborigines in his study). Since then, other analyses have found F<sub>ST</sub> values of 6–10 percent between continental human groups, 5–15 percent between different populations on the same continent and 75–85 percent within populations. This view has since been affirmed by the American Anthropological Association and the American Association of Physical Anthropologists.
Critiques of blood protein analysis
While acknowledging Lewontin's observation that humans are genetically homogeneous, A. W. F. Edwards in his 2003 paper "Human Genetic Diversity: Lewontin's Fallacy" argued that information distinguishing populations from each other is hidden in the correlation structure of allele frequencies, making it possible to classify individuals using mathematical techniques. Edwards argued that even if the probability of misclassifying an individual based on a single genetic marker is as high as 30 percent (as Lewontin reported in 1972), the misclassification probability nears zero if enough genetic markers are studied simultaneously. Edwards saw Lewontin's argument as based on a political stance, denying biological differences to argue for social equality. Edwards' paper is reprinted, commented upon by experts such as Noah Rosenberg, and given further context in an interview with philosopher of science Rasmus Grønfeldt Winther in a recent anthology.
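The effect of combining markers can be illustrated with a simplified calculation. The sketch below is not Edwards' own derivation: it assumes, purely for illustration, that the markers are statistically independent, that each misclassifies with the same probability (30 percent), and that an individual is assigned by a simple majority vote across markers. The function name is hypothetical.

```python
from math import comb


def majority_vote_error(n_markers: int, single_marker_error: float) -> float:
    """Probability that a majority vote over independent markers is wrong.

    Assumes each marker errs independently with the same probability.
    A tie (possible only for an even number of markers) is counted as
    an error half of the time.
    """
    p = single_marker_error
    total = 0.0
    for wrong in range(n_markers + 1):
        # Binomial probability that exactly `wrong` markers point the wrong way.
        prob = comb(n_markers, wrong) * p**wrong * (1 - p) ** (n_markers - wrong)
        if 2 * wrong > n_markers:
            total += prob
        elif 2 * wrong == n_markers:
            total += 0.5 * prob
    return total


# The error shrinks as more markers are combined.
for n in (1, 5, 25, 101):
    print(n, majority_vote_error(n, 0.30))
```

Under these assumptions, a single marker gives the 30 percent error quoted above, while the error with 101 markers is far below one percent. Real markers are correlated and differ in informativeness, so the sketch shows only the direction of the effect.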
As noted above, Edwards criticises Lewontin's paper for taking 17 different traits and analysing them independently, without considering them in conjunction with one another. According to Edwards, this approach made it fairly easy for Lewontin to conclude that racial naturalism is not tenable. Sesardic also supported Edwards' view, using an illustration involving squares and triangles to show that a single trait considered in isolation will most likely be a bad predictor of the group to which an individual belongs. In contrast, in a 2014 paper, reprinted in the 2018 Cambridge University Press Edwards volume, Rasmus Grønfeldt Winther argues that "Lewontin's Fallacy" is effectively a misnomer, as there really are two different sets of methods and questions at play in studying the genomic population structure of our species: "variance partitioning" and "clustering analysis." According to Winther, they are "two sides of the same mathematics coin" and neither "necessarily implies anything about the reality of human groups."
Current studies of population genetics
Researchers currently use genetic testing, which may involve hundreds (or thousands) of genetic markers or the entire genome.
Structure
Figure: Principal component analysis of fifty populations, color-coded by region, illustrates the differentiation and overlap of populations found using this method of analysis.
Figure: Individuals mostly have genetic variants which are found in multiple regions of the world. Based on data from "A unified genealogy of modern and ancient genomes".
Several methods to examine and quantify genetic subgroups exist, including cluster and principal components analysis. Genetic markers from individuals are examined to find a population's genetic structure. While subgroups overlap when examining variants of one marker only, when a number of markers are examined different subgroups have different average genetic structure. An individual may be described as belonging to several subgroups. These subgroups may be more or less distinct, depending on how much overlap there is with other subgroups.
In cluster analysis, the number of clusters to search for, K, is determined in advance; how distinct the clusters are varies.
The results obtained from cluster analyses depend on several factors:
- A large number of genetic markers studied facilitates finding distinct clusters.
- Some genetic markers vary more than others, so fewer are required to find distinct clusters.
- The more individuals studied, the easier it becomes to detect distinct clusters (statistical noise is reduced).
- A similar cluster structure is seen with different genetic markers when the number of genetic markers included is sufficiently large. The clustering structure obtained with different statistical techniques is similar. A similar cluster structure is found when a subsample of the original sample is analysed.
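The role of K and of the number of markers can be illustrated with a minimal sketch. It uses k-means, a simple general-purpose clustering method chosen here only for illustration (it is not claimed to be the method used in the studies discussed), on a hypothetical toy data set of six individuals scored at four markers (allele counts 0, 1 or 2). All names and data values are invented for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100):
    """Assign each point to one of k clusters; k is fixed in advance."""
    # Deterministic farthest-first initialisation: start from the first
    # point, then repeatedly add the point farthest from existing centers.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical allele counts: rows are individuals, columns are markers.
# The first three individuals were invented as one group, the last three
# as another.
genotypes = [
    [2, 2, 1, 2],
    [2, 1, 2, 2],
    [1, 2, 2, 2],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

all_markers = k_means(genotypes, k=2)
one_marker = k_means([[g[0]] for g in genotypes], k=2)
print(all_markers)  # [0, 0, 0, 1, 1, 1]
print(one_marker)   # [0, 0, 0, 1, 0, 1]
```

With all four markers the two invented groups are recovered. With only the first marker, the fifth individual shares its value with a member of the other group and is assigned to that group's cluster, which mirrors the point that subgroups overlap when a single marker is examined.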
Recent studies have been published using an increasing number of genetic markers.
The focus on the study of structure has been criticized for giving the general public a misleading impression of human genetic variation. Critics argue that it obscures the general finding that genetic variants limited to one region tend to be rare within that region, that variants common within a region tend to be shared across the globe, and that most differences between individuals, whether they come from the same region or different regions, are due to global variants.
Distance
Genetic distance is genetic divergence between species or populations of a species. It may compare the genetic similarity of related species, such as humans and chimpanzees. Within a species, genetic distance measures divergence between subgroups. Genetic distance correlates significantly with geographic distance between populations, a phenomenon sometimes known as "isolation by distance". Genetic distance may be the result of physical boundaries restricting gene flow, such as islands, deserts, mountains or forests. Genetic distance is measured by the fixation index (F<sub>ST</sub>). F<sub>ST</sub> is the correlation of randomly chosen alleles in a subgroup to a larger population. It is often expressed as a proportion of genetic diversity. This comparison of genetic variability within (and between) populations is used in population genetics. The values range from 0 to 1; a value of zero indicates that the two populations are freely interbreeding, and a value of one indicates that the two populations are completely separate.
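As an illustration of F<sub>ST</sub> expressed as a proportion of genetic diversity, the sketch below uses one common textbook formulation, (H_T − H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the average expected heterozygosity within subpopulations. It assumes a single locus with two alleles and equally weighted subpopulations; the function names and the allele frequencies are hypothetical, and published studies use more elaborate estimators.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity at a two-allele locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(allele_frequencies):
    """F_ST for one two-allele locus, given one allele frequency per
    subpopulation (subpopulations are weighted equally)."""
    n = len(allele_frequencies)
    mean_p = sum(allele_frequencies) / n
    h_total = expected_heterozygosity(mean_p)  # H_T: pooled population
    h_within = sum(expected_heterozygosity(p) for p in allele_frequencies) / n  # H_S
    if h_total == 0:
        raise ValueError("locus is not variable in the pooled population")
    return (h_total - h_within) / h_total


print(fst([0.5, 0.5]))  # identical frequencies: 0.0
print(fst([0.0, 1.0]))  # each subpopulation fixed for a different allele: 1.0
print(fst([0.4, 0.6]))  # modest difference in frequency: close to 0.04
```

The two extreme cases correspond to the ends of the 0-to-1 range described above.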
Many studies place the average F<sub>ST</sub> distance between human races at about 0.125. Henry Harpending argued that this value implies on a world scale a "kinship between two individuals of the same human population is equivalent to kinship between grandparent and grandchild or between half siblings". In fact, the formulas derived in Harpending's paper in the "Kinship in a subdivided population" section imply that two unrelated individuals of the same race have a higher coefficient of kinship (0.125) than an individual and their mixed race half-sibling (0.109).
Critiques of F<sub>ST</sub>
While acknowledging that F<sub>ST</sub> remains useful, a number of scientists have written about other approaches to characterizing human genetic variation. Long & Kittles (2009) stated that F<sub>ST</sub> failed to identify important variation and that when the analysis includes only humans, F<sub>ST</sub> = 0.119, but adding chimpanzees increases it only to F<sub>ST</sub> = 0.183.
Anthropologists (such as C. Loring Brace), philosopher Jonathan Kaplan and geneticist Joseph Graves have argued that while it is possible to find biological and genetic variation roughly corresponding to race, this is true for almost all geographically distinct populations: the cluster structure of genetic data is dependent on the initial hypotheses of the researcher and the populations sampled. When one samples continental groups, the clusters become continental; with other sampling patterns, the clusters would be different. Weiss and Fullerton note that if one sampled only Icelanders, Mayans and Maoris, three distinct clusters would form; all other populations would be composed of genetic admixtures of Maori, Icelandic and Mayan material. Kaplan therefore concludes that, while differences in particular allele frequencies can be used to identify populations that loosely correspond to the racial categories common in Western social discourse, the differences are of no more biological significance than the differences found between any human populations (e.g., the Spanish and Portuguese).
Historical and geographical analyses
Current-population genetic structure does not imply that differing clusters or components indicate only one ancestral home per group; for example, a genetic cluster in the US comprises Hispanics with European, Native American and African ancestry. Such geographic analysis works best in the absence of recent large-scale, rapid migrations.
Historic analyses use differences in genetic variation (measured by genetic distance) as a molecular clock indicating the evolutionary relation of species or groups, and can be used to create evolutionary trees reconstructing population separations.
Correspondence between genetic clusters in a population (such as the current US population) and self-identified race or ethnic groups does not mean that such a cluster (or group) corresponds to only one ethnic group. African Americans have an estimated 20–25 percent European genetic admixture; Hispanics have European, Native American and African ancestry. Ethnoracial self-classification in Brazilians is certainly not random with respect to individual genomic ancestry, but the strength of the association between the phenotype and median proportion of African ancestry varies widely across populations.
Critique of genetic-distance studies and clusters
Figure: A change in a gene pool may be abrupt or clinal.
Genetic distances generally increase continually with geographic distance, which makes a dividing line arbitrary. Any two neighboring settlements will exhibit some genetic difference from each other, which could be defined as a race. Therefore, attempts to classify races impose an artificial discontinuity on a naturally occurring phenomenon. This explains why studies on population genetic structure yield varying results, depending on methodology.
Rosenberg and colleagues (2005) have argued, based on cluster analysis of the 52 populations in the Human Genetic Diversity Panel, that populations do not always vary continuously and a population's genetic structure is consistent if enough genetic markers (and subjects) are included.
They also wrote, regarding a model with five clusters corresponding to Africa, Eurasia (Europe, Middle East, and Central/South Asia), East Asia, Oceania, and the Americas:
