There’s a Gene for That

History is littered with horrifying examples of the misuse of evolutionary theory to justify power and inequality. Welcome to a new age of biological determinism.

If you want to understand why humans wage wars, there is a gene for that. Want to understand why men rape women? There is a gene for that. Want to understand why the “national characters” of East Asia, the West, and Africa are different? We have those genes covered too. Indeed, if we are to believe most popular media, there is a gene for just about every inequality and inequity in modern society.

Genetic determinism and its uglier cousin, social Darwinism, are making a comeback. Armed with large genomic datasets and an arsenal of statistical techniques, a small but vocal band of scientists is determined to hunt down the genetic basis of all we are and all we do.

The relationship between genetics and biological determinism is almost as old as the field itself. After all, one of the foremost modern institutes of genetics, the Cold Spring Harbor Laboratory, began as a eugenics institute whose activities included “lobbying for eugenic legislation to restrict immigration and sterilize ‘defectives,’ educating the public on eugenic health, and disseminating eugenic ideas widely.”

The latest wave of biological determinism continues this long history, but differs in a crucial way from the past. We are at the beginning of the genomics era — an era in which advances in molecular biology make it possible to precisely measure minute genetic differences between humans. Combined with the fact that we live in a new Gilded Age where a small global elite has access to, and needs justification for, unprecedented amounts of wealth and power, the conditions are ripe for a dangerous resurgence of biological determinism.

Today it costs $5000 to sequence a genome, to identify the 6 billion As, Cs, Ts, and Gs that define an individual’s DNA. Soon it will cost less — much less. We are told that this is a revolutionary moment. With access to detailed genetic information, medical professionals and genetic counselors will soon be able to identify the diseases we’re predisposed to, and help prevent or minimize their impact through “personalized medicine.”

The scientific knowledge extracted from these data is priceless. We are beginning to understand how viruses evolve, the genetic mutations that give rise to cancers, and the genetic basis of cellular identity. The sequencing revolution has allowed us to study the molecular basis of genetic regulation and identify amazing new players like non-coding RNAs and chromatin modifications. All of our ideas about biology are being reshaped.

One of the most striking results of the new sequencing studies is how similar humans actually are — we differ from each other only in 0.1% of our DNA. Yet this 0.1% of the genome gives rise to the variations we see between people in traits such as skin color, height, and proclivity for disease. An important goal of modern genetics is to relate a particular genomic variant to a specific trait or disease. To do so, scientists are developing powerful new statistical tools to analyze a wealth of sequence data from populations around the world.

The relationship between genes and observable traits is indisputable. Tall parents tend to have tall kids. Dark-haired parents tend to have dark-haired kids. That traits are inherited has been clear since Mendel codified his famous Laws of Inheritance, inferred from statistical observations of over 29,000 pea plants. In classical Mendelian genetics, separate genes encoding separate traits are passed to offspring independently of one another. Thus, there is a clear mapping between genetic information, or genotype, and observable traits, or phenotype. A single gene (technically a locus, or genetic location) encodes a single trait and is not influenced by the other traits a person possesses. Furthermore, environmental factors have little influence on most Mendelian traits. Famous examples that fall into this framework include sickle-cell anemia and cystic fibrosis, each caused by a mutation to a specific gene.

However, it is now clear that the simple assumptions underlying Mendelian genetics are not applicable to most traits and diseases. Nearly all phenotypes, from height and eye color to diseases such as diabetes, emerge from extremely complex interactions between multiple genes (loci) and the environment. In contrast to Mendelian genetics, where one can easily identify the gene that encodes for a particular trait, for many traits there is no simple mapping from genotype to phenotype.

The sheer volume of DNA sequence data now available has convinced scientists they can overcome this challenge. To do so, they are developing new scientific and statistical tools geared toward analyzing and extracting genetic information from sequence data. The goal of these genome-wide association studies (GWAS) is to provide a blueprint for decoding the information contained in our DNA, and to identify the genetic basis of complex traits and diseases. GWAS are now a staple of modern population genetics. This is reflected in the astronomical increase in the number of published GWAS in the last decade, from single digits in 2005 to more than thirteen hundred to date. There are GWAS on body height, birth weight, inflammatory bowel disease, how people respond to particular drugs or vaccines, cancers, diabetes, Parkinson’s disease, and more. There are so many GWAS, in fact, that specialized viewers have been created to help scientists visualize the results of all these studies.

Given the increasing prevalence of GWAS, it is useful to explicate the basic logic underlying these studies. The concepts of phenotypic and genetic variation play a central role in GWAS. Phenotypic variation is defined as the variation of a trait in a population (such as the distribution of heights in the population of American men). Note that in order to define phenotypic variation, we must specify a population. This is an a priori choice that must be made to construct a statistical model. The choice of population is often an important source of bias, the point at which hidden social assumptions enter GWAS. This is especially true for studies that try to understand genetic variation across “racial” groups.

GWAS try to statistically explain the observed phenotypic variation in terms of the genetic variation in the same population. This is where modern genomics shines. Whereas in the pre-genomic era one had to work hard to measure genetic variation at a single locus, now one can consult a readily available public database to get the genetic variation for thousands of individuals across the entire genome. Most GWAS focus on single-nucleotide polymorphisms (SNPs): DNA sequence variations that occur at a single base in the genome (e.g. AAGGCT vs. AAGTCT). Scientists have observed approximately 12 million SNPs in human populations. This number may seem incredibly large, but there are 6 billion bases in human DNA, so only 0.2% of all DNA bases (12 million out of 6 billion) exhibit any variation across all sampled human populations. For height, about 180 SNPs are known to contribute to variation.
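
To make the logic concrete, here is a toy sketch in Python of what it means to “explain” phenotypic variation with genetic variation at a single SNP. Nothing in it comes from a real study: the sample size, the allele frequency, the 0.5 cm effect per copy of the variant, and the 7 cm of unexplained spread are all assumptions chosen for illustration, and actual GWAS rely on far more elaborate models than this.

```python
import random


def simulate_single_snp(n=20000, allele_freq=0.5, effect_cm=0.5,
                        noise_sd_cm=7.0, seed=0):
    """Simulate genotypes at one SNP and heights for n hypothetical people.

    All parameter values are illustrative assumptions, not measured data.
    """
    rng = random.Random(seed)
    genotypes, heights = [], []
    for _ in range(n):
        # Each person carries 0, 1, or 2 copies of the variant allele.
        copies = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        genotypes.append(copies)
        # Height = baseline + small genetic effect + everything else (noise).
        heights.append(170.0 + effect_cm * copies + rng.gauss(0.0, noise_sd_cm))
    return genotypes, heights


def variance_explained(xs, ys):
    """Squared Pearson correlation: share of the variance in ys tracked by xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


genotypes, heights = simulate_single_snp()
r2 = variance_explained(genotypes, heights)
print(f"Share of simulated height variance explained by this SNP: {r2:.2%}")
```

Under these assumed numbers the expected share is roughly a quarter of one percent: the SNP is genuinely associated with height in the simulation, yet it accounts for almost none of the variation between individuals.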

The goal of GWAS is to relate genotypic variation to phenotypic variance. This is often expressed in a concept called heritability, which seeks to partition the phenotypic variance into a genetic and an environmental component. Roughly speaking, heritability is defined as the fraction of the phenotypic variation that we can ascribe to genetic variation. A heritability of zero means that all the phenotypic variance is environmental whereas a heritability of one means it is entirely genetic.
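
In symbols, and only under the simplifying assumption that genetic and environmental contributions add up independently (no gene-environment interaction or correlation), the partition can be sketched as follows. The notation is conventional shorthand rather than anything taken from the studies discussed here: V_P is the phenotypic variance, V_G the genetic component, V_E the environmental component, and H^2 the heritability.

```latex
V_P = V_G + V_E,
\qquad
H^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E},
\qquad
0 \le H^2 \le 1
```

Setting V_G = 0 gives H^2 = 0 (all variance environmental), and setting V_E = 0 gives H^2 = 1 (all variance genetic), matching the two extremes described above.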

Behind the concept of heritability lies a whole world of simplifying assumptions about how biology works and how genes and environment interact, all filtered through increasingly complicated and abstruse statistical models. Heritability depends on the populations chosen and the environments probed by the experiments. Even the clean distinction between environment and genes is at some level artificial. As Richard Lewontin points out:

The very physical nature of the environment as it is relevant to organisms is determined by the organisms themselves. . . . A bacterium living in liquid does not feel gravity because it is so small . . . but its size is determined by its genes, so it is the genetic difference between us and bacteria that determines whether the force of gravity is relevant to us.

All this is to say that though heritability is a useful concept, it is an abstraction — one that depends entirely on the statistical models (with all their assumptions and prejudices) we use to define it.

Most importantly for our purposes, even for an extremely heritable trait such as height, the environment can drastically change the observed traits. For example, during the Guatemalan Civil War, US-backed death squads and paramilitaries brutalized the rural, indigenous population of Guatemala, resulting in widespread malnutrition. Many Mayan refugees fled to the United States to escape the violence. Comparing the heights of six- to twelve-year-old children of Guatemala Maya with American Maya, researchers found that the Americans were 10.24 centimeters taller than their Guatemalan counterparts, largely due to nutrition and access to healthcare. By comparison, the gene known to most influence height, the growth factor gene GDF5, is associated with changes in height of just 0.3 to 0.7 centimeters, and this only for participants with European ancestry.
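
The size of that gap is easy to check with the figures quoted above. The short Python sketch below simply divides one number by the other; it is a rough comparison between a population-level difference and a per-gene association, not a statistical analysis, and the variable names are made up for the example.

```python
# Figures quoted in the paragraph above, in centimeters.
environment_gap_cm = 10.24       # American Maya vs. Guatemala Maya children
gdf5_effect_low_cm = 0.3         # lower end of the GDF5 association
gdf5_effect_high_cm = 0.7        # upper end of the GDF5 association

# How many times larger is the environmental gap than the GDF5 effect?
ratio_vs_high = environment_gap_cm / gdf5_effect_high_cm
ratio_vs_low = environment_gap_cm / gdf5_effect_low_cm

print(f"Environmental gap is {ratio_vs_high:.1f}x to {ratio_vs_low:.1f}x "
      "the height change associated with GDF5")
```

On these numbers, the environmental difference is somewhere between about 15 and 34 times larger than the change associated with the most influential known height gene.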

Such dramatic environmental influences are commonplace. For example, the heritability of Type II diabetes, adjusted for age and body mass index (BMI), is thought to be between 0.5 and 0.75 (a little less than that of height, but as discussed above, this number should be taken with a grain of salt). Currently, GWAS are able to explain only about 6% of this heritability, and no locus (gene) is particularly predictive of whether an individual will develop diabetes. By contrast, an unhealthy BMI (BMI being a simple measure of how overweight a person is) increases the odds of developing diabetes nearly eightfold.
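
A back-of-the-envelope calculation shows how little this leaves. Reading “about 6% of this heritability” as a fraction of the heritability itself (an interpretive assumption, with h^2 standing for the heritability estimate), the share of total variation in diabetes risk accounted for by known loci would be:

```latex
0.06 \times h^2 \approx
\begin{cases}
0.06 \times 0.50 = 0.030 & \text{(lower heritability estimate)} \\
0.06 \times 0.75 = 0.045 & \text{(upper heritability estimate)}
\end{cases}
```

That is, roughly 3 to 4.5% of the overall variation, on this reading of the figures.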

The same story holds for IQ — a staple of genetic studies on “intelligence.” Putting the validity of IQ tests aside for a moment, studies show a long and sustained increase in IQ scores over the course of the twentieth century (the Flynn Effect), pointing to the importance of environment rather than genetics in determining IQ.

Schizophrenia is another example. In his excellent blog Cross-Check, John Horgan discusses CMYA5, touted as the “schizophrenia gene” in the popular press. He points out that if you carry this gene, your risk of schizophrenia rises by just 0.07 percentage points, from about 1% to 1.07%. In contrast, “if you have a schizophrenic first-degree relative, such as a sibling, your probability of becoming schizophrenic is about 10%, which is more than 100 times the added risk of having the CMYA5 gene.” Such results are not uncommon. The field is actually very concerned about the lack of predictive power of GWAS (often discussed in the context of the “missing heritability” problem).
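
The arithmetic behind that comparison can be laid out in a few lines of Python. The inputs are the percentages quoted above; the baseline risk is not stated directly and is inferred here as the difference between the two CMYA5 figures, which is an assumption of this sketch.

```python
# Percentages quoted in the paragraph above.
carrier_risk_pct = 1.07        # risk for carriers of the CMYA5 variant
added_risk_pct = 0.07          # increase attributed to CMYA5
relative_risk_pct = 10.0       # risk with a schizophrenic first-degree relative

# Baseline risk inferred from the two CMYA5 figures (an assumption).
baseline_risk_pct = carrier_risk_pct - added_risk_pct

# How the family-history figure compares with the gene's added risk.
ratio = relative_risk_pct / added_risk_pct

print(f"Inferred baseline risk: {baseline_risk_pct:.2f}%")
print(f"Family-history risk is about {ratio:.0f} times the added CMYA5 risk")
```

The ratio comes out at roughly 143, consistent with Horgan’s “more than 100 times.”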

Despite the limited success of GWAS, it is doubtful that genetic determinist claims will abate in the near future. The main reason for this is the sheer volume of new genetic data currently being generated. This data deluge is a biological determinist’s wet dream. In case you think I am exaggerating, here’s a quote from a recent study on “the genetic architecture of economic and political preference” published in PNAS, a leading scientific journal. Not surprisingly, the SNPs they identified “explain only a small part of the total variance.” Far from discouraged, the authors conclude their abstract on an optimistic note:

These results convey a cautionary message for whether, how, and how soon molecular genetic data can contribute to, and potentially transform, research in social science. We propose some constructive responses to the inferential challenges posed by the small explanatory power of individual SNPs.

The sheer hubris speaks for itself. Given the difficulty of using GWAS to explain height — an easily quantified, easily measured trait — the absurdity of claiming to identify the genetic basis of ill-defined, temporally variable, hard-to-quantify traits such as intelligence, aggression, or political preference is patently clear.

Regardless, the genetic determinist’s playbook in the genomics era is clear: Collect mass quantities of sequence data. Find an ill-defined trait (like political preference). Find a gene that is statistically overrepresented in the sub-population that “possesses” that trait. Declare victory. Ignore the fact that these genes don’t really explain the phenotypic variance of the trait. Instead, claim that if we only had more data the statistics would all work out. Further generalize these results to the level of societies and claim they explain the fundamental genetic basis of human behavior. Write a press release and wait for the media to write glowing reviews. Repeat with another data set and another trait.

Biological determinism seems plausible precisely because it gives the illusion that it is grounded in scientific observation. No scientist disagrees that the basic building blocks of an organism are encoded in its genetic material, and that evolution, through some combination of genetic drift and selection, has shaped those genes. But trying to ascribe human behavior, whether eating a whole bag of potato chips or waging war, to a set of genes is clearly a quixotic exercise.

As Nigel Goldenfeld and Leo Kadanoff implore in a beautiful article discussing complex systems: “Use the right level of description to catch the phenomena of interest. Don’t model bulldozers with quarks.” While it is certainly true that all the properties of a bulldozer result from the particles that make it up, like quarks and electrons, it is useless to think about the properties of a bulldozer (its shape, its color, its function) in terms of those particles. The shape and function of a bulldozer are emergent properties of the system as a whole. Just as you can’t reduce the properties of a bulldozer to those of quarks, you can’t reduce the complex behaviors and traits of an organism to its genes. Marx made the same point when he stated that “merely quantitative differences beyond a certain point pass into qualitative changes.”

If the philosophical and scientific bases of genetic determinist claims are so problematic, why is such sloppy thinking rewarded with front-page articles in the New York Times Science section? To answer this, we must consider not just science, but politics.

We live in an era in which corporations make unprecedented profits, an elite few accumulate enormous wealth, and inequality is reaching levels approaching those of the Gilded Age. The contradictions between neoliberal capitalism and democratic impulses are continually exposed. The claims of equal opportunity underlying much of liberal thinking are becoming farce. The incongruity between what capitalism professes to be and the reality of capitalism is becoming increasingly apparent.

The appeal of biological determinism is that it offers plausible, scientific explanations for societal contradictions engendered by capitalism. If Type II diabetes is reduced to a problem of genetics (which it surely is to some degree), then we don’t have to think about the rise of obesity and its underlying causes: the agribusiness monopoly, income inequality, and class-based disparities in food quality. Combine this with the prevalence of drug-based solutions to disease pushed by the pharmaceutical industry and it is no surprise that we are left with the impression that complex social phenomena are reducible to simple scientific fact.

Biological determinism, to paraphrase the great literary critic Roberto Schwarz, is a socially necessary illusion well-grounded in appearance. Much like art and literature, science “is historically shaped and . . . registers the social process to which it owes its existence.” Scientists inherit the prejudices of the societies in which they live and work. Nowhere is this more obvious than in the modern incarnation of biological determinism with its decidedly neoliberal assumptions about humans and societies.

The history of biology is littered with horrifying examples of the misuse of genetics (and evolutionary theory) to justify power and inequality: evolutionary justifications for slavery and colonialism, scientific explanations for rape and patriarchy, and genetic explanations for the inherent superiority of the ruling elite. We must work tirelessly to ensure that history does not repeat itself in the genomics era.