Hunting down the genes that cause disease is a difficult task, even with the human genome now fully decoded. Although humans have far fewer genes than once thought, scientists still estimate there are about thirty thousand genes held within the three billion or so nucleotides in each of our cells. So where and how do researchers begin to look for genes that cause disease?
Long before researchers finished sequencing the genome, some scientists began the painstaking process of pinpointing the location of genes that cause such rare disorders as Huntington disease and cystic fibrosis. Much of this pioneering work focused on families with well-documented lineages. Researchers traced diseases from one generation to the next, and compared the genomes of living family members in search of differences among them. The more two people are alike -- as are siblings, for instance -- the fewer differences there are to sort through, and the easier it is to find the ones that matter. Easier, yes. But not easy.
Researchers looking for "disease genes" received a boost in the late 1980s with the discovery of what are called single nucleotide polymorphisms, or SNPs (pronounced "snips"). Scientists believe that these single-letter misspellings in the DNA code are the key to finding the causes of many common diseases, given that many genetic diseases result from such misspellings. The problem is that scientists have identified about 1.4 million SNPs in the human genome. Even with today's advanced computer technology, testing each one of them for a correlation with a common disease is practically impossible. So how do you identify which SNPs cause disease, when there are too many to test?
Fortunately, SNPs travel in pairs, which means that one variation in the genetic code always has a corresponding variation somewhere else along the same strand. These paired variations act like bookends and, combined with the letters between them, form a relatively long segment of DNA. Thus, researchers wanting to find a specific variation can look for the whole segment -- a significantly larger target than the variation itself. And the bigger the segment, the easier it is to locate.
Until recently, scientists were unsure how long an average SNP segment might be. Estimates ranged as low as three thousand base pairs -- still dauntingly tiny considering the genome is three billion bases long. It turns out that the average SNP segment (in northern Europeans) is about sixty thousand letters long. This makes searching for the location of a disease-related SNP about twenty times easier than the lowest estimates had suggested.
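The twenty-fold figure is simple arithmetic on the numbers quoted above, and a short sketch makes it concrete. The code below assumes, purely for illustration, that segments of a given average length tile the three-billion-base genome end to end with no overlap; the function and variable names are hypothetical and are not taken from any research tool.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the text.
# Assumption (illustrative only): segments of a given average length
# tile the genome end to end, with no overlap and no gaps.

GENOME_LENGTH = 3_000_000_000   # about three billion bases
LOW_ESTIMATE = 3_000            # lowest earlier estimate of segment length
OBSERVED_AVERAGE = 60_000       # average segment length (northern Europeans)


def segments_to_cover(genome_length: int, segment_length: int) -> int:
    """Number of equal-length segments needed to span the genome."""
    return genome_length // segment_length


segments_low = segments_to_cover(GENOME_LENGTH, LOW_ESTIMATE)
segments_observed = segments_to_cover(GENOME_LENGTH, OBSERVED_AVERAGE)

# How many times larger the observed segment is than the low estimate,
# which is also how many times fewer segments there are to search.
improvement = OBSERVED_AVERAGE / LOW_ESTIMATE

print(f"Segments at 3,000 bases each:  {segments_low:,}")
print(f"Segments at 60,000 bases each: {segments_observed:,}")
print(f"Improvement factor:            {improvement:.0f}x")
```

Under that simplifying assumption, a search that would have had to consider about a million short segments shrinks to about fifty thousand longer ones, which is where "twenty times easier" comes from.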