"Today we are learning the language in which God created
life."
That's a quote from President Bill Clinton on
June 26, 2000 when a "rough draft" of the human genome was
announced to widespread, international fanfare. Clinton was certainly not the
only one to make big claims about the Human Genome Project (HGP). Journalists
and politicians throughout the developed world proclaimed that the results
would lead to "the end of disease." Of course, things are never
that simple.
The HGP was declared officially complete in April 2003, yet the
promise of a new era of diagnosing and treating disease remains just a
promise. That's not to say that scientists learned nothing from the effort.
Genetics has come a long way since the mid-1800s, when the Austrian monk and
scientist Gregor Mendel, while breeding pea plants, discovered that certain
traits are inherited as units (we call these units of inheritance "genes").
And to understand the genome as a whole, the logical first step would seem to
be mapping the sequence of nucleic acid bases (i.e., the As, Gs, Cs, and Ts)
that read out linearly and make up the DNA of the genome.
But at the end of the day, the draft of the genome was basically just
a set of three billion boring letters. What science needs now is something to
bring those letters to life, to translate them into an instruction manual for
actually building a person; then we can better understand the roots of
disease and generate treatments. That's where a more recent, less-known
scientific collaboration comes into play…
September 5, 2012. It was on this day that one of the most ambitious
international science projects you may never have heard of started to show
the fruits of its labor. A collection of thirty papers published
simultaneously in the journals Nature, Genome Research, and Genome Biology
provided results from a multiyear research endeavor – involving over 400
scientists from 32 labs around the world – known as the ENCODE Project.
ENCODE, or the "Encyclopedia of DNA Elements," is one of the
most ambitious biological research projects ever undertaken, and the natural
follow-up to the HGP.
While the initial genome-sequencing effort succeeded in laying out the
three billion DNA bases found in a human cell, it failed to describe what
those billions of bits of biological data actually do. It was
like a map without labels or a legend – just lines on the paper... important,
but only a first step. ENCODE, which has generated hundreds of terabytes of
raw data so far, was designed to pick up where the HGP left off. It sought to
build upon this foundation by annotating the specific regions of the genome
that are used in the various cells of the human body, and to catalogue the
biochemical products of this activity.
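To make the "map without labels" idea concrete, here is a minimal Python sketch of what annotation adds to a raw sequence. Everything in it is made up for illustration: the 40-letter sequence, the three labeled regions, and the helper names `labels_at` and `annotated_fraction` are assumptions, not ENCODE data or ENCODE tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # first position covered (0-based, inclusive)
    end: int     # one past the last position covered (exclusive)
    label: str   # what this stretch of DNA is thought to do


# A made-up 40-letter "genome": on its own, just letters.
sequence = "ATGCGTACGTTAGCCGATAGGCTTAACGGATCCGTATGCA"

# Hypothetical labels laid over that sequence, like a legend on a map.
annotations = [
    Annotation(0, 12, "protein-coding gene"),
    Annotation(18, 26, "regulatory switch"),
    Annotation(30, 38, "functional RNA gene"),
]


def labels_at(position, annotations):
    """Return every label whose region covers the given position."""
    return [a.label for a in annotations if a.start <= position < a.end]


def annotated_fraction(length, annotations):
    """Return the share of positions covered by at least one label."""
    covered = set()
    for a in annotations:
        covered.update(range(a.start, a.end))
    return len(covered) / length


print(labels_at(20, annotations))
print(annotated_fraction(len(sequence), annotations))
```

The sequence alone answers "which letter is at position 20?"; the annotations answer "what, if anything, is position 20 for?", which is the kind of question ENCODE set out to address.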
Throughout the latter half of the twentieth century, the "central
dogma" of genetics and molecular biology revolved around the flow of
information from DNA to RNA to proteins through the processes of
transcription and translation. Genes – the portions of the DNA sequence that
contain the "recipes" for particular proteins – have traditionally
taken center stage for research purposes. And since humans have only around
21,000 such genes, accounting for only around 2% of the genome, it was
assumed by many that a lot of our DNA simply performed no useful function.
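The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is a toy under stated assumptions: the 15-letter DNA string is invented, the codon table holds only the five codons the example needs (the full genetic code has 64), and the arithmetic at the end simply combines the round figures quoted above.

```python
# Partial codon table: only the codons used by the toy sequence below.
CODON_TABLE = {
    "AUG": "Met",
    "GCU": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "STOP",
}


def transcribe(coding_strand):
    """DNA -> RNA: the messenger RNA reads like the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """RNA -> protein: read three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGGCTAAATGGTAA"          # invented example, not a real gene
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print("-".join(protein))

# Back-of-envelope arithmetic from the article's round numbers.
GENOME_BASES = 3_000_000_000
GENE_COUNT = 21_000
coding_bases = GENOME_BASES * 2 // 100       # about 2% of the genome
bases_per_gene = coding_bases // GENE_COUNT  # rough average per gene
print(coding_bases, bases_per_gene)
```

On these figures, roughly 60 million of the three billion bases spell out protein recipes, which is why the remaining 98% or so looked like a puzzle.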
Lending credence to this notion is the fact that when the genome is
viewed as a linear sequence, there are long intergenic
regions (known as "deserts") with no protein-coding activity. It's
hard to imagine how these regions could have any functional connection to the
genes; thus, the notion of "junk DNA" was invoked. It was suggested
that over the course of our evolutionary development, our DNA had been
invaded by various parasitic agents – viruses, retrotransposons,
etc. – which were simply using our biological success for their
own reproductive purposes.
Now it seems that the notion of "junk DNA" is, well, junk. A
key takeaway from the ENCODE Project is that much of this DNA performs
regulatory functions: it contains regions that act like switches attached
to particular genes, determining whether or not each gene will be
expressed. Scientists have long been aware of these DNA regions, but
assumed that their numbers were on par with the number of genes.
It turns out, however, that there are millions of such regions
throughout the genome, linked to each other (and to the protein-coding genes)
in an extremely complicated hierarchical network. (The metaphor of a
"hairball of wires" was offered by one ENCODE scientist.)
The linear ordering of the genome, as offered by the HGP, is a further
source of confusion: the three-dimensional folding of the chromosomes inside
the nucleus allows regulatory regions to stay in close contact with genes
that appear to lie far away on the linear sequence. This explains why so
much biochemical activity can be found even deep in the deserts of the
alleged "junk DNA." Many of these regulatory regions manifest themselves in
the cell as "functional RNA" molecules – types of RNA that are an end
product in themselves, rather than merely an intermediate step on the way to
becoming a protein, and which play a key role in switching genes on and off.
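The point about folding can be illustrated with a small geometric sketch in Python. It assumes a deliberately crude model: ten evenly spaced positions on a strand bent into a simple hairpin. The function name `hairpin_coordinates` and the coordinates are hypothetical and say nothing about how real chromosomes fold.

```python
import math


def hairpin_coordinates(n):
    """Place n positions along a hairpin: out along y=0, back along y=1."""
    half = n // 2
    coords = []
    for i in range(n):
        if i < half:
            coords.append((float(i), 0.0))          # outbound arm
        else:
            coords.append((float(n - 1 - i), 1.0))  # return arm
    return coords


coords = hairpin_coordinates(10)

# Positions 0 and 9 sit at opposite ends of the linear sequence...
linear_separation = 9 - 0
# ...but the fold brings them right next to each other in space.
folded_distance = math.dist(coords[0], coords[9])

print(linear_separation, folded_distance)
print(math.dist(coords[0], coords[4]))
```

In this toy, the two ends of the strand are nine steps apart when read linearly yet only one unit apart once folded, while position 4, which is closer on paper, ends up four units away.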
Another key takeaway from the ENCODE Project is that there is far more
functional RNA activity than was previously believed. In fact, there are
nearly as many genes that code for functional RNA as there are protein-coding
genes. Some scientists are now suggesting that the definition of the word
"gene" should be changed to avoid even implicit emphasis on protein
as the end product of the recipe it contains. (Technically, though, the word
"gene" is usually already defined broadly enough to include RNA genes.)
Needless to say, it will take years before the implications of the ENCODE
Project are fully worked out on a scientific level, let alone translated into
pharmacological products. But the project does represent a big step forward
in our understanding of the human genome, and there is no doubt that this new
understanding of what actually goes on inside our cell nuclei will prove
enormously helpful in our battle to treat and prevent disease.