Devangshu Datta: What's in a genome
The results of the Encode project may not be sensational but will clearly accelerate research in several key areas

In early September, several teams working on the human genome published a flood of papers detailing their findings. The Encyclopaedia of DNA Elements (Encode) project has made great progress in describing how the human genome functions.
This is the biggest set of breakthroughs since the human genome project (HGP) mapped the DNA sequence between 2000 and 2003. However, Encode’s results have caused some controversy within the scientific community. The results are not being disputed. But several scientists have stated the press releases were misleading or misinterpreted.
Encode involved 442 scientists in 30-odd labs across several countries. The $200-million project studied 147 types of cells to try and understand the functionality of specific DNA sequences. In a coordinated set of releases, separate teams published six papers in Nature, another 24 papers in Genome Research and Genome Biology, six more papers in The Journal of Biological Chemistry and one in Science.
Unusually, for a collaborative project of this scale, the papers were embargoed to ensure coordinated release. The raw data were made available for use, however. All data and papers are now freely available with a special search application on the portal, http://nature.com/encode/.
DNA (deoxyribonucleic acid) carries the information that determines inheritable characteristics. The HGP identified 20,000-plus genes that carry protein-coding information unique to individuals. The coded sequences are responsible for most cell functions. If that set of protein-coding genes is replicated, it produces a clone, or identical twin.
But the coded DNA sequences are a very small proportion – roughly 1 to 1.5 per cent – of the entire genome. The genome contains many more DNA sequences that possess no protein-coding. It also has RNA (ribonucleic acid) sequences. RNA is required to copy and replicate DNA. RNA passes through the nuclear membrane of cells carrying selective DNA information to be replicated.
It was known that non-coded DNA sequences included switches that controlled and regulated the activity of coded sequences. However, many non-coded sequences also seem redundant. Some are broken bits of discarded genes and disabled viruses. Even some coded DNA is redundant. One hypothesis is that these bits of useless DNA are leftovers from evolutionary history.
Since functions of specific bits of non-coded DNA weren’t known and since some bits were apparently useless, these sections were misleadingly labelled “junk”. They are also referred to as “dark matter”.
Encode has figured out the biochemical activity in much of the junk and also confirms that there are many switches controlling the coded sections. The switches tell coded genes when to switch on and off and determine, for example, which cells become muscle, which become pancreas cells and which become neurons.
Encode claims that at least 80 per cent of the junk is biochemically active. This is where confusion has arisen. It was reported that 80 per cent of junk was useful. But “biochemically active” doesn’t necessarily translate to useful. Carrying a useless, disabled gene doesn’t hurt the organism and most such sequences are biochemically active.
There are about 4 million switches within the junk. These switches are key to mutation (changes in genetic code) and to susceptibility to inherited diseases. In terms of actual utility, Encode identified around eight per cent of junk as useful.
The project examined 147 cell types, whereas the human body has thousands of cell types. Given ongoing research into more cell types, project leader Ewan Birney of the European Bioinformatics Institute, Cambridge (UK), told Nature that the useful component could be bumped up to around 20 per cent. That is a massive advance on 1 to 1.5 per cent.
There is a local control effect. Switches have greater influence on the gene sequences close to them. Every cell contains 2 to 3 metres of DNA tightly coiled inside the human cell nucleus, which is only 6 micrometres in diameter (there are 1,000 micrometres in 1 millimetre and 1,000 millimetres in a metre). The non-coded control sequences are coiled in between the codes. These “introns”, as they are called, influence nearby genes and this proximity response can only be understood when the genome is examined in a 3D layout.
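To get a feel for how tightly the DNA must be coiled, the unit conversion above can be worked through in a few lines. This is only an illustrative sketch using the figures quoted in the column (2 to 3 metres of DNA, a 6-micrometre nucleus); the function name `packing_ratio` is made up for this example.

```python
# Conversion quoted in the column: 1,000 micrometres per millimetre,
# 1,000 millimetres per metre.
MICROMETRES_PER_METRE = 1_000 * 1_000


def packing_ratio(dna_length_m, nucleus_diameter_um):
    """Return how many nucleus diameters long the uncoiled DNA would be.

    Hypothetical helper for illustration; it only divides one length
    by the other after converting metres to micrometres.
    """
    dna_length_um = dna_length_m * MICROMETRES_PER_METRE
    return dna_length_um / nucleus_diameter_um


if __name__ == "__main__":
    for length_m in (2, 3):
        ratio = packing_ratio(length_m, 6)
        print(f"{length_m} m of DNA is about {ratio:,.0f} nucleus diameters long")
```

On these figures, the uncoiled DNA would be several hundred thousand times longer than the nucleus is wide, which is why sequences that look far apart on a linear map can sit close together in three dimensions.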
The researchers have found switches that control sensitivity to diseases like multiple sclerosis, lupus (an auto-immune disease), rheumatoid arthritis, Crohn’s disease and so on. Encode has discovered around 400 such switching areas that seem worth examining. Ever since HGP, it has been known that small changes in junk DNA sequences increased the probability of getting hereditary diseases.
The discovery of the specific gene switches controlling those genetic changes could lead to new avenues of treatment. In cancer, for instance, some treatments target specific protein-coded sequences (genes), which change with the onset of the disease. Targeting the switches instead, to stop those changes from being triggered, may be promising.
Apart from the basic roadmap provided by the HGP, advances in computational power over the past decade were required to make Encode research possible. The project generated over 15 terabytes of data and required more than 300 years of computing time on very fast dedicated processors. This is not surprising. A “circuit diagram” of 20,000-odd circuits connected by 4 million on/off switches would give rise to a very large number of combinations.
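The scale of that "very large number" can be sketched with a rough calculation. The simplifying assumption here, which is not Encode's model, is that each of the 4 million switches is an independent on/off bit, so the number of possible settings is 2 raised to the number of switches. The helper name `digits_in_switch_states` is hypothetical.

```python
import math


def digits_in_switch_states(num_switches):
    """Count the decimal digits in 2**num_switches.

    Assumes every switch is an independent on/off bit, which is a
    deliberate oversimplification of real gene regulation. Uses the
    identity: digits(2**n) = floor(n * log10(2)) + 1, so the huge
    number itself never has to be written out.
    """
    return math.floor(num_switches * math.log10(2)) + 1


if __name__ == "__main__":
    switches = 4_000_000  # figure quoted in the column
    digits = digits_in_switch_states(switches)
    print(f"2**{switches:,} has {digits:,} decimal digits")
```

Under that assumption, the count of possible on/off settings is a number more than a million digits long. Real switches are neither independent nor equally likely to be in each state, so this is an upper-bound illustration rather than an estimate of what the project actually computed.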
So, the results aren’t quite as sensational as the initial press releases make out. But they do represent a huge advance in knowledge about the genome and the findings will clearly accelerate research in several key areas. The hype and misunderstanding about the implications of biochemical activity may have led to inflated expectations. But the reality of what Encode has achieved is pretty impressive.
First Published: Sep 21 2012 | 12:55 AM IST
