Another post from an MDC highlight… See more stories at http://www.mdc-berlin.de.
Wei Chen’s group captures the first full view of one of nature’s most complex genes
“One gene makes one enzyme,” declared George Beadle and Edward Tatum in 1941, in work that led to a 1958 Nobel Prize in Physiology or Medicine. This established a research pathway that forms the heart of modern genetics, but their principle has been vastly refined. Studies of the genomes of humans and other organisms have revealed that the vast majority of genes have a boxcar-like structure built of protein-encoding regions called exons and noncoding regions called introns. Exons can be mixed and matched into a variety of proteins, each with a unique chemical recipe, in a process called alternative splicing. This allows amazing diversity from a limited number of genes and underpins many biological processes.
A gene called Dscam in the “simple” fruit fly Drosophila melanogaster is the current record-holder; it has 115 exons that can potentially be used to produce 38,016 distinct proteins. Each version may make an important contribution to the wiring of neurons in the fly's brain, yet it has been extremely difficult to figure out which of all these possible candidates the fly actually produces, in which types of cells, and why the fly genome encodes such a seemingly unnecessary diversity. A new method by Wei Chen’s group offers a way to answer these questions. The work is a collaboration with the lab of Dietmar Schmucker at the Vesalius Research Center in Leuven, Belgium, and appears in the June 21 issue of the EMBO Journal.
Dscam is an abbreviation for Down syndrome cell adhesion molecule. In 2007 scientists discovered that its potential diversity plays an important role in the wiring of the fly brain. Neighboring neurons in flies that produce identical forms repel each other, while those that become attached express different ones. In humans this process is largely governed by related cell-adhesion molecules, the so-called clustered protocadherin receptors.
Traditionally, it has been almost impossible to detect different forms of such complex molecules. Wei Sun and other members of Wei Chen’s group managed this with Dscam by developing a method called CAMSeq (for “Circularization-Assisted Multi-Segment Sequencing”).
“Cells transcribe Dscam into a huge RNA molecule that then undergoes a process called ‘alternative splicing,’” Wei Chen says. “A few regions remain in all versions of the protein, but the RNA also has four blocks containing multiple exons from which it chooses one version of each.”
It’s a bit like assembling your wardrobe out of a catalogue that offers only one type of shoe, but 12 styles of socks, 48 types of trousers, 33 shirts, and two different hats. Altogether, those items could be combined in different ways to create 38,016 possible wardrobes. In the past, Wei Chen says, it was possible to look at just the “socks” exon and determine which form a molecule had, or the “shirts” exon. But you couldn’t step back and view the whole ensemble when comparing different versions of Dscam. It would be like knowing that 3,000 individual proteins had received the exon equivalent of a Hawaiian shirt, and 1,000 the blue shorts, but you couldn’t tell whether they were being worn together.
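The wardrobe analogy can be sketched in a few lines of Python. The slot names and the tiny example populations below are made up purely for illustration; only the numbers of options per slot come from the text. The sketch shows how the 38,016 figure arises, and why counting each item separately cannot tell you which items are worn together.

```python
from collections import Counter
from math import prod

# Options per "wardrobe" slot, as given in the analogy (slot names are illustrative labels).
choices = {"shoes": 1, "socks": 12, "trousers": 48, "shirts": 33, "hats": 2}
total_wardrobes = prod(choices.values())

# Two hypothetical populations of four molecules, each recorded as (shirt, trousers).
pop_a = [("hawaiian", "shorts"), ("hawaiian", "shorts"), ("plain", "jeans"), ("plain", "jeans")]
pop_b = [("hawaiian", "shorts"), ("hawaiian", "jeans"), ("plain", "shorts"), ("plain", "jeans")]

def marginals(population):
    """Count shirts and trousers separately, as single-exon assays would."""
    shirts = Counter(shirt for shirt, _ in population)
    trousers = Counter(trouser for _, trouser in population)
    return shirts, trousers

# Item-by-item counts are identical for the two populations ...
same_marginals = marginals(pop_a) == marginals(pop_b)
# ... but the actual outfits (the combinations) are not.
same_outfits = Counter(pop_a) == Counter(pop_b)

print(total_wardrobes, same_marginals, same_outfits)
```

In both toy populations, two molecules wear the Hawaiian shirt and two wear shorts, yet only in the first are the two always worn together. Distinguishing such cases requires reading all variable positions from the same molecule.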
Part of the problem in studying Dscam diversity lies in fundamental limitations of high-throughput technologies, such as microarrays or deep sequencing, that are used to analyze the composition of RNA transcripts. Normally, “deep sequencing” methods can only handle molecules up to about 1,000 bases long, and they “read” the composition by starting at both ends and working inward. “This is only accurate to about 150 ‘letters’ of the code, meaning that you can analyze about 300 nucleotides from molecules shorter than 1,000 bases,” Wei Chen says. “But the variable region of Dscam is much longer, which means that the normal method won’t work. An alternative has been to look at the single exons present in an RNA separately, but again, this doesn’t give us a view of how they are combined.”
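The arithmetic behind this limitation can be written out as a small sketch. The function name and its default are assumptions for illustration; the 150-base read length and the 1,000-base fragment limit are the approximate figures quoted above.

```python
def paired_end_coverage(fragment_len, read_len=150):
    """Return (bases_read, unread_gap) when a fragment is read inward from both ends.

    Assumes one read of up to `read_len` bases from each end; overlapping
    reads on short fragments are not counted twice.
    """
    left_end = min(read_len, fragment_len)                 # first base NOT covered by the left read
    right_start = max(fragment_len - read_len, left_end)   # first base covered by the right read
    bases_read = left_end + (fragment_len - right_start)
    unread_gap = right_start - left_end
    return bases_read, unread_gap

# A fragment at the size limit: 300 bases are read, 700 in the middle stay unseen.
print(paired_end_coverage(1000))
```

Any variable exon that falls inside the unread gap is invisible to the sequencer, which is why the distance between the variable blocks matters so much.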
To solve these technical problems, Wei Chen and his colleagues added a few new steps to the sequencing process. They began by using PCR to produce cDNA molecules that contained the “variable regions.” But about half of this section is occupied by a very long stretch right in the middle that doesn’t vary and thus wasn’t interesting to look at.
“We realized that we could eliminate this section by drawing the cDNA into a ring, which puts the variable sections much closer together,” Wei Chen says. “That places them in a stretch that is about 1,000 bases long and can be approached by our methods.”
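The geometric idea can be illustrated with a toy string model. This is only a sketch of why closing a molecule into a ring brings its two ends next to each other. It is not the CAMSeq laboratory protocol, and the sequences and function name are invented for the example.

```python
def read_across_junction(linear, start, end):
    """Read a circularized molecule from index `start` forward, across the
    junction where the two ends were joined, up to (not including) index `end`."""
    n = len(linear)
    doubled = linear + linear          # doubling the string lets a slice wrap past the junction
    length = (end - start) % n
    return doubled[start:start + length]

# Toy molecule: two short variable segments separated by a long constant stretch.
variable_left = "AAAA"
constant_middle = "c" * 20
variable_right = "GGGG"
linear = variable_left + constant_middle + variable_right

# In the linear molecule the variable segments are 20 bases apart.
# In the ring, reading from the start of the right segment across the junction
# gives both variable segments back to back, with the constant stretch left out.
amplicon = read_across_junction(
    linear,
    start=len(variable_left) + len(constant_middle),
    end=len(variable_left),
)
print(amplicon)
```

In this toy case the piece read across the junction is 8 characters long instead of 28, which is the same kind of shortening that brings the real variable sections within reach of the sequencer.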
Now the scientists could copy just the relevant stretch of Dscam using PCR. This allowed them to study combinations of the three most variable exons in RNAs produced by cells in different tissues at various stages of development. They found 18,496 out of the 19,008 possible forms, another landmark in the paper.
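For readers who want to check the numbers: the 19,008 figure is the product of the three largest option counts from the wardrobe analogy (12, 48, and 33), and the observed count comes from the paragraph above. The variable names are just for this sketch.

```python
from math import prod

variants_per_block = [12, 48, 33]       # the three most variable exon blocks
possible = prod(variants_per_block)     # all combinations of one choice per block
observed = 18_496                       # combinations detected, as reported above

coverage = observed / possible
missing = possible - observed
print(f"{observed} of {possible} combinations detected ({coverage:.1%}); {missing} not seen")
```

In other words, roughly 97 percent of all theoretically possible combinations turned up in the data.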
“Previously scientists had no way to know all these possible combinations of exons were actually being used in the fly,” Wei Chen says. “They might just be ‘theoretical possibilities.’ For instance, the selection of a particular exon at one place might determine which one was being selected from another variable group, meaning that some combinations never appeared.”
But based on their results, Dscam doesn’t seem to be very particular about matching its “wardrobe”: the choice of one exon doesn’t seem to influence the selection at another.
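One simple way to picture what "doesn't influence" means is to compare how often each combination occurs with how often it would occur if the two choices were made independently. The sketch below does this on made-up counts with generic labels. It is an illustration of the idea, not the statistical analysis used in the paper.

```python
from collections import Counter

def independence_ratios(joint_counts):
    """For each (first, second) pair, return observed count / count expected
    if the two choices were independent. Ratios near 1.0 suggest independence."""
    total = sum(joint_counts.values())
    first_totals = Counter()
    second_totals = Counter()
    for (first, second), count in joint_counts.items():
        first_totals[first] += count
        second_totals[second] += count
    return {
        (first, second): count * total / (first_totals[first] * second_totals[second])
        for (first, second), count in joint_counts.items()
    }

# Hypothetical counts: choices made independently (every ratio is 1.0) ...
independent = {("a1", "b1"): 60, ("a1", "b2"): 20, ("a2", "b1"): 30, ("a2", "b2"): 10}
# ... versus choices that are tightly coupled (observed pairs are over-represented).
coupled = {("a1", "b1"): 50, ("a2", "b2"): 50}

print(independence_ratios(independent))
print(independence_ratios(coupled))
```

The finding described above corresponds to the first pattern: knowing which exon was picked in one block tells you little about which one was picked in another.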
“These measurements are permitting us to make a thorough evaluation of the total protein diversity in an organism, as well as different types that might be made by single cells,” Wei Chen says. “Those factors are essential in the way neurons weave together to make a functional brain architecture. Interestingly, the isoforms of Dscam were expressed at very different, fluctuating amounts. Some appeared at quantities tens of thousands of times higher than others, in a way significantly biased in specific cells and tissues and at various developmental stages. Until now this has been underappreciated, but such bias can dramatically reduce the ability of neurons to display unique surface receptor codes.”
One of the great puzzles related to Dscam has been the question of why flies would need to create a protein in so many different forms, since producing each one costs energy and requires a great deal of cellular management. “What we see is that given the splicing biases and the random nature of the splicing process, this seemingly excessive diversity might nevertheless be essential so that neurons can clearly distinguish between ‘self’ and ‘non-self’ types,” Wei Chen says.
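A rough way to see why bias eats into diversity is to ask how likely it is that two random draws land on the same isoform. The sketch below is a deliberate simplification (real neurons express several isoforms at once), and the abundance values are invented. It assumes only that isoforms are drawn at random in proportion to their abundance.

```python
def collision_probability(abundances):
    """Probability that two independent random draws pick the same isoform."""
    total = sum(abundances)
    return sum((a / total) ** 2 for a in abundances)

def effective_number(abundances):
    """Number of equally abundant isoforms that would give the same collision probability."""
    return 1.0 / collision_probability(abundances)

n_isoforms = 19_008

# Case 1: every isoform equally abundant.
uniform = [1] * n_isoforms
# Case 2 (hypothetical): one isoform is 10,000 times more abundant than each of the rest.
biased = [10_000] + [1] * (n_isoforms - 1)

print(round(effective_number(uniform)))
print(round(effective_number(biased), 1))
```

With equal abundances, all 19,008 forms count fully. In the biased toy case the pool behaves as if it held fewer than ten. A large repertoire therefore leaves room for enough distinguishable combinations even when usage is strongly skewed.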
The method can also be applied, he says, to other cases of complex genes – including those of humans – that are spliced in many different ways to fulfill a wide variety of biological functions.
Note: The title of this story is a reference to Charles Darwin, taken from the last sentence of the 6th edition of the Origin of Species: “There is grandeur in this view of life, with its several powers, having been originally breathed by the Creator into a few forms or into one; and that, whilst this planet has gone circling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being evolved.” Darwin and his contemporaries knew almost nothing about cellular chemistry, but this basic idea applies equally well to alternative splicing.

Highlight Reference:
Sun W, You X, Gogol-Döring A, He H, Kise Y, Sohn M, Chen T, Klebes A, Schmucker D, Chen W. Ultra-deep profiling of alternatively spliced Drosophila Dscam isoforms by circularization-assisted multi-segment sequencing. EMBO J. 2013 Jun 21.