Monday, May 19, 2008 | Reason : In the News

New Tool To Understand Evolution Of Multi-domain Genes Developed

by Science Daily

Thanks to GP for the link.

http://www.sciencedaily.com/releases/2008/05/080515205640.htm

ScienceDaily (May 18, 2008) — Carnegie Mellon scientists have discovered critical flaws in the standard method used to analyze gene evolution. Standard methods fail when applied to genes that encode multi-domain proteins, an important class of proteins crucial to human health.

Computational biologist Dannie Durand and colleagues have for the first time tackled the dilemma of how to study the ancestry of multi-domain genes.

Correctly identifying gene ancestry is a linchpin of computational genomics. Genes passed down from a common ancestor tend to perform similar functions in the cell. Scientists exploit this similarity to perform tasks such as predicting gene function, mapping human chromosomal regions to corresponding regions in model organisms, and reconstructing the regulatory circuitry that turns genes on and off.

Although computational biologists have developed methods to identify genes that share a common ancestor, current methods often lead to spurious conclusions when applied to genes that encode multi-domain proteins. Domains are sequence fragments that encode the basic building blocks of protein structure. Evolution makes new genes by mixing and matching domains in novel combinations, much like a child who builds a house, a car and a helicopter from the same LEGO kit by combining LEGO blocks in different ways. This process, called domain shuffling, creates complex proteins that perform specific, critical tasks such as cell communication and binding to other cells. When one of these proteins fails, cancer is often the result. Domain shuffling allows rapid evolution of new proteins, but it also makes it close to impossible for scientists to determine their ancestry.

In a paper published online in Public Library of Science Computational Biology May 15, Durand's team presents a novel method to determine whether a pair of similar genes evolved from a common ancestor, or whether they just look similar because the same domain was inserted into both genes. Their method, called "Neighborhood Correlation," is the first to tackle this problem.

"We needed a completely new approach to determine which multi-domain proteins share a common ancestor, and we are the first group to propose such a method," Durand said. "Ours is the first approach to define and analyze common ancestry in a traditional vertical way, even when domain shuffling occurs."

Neighborhood Correlation exploits the structure of a statistically weighted sequence similarity network to differentiate multi-domain genes with shared ancestries from multi-domain genes that result from domain shuffling. Gene duplication creates a specific signature in the network, while domain insertion creates a different characteristic signature. Neighborhood Correlation captures these signatures, giving pairs that arose through duplication, and hence share common ancestry, a higher score than genes that share an inserted domain, but not a common ancestor.
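The article does not give the scoring formula, so the following is only a rough sketch of the idea. It assumes each gene is described by its vector of similarity scores against every gene in the network, and that a pair is scored by the Pearson correlation of those two vectors. The gene names, the toy scores, and the helper names `build_network` and `neighborhood_correlation` are hypothetical illustrations, not the authors' code.

```python
import math

def build_network(genes, pair_scores, self_score=200.0):
    """Build a symmetric similarity network as nested dicts (toy data only)."""
    scores = {g: {g: self_score} for g in genes}
    for (a, b), s in pair_scores.items():
        scores[a][b] = s
        scores[b][a] = s
    return scores

def neighborhood_correlation(scores, x, y):
    """Pearson correlation of the similarity profiles of genes x and y.

    Missing pairs are treated as score 0 (an assumption of this sketch).
    """
    genes = sorted(scores)
    px = [scores[x].get(g, 0.0) for g in genes]
    py = [scores[y].get(g, 0.0) for g in genes]
    mx = sum(px) / len(px)
    my = sum(py) / len(py)
    cov = sum((a - mx) * (b - my) for a, b in zip(px, py))
    vx = sum((a - mx) ** 2 for a in px)
    vy = sum((b - my) ** 2 for b in py)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no signal
    return cov / math.sqrt(vx * vy)

# Hypothetical toy network: families A and B each arose by duplication,
# while A1 and B1 merely share one inserted domain.
genes = ["A1", "A2", "A3", "B1", "B2", "B3"]
pair_scores = {
    ("A1", "A2"): 100.0, ("A1", "A3"): 100.0, ("A2", "A3"): 100.0,
    ("B1", "B2"): 100.0, ("B1", "B3"): 100.0, ("B2", "B3"): 100.0,
    ("A1", "B1"): 60.0,  # similarity due to the shared inserted domain
}
network = build_network(genes, pair_scores)

same_family = neighborhood_correlation(network, "A1", "A2")
shared_domain = neighborhood_correlation(network, "A1", "B1")
print(f"A1 vs A2 (duplication):     {same_family:.2f}")
print(f"A1 vs B1 (inserted domain): {shared_domain:.2f}")
```

In this toy network the duplicated pair scores higher than the pair that only shares an inserted domain, because A1 and A2 are similar to the same set of neighbors while A1 and B1 are not.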

The Carnegie Mellon scientists tested Neighborhood Correlation against 20 protein families -- including kinases, the largest multi-domain family found in humans -- whose ancestral relationships are well established through lab-based research. The tool worked remarkably well in verifying the ancestral patterns of multi-domain gene evolution for these families, and much better than the tools in use today, Durand said.

Today's computational tools use sequence similarity, assuming that similar sequences indicate common ancestry. Those methods also use the length of the similar region to rule out similarity that arose due to inserted domains. They reason that the longer the sequence shared by two multi-domain genes, the more likely that those two genes share a common ancestor.
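The conventional heuristic described above might be sketched roughly as follows. The function name `looks_homologous`, the thresholds, and the choice to measure the aligned region against the longer of the two sequences are all assumptions made for illustration; the article does not specify how existing tools define them.

```python
def looks_homologous(score, aligned_len, len_a, len_b,
                     min_score=50.0, min_coverage=0.8):
    """Conventional test: a strong score AND a long enough aligned region.

    Coverage is the aligned length divided by the longer sequence length
    (a hypothetical definition chosen for this sketch).
    """
    coverage = aligned_len / max(len_a, len_b)
    return score >= min_score and coverage >= min_coverage

# A long alignment spanning most of both genes passes the test.
print(looks_homologous(score=120.0, aligned_len=450, len_a=500, len_b=480))
# A short alignment, such as a single shared domain, is rejected.
print(looks_homologous(score=120.0, aligned_len=90, len_a=500, len_b=480))
```

A rule like this looks only at one pair of genes at a time, which is the assumption the article says breaks down for multi-domain families.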

But Durand's tests showed that this assumption often does not hold. Her team found disturbing results when they compared sequence similarity to their Neighborhood Correlation method in evaluating the 20 gene families with established histories. The sequence similarity method actually yielded false ancestral associations and missed true ancestral relationships.

Neighborhood Correlation is successful because it takes both gene duplication and domain insertion into account.

"Not only do we show that Neighborhood Correlation works empirically, we also provide a sound evolutionary argument as to why it should work," Durand observed. "Our results show that the organization of sequence similarity network contains evidence of ancient evolutionary processes. This has exciting implications for future studies. We hope that comparing the sequence similarity networks of different species will reveal how evolutionary processes differ in plants, animals and fungi," Durand said. "Multicellularity evolved independently in each of those groups. To go from a single cell to many cells acting together, each time nature had to solve the same problems of cellular communication and control. But are the solutions the same in each lineage? How those problems were solved is a fascinating question."

Although designed for multi-domain families, Durand notes that Neighborhood Correlation also accurately predicts ancestry in single domain sequences. The researchers hope that scientists will begin to apply the analysis to genomic studies to better understand the role multi-domain proteins play in important evolutionary events, such as the emergence of multicellular animals and the vertebrate immune system.

Other study authors include Carnegie Mellon's Nan Song, Jacob Joseph and George Davis. Team members are affiliated with the Ray and Stephanie Lane Center for Computational Biology, the Department of Biological Sciences and the School of Computer Science.

The study was funded by the National Science Foundation, the National Institutes of Health, and the David and Lucile Packard Foundation.


Comments 1 - 9 of 9

1. Comment #182073 by Sunfish Rule on May 19, 2008 at 7:34 am

Neat!

2. Comment #182100 by flobear on May 19, 2008 at 8:15 am

Good job CMU!

<--- CMU Alum

3. Comment #182128 by rod-the-farmer on May 19, 2008 at 8:59 am

I would have understood this a bit more if the term "multi-domain protein" was explained. I THINK this is exciting, just not sure why.

4. Comment #182162 by HandyGeek on May 19, 2008 at 10:08 am

Wasn't something like this tried fairly recently and failed?

5. Comment #182197 by Andrew Stich on May 19, 2008 at 11:15 am

What sort of thing, HandyGeek? Do you mean a similar technique that proved to be inaccurate, or an experiment that did not yield the expected results, or what?

But, cool. I understand the main point (a new technique has been devised that takes into account and isolates shared ancestry and domain shuffling from each other in order to better predict ancestry), but I don't know what a multi-domain protein is.

Time to make use of those Internets.

6. Comment #182198 by Andrew Stich on May 19, 2008 at 11:17 am

http://en.wikipedia.org/wiki/Protein_domain

Multi-domain proteins are the fourth primary bullet.

7. Comment #182199 by black wolf on May 19, 2008 at 11:20 am

From Wikipedia:
A protein domain is a part of protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from between about 25 amino acids up to 500 amino acids in length. The shortest domains such as zinc fingers are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are self-stable, domains can be "swapped" by genetic engineering between one protein and another to make chimera proteins.

The majority of genomic proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multidomain proteins created as a result of gene duplication events.[20] Many domains in multidomain structures could have once existed as independent proteins. More and more domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes.
http://en.wikipedia.org/wiki/Structural_domain

8. Comment #182470 by RickM on May 20, 2008 at 7:48 am

Great article; thanks.

9. Comment #182772 by adrrrian on May 21, 2008 at 12:45 am

In simple terms:

Multi-domain proteins are proteins made up of multiple independent units, which may be on the same amino acid sequence; these units are often referred to as "domains".

E.g. some domains are responsible for binding DNA, some others for binding other proteins.

Functional domains can be replaced artificially or by natural processes, through replacing the corresponding parts of the gene encoding the protein. After that the function of the whole protein will change accordingly.

Original work of Neighborhood Correlation:
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000063;jsessionid=D06FE6D07BDA62746438C5B852D8DA02
