Methods

Evolutionary Rate Covariation (ERC) is a sequence analysis method to reveal genes with similar evolutionary histories. The fundamental concept behind ERC is that functionally related genes have experienced shared evolutionary pressures and hence will have responded to them in parallel. In turn, ERC allows for the inference of gene-gene networks and gene function.

The ERC statistic itself is the correlation coefficient of the branch-specific evolutionary rates of two genes. The calculation begins by estimating the branch lengths for each gene over a fixed species topology. The resulting branch lengths are then transformed into relative rates by comparing them to a genome-wide average tree scaled to unit length. Finally, a correlation coefficient (the ERC value) is calculated between every pair of genes using the relative rates from corresponding branches. Gene pairs with more highly correlated histories have ERC values approaching 1.
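The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind this site: it assumes branch lengths have already been estimated on the fixed topology and stored as a genes-by-branches array, it takes the relative rate to be a simple ratio of each scaled gene branch to the matching branch of the average tree, and it uses the Pearson correlation. The function names relative_rates and erc_matrix are hypothetical.

```python
import numpy as np

def relative_rates(branch_lengths):
    """Convert raw branch lengths (genes x branches) to relative rates.

    Assumes every gene uses the same branch order on the same species
    topology and that all branch lengths are positive.
    """
    bl = np.asarray(branch_lengths, dtype=float)
    # Scale each gene tree to unit total length so that overall fast and
    # slow genes become comparable.
    scaled = bl / bl.sum(axis=1, keepdims=True)
    # Average tree across genes; it has unit total length because every
    # scaled gene tree does.
    average_tree = scaled.mean(axis=0)
    # Assumption: relative rate = scaled gene branch / average branch.
    return scaled / average_tree

def erc_matrix(branch_lengths):
    """Pearson correlation of relative rates between every pair of genes."""
    return np.corrcoef(relative_rates(branch_lengths))

# Toy data: 3 genes measured on 4 branches (made-up numbers).
toy = [
    [0.1, 0.2, 0.3, 0.4],  # gene A
    [0.2, 0.4, 0.6, 0.8],  # gene B: same pattern as A, twice as fast overall
    [0.4, 0.3, 0.2, 0.1],  # gene C: opposite pattern
]
erc = erc_matrix(toy)
print(np.round(erc, 3))
```

In this toy example genes A and B differ only in overall speed, so their ERC value is essentially 1, while gene C, whose rates run the opposite way across branches, has a negative ERC value with both.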

How to Use ERC

First, realize that not all genetic systems have an ERC signature. For example, our studies found that roughly 60% of protein complexes and 25% of genetic diseases show one. Accordingly, the most important first step is to run Group Test on a set of training genes that belong to a single pathway, complex, or specific biological function. If that test returns a low P-value, indicating that those genes show a consistent ERC signal among them, it is appropriate to continue with other analyses using that training set to infer new genes or functions, or to prioritize genes. You can of course continue even when Group Test is negative, but treat the predicted genes with skepticism. In short, start with Group Test, and proceed only if it shows a significant ERC signal.
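To give a feel for what such a test asks, here is a minimal sketch of one common way to test a gene set: a permutation test that compares the mean pairwise ERC within the training set to the same statistic for random gene sets of equal size. How Group Test is actually implemented is not described here, so the statistic, the null model, and the function name group_test_sketch are all assumptions made for illustration.

```python
import numpy as np

def group_test_sketch(erc, group, n_perm=999, seed=0):
    """Permutation P-value for elevated mean pairwise ERC within a gene set.

    erc   : square, symmetric matrix of ERC values (genes x genes)
    group : indices of the training genes in that matrix
    """
    erc = np.asarray(erc, dtype=float)
    group = np.asarray(group)
    k = len(group)
    upper = np.triu_indices(k, k=1)  # each gene pair counted once

    def mean_pairwise(idx):
        return erc[np.ix_(idx, idx)][upper].mean()

    observed = mean_pairwise(group)
    rng = np.random.default_rng(seed)
    n_genes = erc.shape[0]
    hits = 0
    for _ in range(n_perm):
        # Null model (an assumption): random gene sets of the same size.
        random_set = rng.choice(n_genes, size=k, replace=False)
        if mean_pairwise(random_set) >= observed:
            hits += 1
    # Add-one correction so the P-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

A low P-value from a test of this kind says the training genes covary with one another more than random gene sets do, which is the condition under which the downstream predictions are worth pursuing.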

Even when Group Test shows a signature, predictions from Top Genes and Top Functions are only enriched for genes functionally related to yours; not every hit will be a true functional partner. Scan the list and be prepared to screen candidates.

The most successful strategy has been to reduce the search space in Top Genes. The genome is a big place. Restricting the search to a few hundred or a few thousand genes, for example genes with a specific expression profile or protein domain, has been shown to greatly increase success rates (Findlay et al. PLoS Genet. 2014).
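The idea of a restricted search can be sketched as follows. This is a hedged illustration rather than the Top Genes implementation: it assumes candidates are scored by their mean ERC value with the training genes, and the function name top_genes_sketch and its parameters are hypothetical.

```python
import numpy as np

def top_genes_sketch(erc, gene_names, training, candidates=None, n_top=10):
    """Rank candidate genes by mean ERC with a training set.

    candidates : optional list of gene names that defines the reduced
                 search space (e.g. genes sharing an expression profile);
                 if None, the whole genome is searched.
    """
    erc = np.asarray(erc, dtype=float)
    index = {name: i for i, name in enumerate(gene_names)}
    training_set = set(training)
    train_idx = [index[g] for g in training]

    pool = gene_names if candidates is None else candidates
    # Keep only known genes that are not themselves training genes.
    pool = [g for g in pool if g in index and g not in training_set]

    # Assumed score: mean ERC between the candidate and each training gene.
    scores = {g: float(erc[index[g], train_idx].mean()) for g in pool}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n_top]
```

Calling it with candidates set to a pre-filtered gene list searches only that list, so fewer unrelated genes compete for the top ranks.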

There is no magic ERC value cut-off. Functionally related genes are enriched among the higher values, and in practice the enrichment can be low once values drop below 0.4. A good guide for your specific test is to study the range of ERC values found in your initial Group Test.

Data

Mammals

This dataset is Homo sapiens rooted, and was generated with the protein sequences of mammalian species from UCSC.

Download

The following paper is associated with this dataset:

Priedigkeit, Nolan; Wolfe, Nicholas; Clark, Nathan L. "Evolutionary signatures amongst disease genes permit novel methods for gene prioritization and construction of informative gene networks." PLoS Genetics.



Drosophila

This dataset was generated with the protein sequences of species from FlyBase.

Download

The following paper is associated with this dataset:

Findlay, Geoffrey D; Sitnik, Jessica L; Wang, Wenke; Aquadro, Charles F; Clark, Nathan L; Wolfner, Mariana F. "Evolutionary Rate Covariation Identifies New Members of a Protein Network Required for Drosophila melanogaster Female Post-Mating Responses." PLoS Genetics 10(1): e1004108.



Yeast

This dataset was generated with the protein sequences of yeast species from SGD.

Download

The following paper is associated with this dataset:

Clark, Nathan L; Alani, Eric; Aquadro, Charles F. "Evolutionary rate covariation reveals shared functionality and coexpression of genes." Genome Res 22(4): 714-720.