Clusters of ancestrally related genes (paraclusters) catalogued in 14 eukaryotic species

Description

Here we applied a statistical approach based on the hypergeometric distribution to estimate probabilities for clusters of structurally related genes. Clusters are generated by performing a chromosome walk and grouping genes into chains based on their proximity to one another, shared structural annotation features, and sequence similarity. The annotation sources used are InterPro, SCOP, PANTHER, Ensembl paralogs, and Ensembl protein families. The resulting p-values are converted to expectation values by multiplying them by the size of the corresponding genome. The length of a chain is limited by capping the largest allowed gap (the number of contiguous unrelated genes) within a cluster. The clusters presented here were derived with a maximum gap size of 15 genes, and a cluster is deemed significant if its expectation value is lower than 0.01.
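The procedure described above can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the function names (build_chains, hypergeom_tail, expectation_value), the boolean encoding of "related" genes, and the exact parameterization of the hypergeometric test (N genes in the genome, K genes carrying the shared feature, n genes spanned by the chain, k related genes in the chain) are all assumptions made for the example. Genome size is assumed here to be measured in genes.

```python
from math import comb

MAX_GAP = 15            # largest allowed run of contiguous unrelated genes
E_VALUE_CUTOFF = 0.01   # significance threshold quoted in the description


def build_chains(related, max_gap=MAX_GAP):
    """Walk along one chromosome and group related genes into chains.

    `related` is a list of booleans in chromosome order (hypothetical
    encoding): True if the gene carries the shared feature.
    A chain is closed when more than `max_gap` contiguous unrelated genes
    separate two related genes. Returns inclusive (start, end) index pairs
    for chains containing at least two related genes.
    """
    chains = []
    start = last = None
    for i, flag in enumerate(related):
        if not flag:
            continue
        # Close the current chain if the gap since the last related gene is too large.
        if start is not None and i - last - 1 > max_gap:
            if last > start:
                chains.append((start, last))
            start = None
        if start is None:
            start = i
        last = i
    if start is not None and last > start:
        chains.append((start, last))
    return chains


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def expectation_value(p_value, genome_size):
    """Convert a p-value to an expectation value by scaling with genome size."""
    return p_value * genome_size


def is_significant(e_value, cutoff=E_VALUE_CUTOFF):
    """A cluster is kept if its expectation value is below the cutoff."""
    return e_value < cutoff
```

For each chain returned by build_chains, one would count the genes it spans and the related genes it contains, pass those counts to hypergeom_tail together with the genome-wide totals, and keep the chain if is_significant(expectation_value(p, N)) holds.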

The orthology of clusters across the 13 species is derived using the InParanoid database, which is designed to distinguish in-paralogs from out-paralogs. Clusters are designated as orthologous between two species if they share at least one in-paralog.
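The orthology rule above can be illustrated with a short sketch. It assumes, purely for illustration, that each cluster is a set of gene identifiers and that InParanoid results have been loaded into a dictionary mapping each gene to the identifier of its in-paralog group; the name clusters_orthologous, the dictionary layout, and the gene identifiers in the usage note are hypothetical and do not reflect the InParanoid file format.

```python
def clusters_orthologous(cluster_a, cluster_b, group_of):
    """Return True if two clusters share at least one in-paralog group.

    cluster_a, cluster_b: iterables of gene identifiers from two species.
    group_of: dict mapping gene identifier -> in-paralog group identifier
              (hypothetical representation of InParanoid output).
    Genes absent from `group_of` are ignored.
    """
    groups_a = {group_of[g] for g in cluster_a if g in group_of}
    groups_b = {group_of[g] for g in cluster_b if g in group_of}
    # Orthologous if the two sets of group identifiers intersect.
    return not groups_a.isdisjoint(groups_b)
```

With a mapping such as {"hsA": 1, "mmA": 1, "mmB": 2}, the clusters {"hsA"} and {"mmA", "mmB"} would be designated orthologous, whereas {"hsA"} and {"mmB"} would not.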

Reference

Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species
Walker MB, King BL, Paigen K.
PLoS One. 2012;7(4):e35274. PMCID: PMC3338513

Datasets

Documentation

Human
Chimp
Macaque
Mouse
Rat
Dog
Cow
Opossum
Chicken
Zebrafish
Fly
C. elegans
Yeast
Arabidopsis