UNASSIGNED: We investigated characteristics of 176,401 ERV-ORFs containing retroviral-like protein domains (gag, pro, pol, and env) in 19 mammalian genomes. The fractions of ERVs possessing ORFs were overall small (~ 0.15%) although they varied depending on domain types as well as species. The observed divergence of ERV-ORF from their consensus sequences showed bimodal distributions, suggesting that a large proportion of ERV-ORFs either recently, or anciently, inserted themselves into mammalian genomes. Alternatively, very few ERVs lacking ORFs were found to exhibit similar divergence patterns. To identify candidates for ERV-derived genes, we estimated the ratio of non-synonymous to synonymous substitution rates (dN/dS) for ERV-ORFs in human and non-human mammalian pairs, and found that approximately 42% of the ERV-ORFs showed dN/dS < 1. Further, using functional genomics data including transcriptome sequencing, we determined that approximately 9.7% of these selected ERV-ORFs exhibited transcriptional potential.
UNASSIGNED: These results suggest that purifying selection operates on a certain portion of ERV-ORFs, some of which may correspond to uncharacterized functional genes hidden within mammalian genomes. Together, our analyses suggest that more ERV-ORFs may be co-opted in a host-species specific manner than we currently know, which are likely to have contributed to mammalian evolution and diversification.