Keywords: bioinformatics; low-coverage; molecular sexing; sex assessment; sex chromosome

Source: DOI:10.1002/ece3.9185

Abstract:
Accurate sex identification is crucial for elucidating the biology of a species. In the absence of directly observable sexual characteristics, sex identification of wild fauna can be challenging, if not impossible. Molecular sexing offers a powerful alternative to morphological sexing approaches. Here, we present SeXY, a novel sex-identification pipeline for very low-coverage shotgun sequencing data from a single individual. SeXY was designed to utilize low-effort screening data for sex identification and does not require a conspecific sex-chromosome assembly as reference. We assess how the pipeline's accuracy depends on data quantity, by downsampling sequencing data from 100,000 to 1000 mapped reads, and on reference genome selection, by mapping to reference genomes of varying quality and phylogenetic distance. We show that our method is 100% accurate when mapping to a high-quality (highly contiguous, N50 > 30 Mb) conspecific genome, even down to 1000 mapped reads. For lower-quality reference assemblies (N50 < 30 Mb), our method is 100% accurate with 50,000 mapped reads, regardless of reference assembly quality or phylogenetic distance. The SeXY pipeline provides several advantages over previously implemented methods: it (i) requires sequencing data from only a single individual, (ii) does not require assembled conspecific sex chromosomes, or even a conspecific reference assembly, (iii) takes into account variation in coverage across the genome, and (iv) is accurate with only 1000 mapped reads in many cases.
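The downsampling evaluation described in the abstract can be illustrated with a minimal sketch. It is not the SeXY implementation: the abstract does not specify how the pipeline calls sex, so the length-normalised X-to-autosome read-density ratio below is only one common approach to coverage-based sexing, used here as an assumption. The function names, the representation of reads as a list of scaffold names, and the set of X-linked scaffolds are all hypothetical.

```python
import random
from collections import Counter


def downsample(reads, n, seed=0):
    """Draw n mapped reads without replacement (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(reads, n)


def x_to_autosome_ratio(reads, scaffold_lengths, x_scaffolds):
    """Read density on X-linked scaffolds divided by read density elsewhere.

    reads            -- list of scaffold names, one entry per mapped read
    scaffold_lengths -- dict mapping scaffold name to length in bp
    x_scaffolds      -- set of scaffold names assumed to be X-linked
    """
    counts = Counter(reads)
    x_reads = sum(counts[s] for s in x_scaffolds)
    a_reads = sum(c for s, c in counts.items() if s not in x_scaffolds)
    x_len = sum(scaffold_lengths[s] for s in x_scaffolds)
    a_len = sum(l for s, l in scaffold_lengths.items() if s not in x_scaffolds)
    # Normalising by length makes scaffolds of different sizes comparable.
    return (x_reads / x_len) / (a_reads / a_len)
```

Under this assumption, and in a species with XY sex determination, an individual with one X copy would be expected to give a ratio near 0.5 and an individual with two X copies a ratio near 1. Repeating the calculation on `downsample(reads, n)` for decreasing `n` shows how noisy the ratio becomes as data quantity drops.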