Keywords: methylation; contamination; germline; machine learning; somatic; sperm

MeSH: Male; Humans; Spermatozoa; DNA Methylation; Computational Biology; Machine Learning; Epigenesis, Genetic; Logistic Models; Sperm Count

Source: DOI:10.1080/19396368.2024.2368716

Abstract:
The assessment of epigenetic profiles in sperm is sensitive to somatic cell contamination, which can influence methylation signals at gene promoters. This contamination is particularly problematic when assessing DNA methylation in samples with low sperm counts, where even fractional amounts of somatic cell DNA can significantly shift the measured methylation state. In this study, a new method of detecting possible somatic cell contamination is proposed, based on two multi-region bioinformatic models: a traditional differential methylation analysis and a machine learning logistic regression model. These models were trained on publicly available sperm (n = 489) and blood (n = 1029) DNA methylation array data and tested on a contamination set, in which samples from four donors with normal sperm counts were run on a 450k methylation array in four permutations each: pure blood, half blood and half sperm by DNA concentration, half blood and half sperm by cell count, and pure sperm (n = 16). The DMR analysis and the logistic regression model classified the contamination testing set with 100% and 94% accuracy, respectively. These new methods of detecting the effects of somatic cell contamination allow more accurate differentiation between epigenetic profiles that contain a biological somatic-like shift and those that carry somatic-like signatures because of contamination.
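As a rough illustration of the two ideas in the abstract, the sketch below first shows how a small fraction of somatic DNA can shift a measured methylation value, and then fits a tiny logistic regression to separate sperm-like from blood-like profiles. It is not the authors' pipeline: the three regions, their mean beta values, the noise level, and the helper names (`mix_beta`, `train_logistic`, `predict_proba`) are all hypothetical, the data are synthetic, and the simple linear mixing model is an assumption.

```python
import math
import random

def mix_beta(sperm_beta, somatic_beta, somatic_fraction):
    """Expected beta value under an assumed linear mix of sperm and somatic DNA."""
    return somatic_fraction * somatic_beta + (1.0 - somatic_fraction) * sperm_beta

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Probability that a profile is somatic-like (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical mean beta values at three regions (not taken from the study).
SPERM_MEANS = [0.05, 0.10, 0.08]   # assumed hypomethylated in sperm
BLOOD_MEANS = [0.90, 0.85, 0.80]   # assumed hypermethylated in blood

def noisy_sample(means, rng, sd=0.03):
    """Draw one synthetic profile, clipping beta values to [0, 1]."""
    return [min(1.0, max(0.0, rng.gauss(m, sd))) for m in means]

rng = random.Random(0)
X = [noisy_sample(SPERM_MEANS, rng) for _ in range(50)]
X += [noisy_sample(BLOOD_MEANS, rng) for _ in range(50)]
y = [0] * 50 + [1] * 50            # 0 = sperm, 1 = blood (somatic)

w, b = train_logistic(X, y)

# Score noise-free pure sperm, a 50/50 mix, and pure blood profiles.
mixed = [mix_beta(s, k, 0.5) for s, k in zip(SPERM_MEANS, BLOOD_MEANS)]
p_sperm = predict_proba(w, b, SPERM_MEANS)
p_mixed = predict_proba(w, b, mixed)
p_blood = predict_proba(w, b, BLOOD_MEANS)

# Under the linear mixing assumption, 10% somatic DNA moves a region from
# a beta of 0.05 to about 0.135.
shifted = mix_beta(0.05, 0.90, 0.10)
print(shifted, p_sperm, p_mixed, p_blood)
```

Because the model is linear in the beta values, the score of a mixed sample always falls between the scores of its pure components, which is why a classifier trained only on pure sperm and pure blood can still flag intermediate, contaminated profiles. How the published models set their decision thresholds is not stated in the abstract.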