dark genes

  • Article type: Journal Article
    BACKGROUND: Complex biological processes involve a critical transition, or tipping point, which is usually accompanied by catastrophic consequences. Identifying the tipping point or critical state is therefore important for preventing or delaying these consequences. However, predicting the critical state from high-dimensional, small-sample data is difficult, especially for single-cell expression data.
    RESULTS: In this study, we propose the comprehensive neighbourhood-based perturbed mutual information (CPMI) method to detect the critical states of complex biological processes. By taking into account the relationships between each gene and its neighbours, CPMI reduces noise and enhances robustness. The method is applied to one simulated dataset and six real datasets: an influenza dataset, two single-cell expression datasets and three bulk datasets. It not only detects the tipping points successfully but also identifies their dynamic network biomarkers (DNBs). In addition, the discovery of transcription factors (TFs) that can regulate both DNB genes and nondifferential 'dark genes' validates the effectiveness of our method. Numerical simulations verify that CPMI is robust under different noise strengths and outperforms existing methods in identifying critical states.
    CONCLUSIONS: We propose a robust computational method, CPMI, that is applicable to both bulk and single-cell datasets. CPMI holds great potential for providing early-warning signals for complex biological processes and enabling early disease diagnosis.
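    The abstract does not give the CPMI formula. As a rough, hypothetical illustration of the two ingredients it names (mutual information between a gene and its neighbours, and a perturbation caused by a single sample), the sketch below uses a plug-in histogram estimate of mutual information and scores a sample by how much appending it to a reference set changes the MI between a gene and each neighbour. The names `mutual_information` and `perturbed_mi`, the bin count and the averaging rule are assumptions for illustration, not the published CPMI method.

```python
import numpy as np


def mutual_information(x, y, bins=5):
    """Plug-in (histogram) estimate of the mutual information of x and y, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    expected = px @ py                      # product of marginals, shape (bins, bins)
    nz = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / expected[nz])))


def perturbed_mi(reference, sample, gene, neighbours, bins=5):
    """Hypothetical single-sample score: mean absolute change in MI between
    `gene` and each of its `neighbours` when `sample` is appended to `reference`.

    reference: array of shape (n_samples, n_genes)
    sample:    array of shape (n_genes,)
    """
    perturbed = np.vstack([reference, sample])
    deltas = []
    for nb in neighbours:
        before = mutual_information(reference[:, gene], reference[:, nb], bins)
        after = mutual_information(perturbed[:, gene], perturbed[:, nb], bins)
        deltas.append(abs(after - before))
    return float(np.mean(deltas))


rng = np.random.default_rng(0)
reference = rng.normal(size=(30, 4))        # 30 reference samples, 4 genes
new_sample = rng.normal(size=4)
print(perturbed_mi(reference, new_sample, gene=0, neighbours=[1, 2, 3]))
```

    Histogram-based MI estimates are biased for small sample sizes, so the bin count would need tuning in practice; k-nearest-neighbour estimators are a common alternative.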

  • Article type: Journal Article
    Haploinsufficiency of the PRR12 gene is implicated in a human neuro-ocular syndrome. Although PRR12 has been identified as a nuclear protein highly expressed in the embryonic mouse brain, its molecular function remains elusive. This study explores the spatio-temporal expression of the zebrafish PRR12 co-orthologs, prr12a and prr12b, as a first step toward elucidating their function. In silico analysis reveals high evolutionary conservation of the DNA-interacting domains in both orthologs, with significant syntenic conservation observed for the prr12b locus. In situ hybridization and RT-qPCR analyses of zebrafish embryos and larvae reveal distinct expression patterns: prr12a is expressed early in zygotic development, mainly in the central nervous system, whereas prr12b expression begins during gastrulation and later localizes to dopaminergic telencephalic and diencephalic cell clusters. Both transcripts are enriched in the ganglion cell and inner neural layers of the 72 hpf retina, with prr12b widely distributed in the ciliary marginal zone. In the adult brain, prr12a and prr12b are found in the cerebellum, amygdala and ventral telencephalon, which represent the main areas affected in autistic patients. Overall, this study suggests that PRR12 may be involved in eye and brain development, laying the groundwork for further investigation of PRR12-related neurobehavioral disorders.

  • Article type: Journal Article
    BACKGROUND: Early diagnosis of 5q spinal muscular atrophy (5q-SMA) has become more important because early intervention can significantly improve clinical outcomes. In 96% of cases, 5q-SMA is caused by a homozygous deletion of SMN1. Around 4% of patients carry an SMN1 deletion on one allele and a single-nucleotide variant (SNV) on the other. Traditionally, diagnosis relies on multiplex ligation-dependent probe amplification (MLPA) to detect homozygous or heterozygous exon 7 deletions in SMN1. Because of the high homology within the SMN1/SMN2 locus, sequence analysis to identify SNVs in SMN1 is unreliable with standard Sanger or short-read next-generation sequencing (srNGS) methods.
    OBJECTIVE: The aim was to overcome the limitations of high-throughput srNGS in order to provide SMA patients with a fast and reliable diagnosis that enables timely therapy.
    METHODS: A bioinformatics workflow for detecting homozygous SMN1 deletions and SMN1 SNVs in srNGS data was applied to diagnostic whole exome and panel testing for suspected neuromuscular disorders (1684 patients) and to fetal samples in prenatal diagnostics (260 patients). SNVs were detected by aligning sequencing reads from both SMN1 and SMN2 to an SMN1 reference sequence. Homozygous SMN1 deletions were identified by filtering sequence reads for the "gene-determining variant" (GDV).
    RESULTS: Ten patients were diagnosed with 5q-SMA based on (i) an SMN1 deletion and a hemizygous SNV (2 patients), (ii) a homozygous SMN1 deletion (6 patients), or (iii) compound heterozygous SNVs in SMN1 (2 patients).
    CONCLUSIONS: Applying our workflow to srNGS-based panel and whole exome sequencing (WES) is crucial in the clinical laboratory; otherwise, patients with an atypical clinical presentation in whom SMA is not initially suspected remain undiagnosed.
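    The GDV-based read filtering described in METHODS can be pictured with a toy read-counting sketch. It assumes that the GDV is the exon 7 position c.840, where SMN1 typically carries C and SMN2 carries T, and that the bases observed at this position in reads aligned to the SMN1 reference have already been extracted. The function name `classify_gdv` and the minimum-depth threshold are hypothetical; this is not the authors' workflow and not a diagnostic tool.

```python
from collections import Counter

# Assumption: the GDV is c.840 in exon 7, where SMN1 typically carries C
# and SMN2 carries T. Reads from both genes are assumed to be aligned to an
# SMN1 reference, so each covering read reports one of these two bases.
SMN1_BASE = "C"
SMN2_BASE = "T"


def classify_gdv(read_bases, min_depth=20):
    """Toy triage of reads covering the GDV position (hypothetical threshold).

    read_bases: iterable of single-letter bases observed at the GDV, one per read.
    Returns a label and the fraction of informative reads that are SMN1-type.
    """
    counts = Counter(base.upper() for base in read_bases)
    smn1, smn2 = counts[SMN1_BASE], counts[SMN2_BASE]
    depth = smn1 + smn2
    if depth < min_depth:
        return "insufficient coverage", None
    fraction = smn1 / depth
    if smn1 == 0:
        # No SMN1-type reads at all: consistent with (not proof of) a
        # homozygous SMN1 deletion; independent confirmation would be needed.
        return "consistent with homozygous SMN1 deletion", fraction
    return "SMN1-type reads present", fraction


print(classify_gdv("T" * 40))
print(classify_gdv("C" * 15 + "T" * 25))
```

    Counting only the gene-distinguishing base sidesteps the near-identity of the two paralogs elsewhere in the locus, which is the obstacle the BACKGROUND describes for standard short-read analysis.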

  • Article type: Journal Article
    Type 2 diabetes mellitus (T2DM) is a metabolic disease with multiple etiologies, and its development can be divided into three states: the normal state, the critical (pre-disease) state and the disease state. To avoid irreversible progression, it is important to detect early-warning signals before the onset of T2DM. However, detecting the critical states of complex diseases from high-throughput, highly noisy data remains challenging. In this study, we developed a new method, degree matrix network entropy (DMNE), to detect the critical states of T2DM based on a sample-specific network (SSN). By applying the method to datasets from three different tissues in rat T2DM experiments, we detected the critical states and successfully identified the dynamic network biomarkers (DNBs). Specifically, for liver and muscle, the critical transitions occur at 4 and 16 weeks. For adipose tissue, the critical transition occurs at 8 weeks. In addition, we found some "dark genes" that did not exhibit differential expression but displayed sensitivity in their DMNE scores, closely related to the progression of T2DM. Our findings not only provide further evidence on the molecular mechanisms of T2DM but may also help in developing strategies to prevent the disease.

  • Article type: Journal Article
    BACKGROUND: The evolution of complex diseases can be modeled as a time-dependent nonlinear dynamic system whose progression can be divided into three states: the normal state, the pre-disease state and the disease state. A sudden deterioration of the disease can be regarded as a state transition of the dynamic system at the critical (pre-disease) state. Detecting an individual's critical state before the disease state from single-sample data has attracted the attention of many researchers.
    METHODS: In this study, we propose a novel approach, the single-sample-based Jensen-Shannon divergence (sJSD) method, to detect early-warning signals of complex diseases before critical transitions from an individual's single-sample data. Based on sJSD, the method constructs a score index, the inconsistency index (ICI).
    RESULTS: The method is applied to five real datasets: prostate cancer, bladder urothelial carcinoma, influenza virus infection, cervical squamous cell carcinoma and endocervical adenocarcinoma, and pancreatic adenocarcinoma. The critical states of all five datasets, with their corresponding sJSD signal biomarkers, are successfully identified to diagnose and predict each individual sample, and some "dark genes" that are not differentially expressed but are sensitive to the ICI score are revealed. The method is data-driven and model-free, and can be applied not only to disease prediction for individuals but also to targeted drug design for each disease. Moreover, identifying sJSD signal biomarkers is significant for studying the molecular mechanisms of disease progression from a dynamic perspective.
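    The abstract does not define sJSD or ICI in detail. The sketch below only illustrates the underlying quantity, the Jensen-Shannon divergence between two discrete distributions (computed in bits, so it lies between 0 and 1), plus one hypothetical way to apply it to a single sample: comparing the binned distribution of pooled reference values with the same values after the new sample's values are appended. The helper names and the binning scheme are assumptions, not the published sJSD/ICI definition.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 for identical, 1 for disjoint support)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()         # normalise counts to probabilities
    m = 0.5 * (p + q)                       # mixture; m > 0 wherever p or q > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def single_sample_jsd(reference, sample, bins=10):
    """Hypothetical single-sample score: JSD between the binned reference values
    and the same values with the new sample's values appended."""
    combined = np.concatenate([reference, sample])
    edges = np.histogram_bin_edges(combined, bins=bins)  # shared bin edges
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(combined, bins=edges)
    return jensen_shannon(p, q)


print(jensen_shannon([1, 0], [0, 1]))       # 1.0 (disjoint supports)
rng = np.random.default_rng(1)
reference = rng.normal(size=200)            # e.g. pooled values of a gene module in reference samples
print(single_sample_jsd(reference, np.array([0.1, -0.2, 5.0])))
```

    Unlike the KL divergence, the JSD is symmetric and always finite, which makes it convenient when one distribution has empty bins that the other does not.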

  • Article type: Journal Article
    The human genome contains "dark" gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads, which we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions.
    Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥ 5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer's Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer's disease gene, found in disease cases but not in controls.
    While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer's disease because of insufficient sample size, we believe it merits investigation in a larger cohort. Thousands of potentially important genomic regions overlooked by short-read sequencing remain, and they are largely resolved by long-read technologies.
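    To make the distinction between "dark by depth" and "camouflaged" concrete, the sketch below labels each position from the mapping qualities (MAPQ) of the reads covering it and merges runs of dark positions into intervals. The thresholds (at most 5 reads for dark by depth; at least 90% of reads with MAPQ below 10 as a proxy for ambiguous alignment) are assumptions chosen for illustration, and treating low-MAPQ positions as camouflaged is a simplification; neither restates the authors' algorithm.

```python
from itertools import groupby

# Assumed thresholds for illustration only.
MAX_DARK_DEPTH = 5         # positions covered by <= 5 reads are "dark by depth"
MAPQ_CUTOFF = 10           # reads with MAPQ below this are treated as ambiguously aligned
LOW_MAPQ_FRACTION = 0.9    # >= 90% ambiguous reads -> treated as camouflaged


def classify_position(mapqs):
    """Label one position from the MAPQ values of the reads covering it."""
    depth = len(mapqs)
    if depth <= MAX_DARK_DEPTH:
        return "dark_by_depth"
    low = sum(1 for q in mapqs if q < MAPQ_CUTOFF)
    if low / depth >= LOW_MAPQ_FRACTION:
        return "camouflaged"
    return "resolved"


def dark_regions(per_position_mapqs, start=0):
    """Merge consecutive positions with the same non-resolved label into
    half-open intervals (start, end, label)."""
    labels = [classify_position(m) for m in per_position_mapqs]
    regions, pos = [], start
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label != "resolved":
            regions.append((pos, pos + length, label))
        pos += length
    return regions


# Toy coverage: 3 well-mapped positions, 2 covered only by MAPQ 0 reads,
# 2 covered by just two reads, then 1 well-mapped position.
coverage = [[60] * 30] * 3 + [[0] * 30] * 2 + [[60] * 2] * 2 + [[60] * 30]
print(dark_regions(coverage))  # [(3, 5, 'camouflaged'), (5, 7, 'dark_by_depth')]
```

    Keeping the two labels separate matters because, as the abstract notes, camouflaged regions can be rescued by handling ambiguous alignments, whereas dark-by-depth regions simply lack reads.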