GTEx

  • Article type: Journal Article
    BACKGROUND: The tumor microenvironment and immune-related genes (IRGs) are highly correlated with tumor occurrence, progression, and prognosis. However, their roles in grade II and III gliomas, termed LGGs in this study, remain to be fully elucidated. Our research aims to develop immune-related features for risk stratification and prognosis prediction in LGG.
    METHODS: Using the ssGSEA method, we assessed the immune characteristics of the LGG population. We conducted differential analysis using LGG samples from the TCGA database and normal samples from GTEx, identifying 412 differentially expressed immune-related genes (DEIRGs). Subsequently, we used univariate Cox, LASSO, and multivariate Cox regression analyses to establish both a gene predictive model and a nomogram predictive model.
    RESULTS: ESTIMATE, immune, and stromal scores were higher, and tumor purity was lower, in high-immunity, high-grade, and isocitrate dehydrogenase (IDH) wild-type gliomas than in their corresponding comparison groups. Higher ESTIMATE, stromal, and immune scores indicated a poor prognosis in patients with LGG. Our four-gene prognostic model demonstrated superior accuracy compared to other molecular features. Validation using the CGGA as a testing set and the combined TCGA and CGGA cohort confirmed its robust prognostic value. Additionally, a nomogram integrating the prognostic model and clinical variables showed enhanced predictive capability.
    CONCLUSIONS: Our study highlights the prognostic significance of the four identified DEIRGs (KLRC3, MR1, PDIA2, and RFXAP) in LGG patients. The predictive model and nomogram developed herein offer valuable tools for personalized treatment strategies in LGG. Future research should focus on further validating these findings and exploring the functional roles of these DEIRGs within the LGG tumor microenvironment.
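The abstract above does not give the model's formula or coefficients. In gene signatures of this kind, a per-patient risk score is commonly taken as the linear predictor of the multivariate Cox model (coefficient times expression, summed over genes), and patients are split at the median score. The sketch below illustrates that idea only; the coefficients and expression values are hypothetical placeholders, not results from the study.

```python
from statistics import median

# Hypothetical coefficients: placeholders, NOT the values fitted in the study.
COEFS = {"KLRC3": 0.30, "MR1": 0.25, "PDIA2": -0.20, "RFXAP": -0.15}


def risk_score(expr):
    """Linear predictor: sum of coefficient * expression over the four genes."""
    return sum(COEFS[g] * expr[g] for g in COEFS)


def stratify(samples):
    """Label each sample 'high' or 'low' risk by a median split of the scores."""
    scores = {sid: risk_score(e) for sid, e in samples.items()}
    cutoff = median(scores.values())
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}


# Toy expression values, made up for illustration.
samples = {
    "S1": {"KLRC3": 2.0, "MR1": 3.0, "PDIA2": 1.0, "RFXAP": 1.0},
    "S2": {"KLRC3": 0.5, "MR1": 1.0, "PDIA2": 3.0, "RFXAP": 2.0},
    "S3": {"KLRC3": 1.0, "MR1": 2.0, "PDIA2": 2.0, "RFXAP": 2.0},
    "S4": {"KLRC3": 3.0, "MR1": 3.5, "PDIA2": 0.5, "RFXAP": 0.5},
}
groups = stratify(samples)
```

A median split is only one possible cutoff; the study may have used a different threshold, and real coefficients would come from fitting the Cox model on survival data.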

  • Article type: Journal Article
    BACKGROUND: Lung adenocarcinoma (LUAD) has been observed to have significant sex differences in incidence, prognosis, and response to therapy. However, the molecular mechanisms responsible for these disparities have not been investigated extensively.
    METHODS: Sample-specific gene regulatory network methods were used to analyze RNA sequencing data from non-cancerous human lung samples from The Genotype Tissue Expression Project (GTEx) and lung adenocarcinoma primary tumor samples from The Cancer Genome Atlas (TCGA); results were validated on independent data.
    RESULTS: We found that genes associated with key biological pathways including cell proliferation, immune response and drug metabolism are differentially regulated between males and females in both healthy lung tissue and tumor, and that these regulatory differences are further perturbed by tobacco smoking. We also discovered significant sex bias in transcription factor targeting patterns of clinically actionable oncogenes and tumor suppressor genes, including AKT2 and KRAS. Using differentially regulated genes between healthy and tumor samples in conjunction with a drug repurposing tool, we identified several small-molecule drugs that might have sex-biased efficacy as cancer therapeutics and further validated this observation using an independent cell line database.
    CONCLUSIONS: These findings underscore the importance of including sex as a biological variable and considering gene regulatory processes in developing strategies for disease prevention and management.
    Lung adenocarcinoma (LUAD) is a disease that affects males and females differently. Biological sex not only influences chances of developing the disease, but also how the disease progresses and how effective various therapies may be. We analyzed sex-specific gene regulatory networks consisting of transcription factors and the genes they regulate in both healthy lung tissue and in LUAD and identified sex-biased differences. We found that genes associated with cell proliferation, immune response, and drug metabolism are differentially targeted by transcription factors between males and females. We also found that several genes that are drug targets in LUAD are regulated differently between males and females. Importantly, these differences are also influenced by an individual's smoking history. Extending our analysis using a drug repurposing tool, we found candidate drugs with evidence that they might work better for one sex or the other. These results demonstrate that considering the differences in gene regulation between males and females will be essential if we are to develop precision medicine strategies for preventing and treating LUAD.

  • Article type: Journal Article
    Crosstalk between N6-methyladenosine (m6A) and epigenomes is crucial for gene regulation, but its regulatory directionality and disease significance remain unclear. Here, we utilize quantitative trait loci (QTLs) as genetic instruments to delineate directional maps of crosstalk between m6A and two epigenomic traits, DNA methylation (DNAme) and H3K27ac. We identify 47 m6A-to-H3K27ac and 4,733 m6A-to-DNAme and, in the reverse direction, 106 H3K27ac-to-m6A and 61,775 DNAme-to-m6A regulatory loci, with differential genomic location preference observed for different regulatory directions. Integrating these maps with complex diseases, we prioritize 20 genome-wide association study (GWAS) loci for neuroticism, depression, and narcolepsy in brain; 1,767 variants for asthma and expiratory flow traits in lung; and 249 for coronary artery disease, blood pressure, and pulse rate in muscle. This study establishes disease regulatory paths, such as rs3768410-DNAme-m6A-asthma and rs56104944-m6A-DNAme-hypertension, uncovering locus-specific crosstalk between m6A and epigenomic layers and offering insights into regulatory circuits underlying human diseases.

  • Article type: Journal Article
    OBJECTIVE: This study aimed to investigate the effect and mechanism of ZIC2 on immune infiltration in lung adenocarcinoma (LUAD).
    METHODS: Expression of ZIC2 in several kinds of normal tissues in TCGA data was analyzed, as was its correlation with the baseline characteristics of LUAD patients. Immune infiltration in LUAD patients was analyzed with the CIBERSORT algorithm, and the correlation between ZIC2 and immune cell composition was assessed. Additionally, the potential upstream regulatory mechanisms of ZIC2 were predicted to identify miRNAs and lncRNAs that may regulate ZIC2 in LUAD. In vitro and in vivo experiments were also conducted to confirm the potential effect of ZIC2 on the proliferation and invasion ability of LUAD cells.
    RESULTS: ZIC2 expression was decreased in various normal tissues but increased in multiple tumors, including LUAD, and correlated with the prognosis of LUAD patients. GO and KEGG enrichment suggested a possible association of ZIC2 with the cell cycle and the p53 signaling pathway. ZIC2 expression was significantly correlated with resting memory CD4 T cells, M1 macrophages, and plasma cells, indicating that dysregulated ZIC2 expression in LUAD may directly influence immune infiltration. ZIC2 might be regulated by several different lncRNA-mediated ceRNA mechanisms. In vitro experiments validated the promotive effect of ZIC2 on the viability and invasion ability of LUAD cells. In vivo experiments showed that ZIC2 can accelerate tumor growth in nude mice.
    CONCLUSIONS: ZIC2, regulated by different lncRNA-mediated ceRNA mechanisms, may play a critical regulatory role in LUAD by mediating the composition of immune cells in the tumor microenvironment.

  • Article type: Journal Article
    Tissue gene expression studies are impacted by biological and technical sources of variation, which can be broadly classified into wanted and unwanted variation. The latter, if not addressed, results in misleading biological conclusions. Methods have been proposed to reduce unwanted variation, such as normalization and batch correction. A more accurate understanding of all causes of variation could significantly improve the ability of these methods to remove unwanted variation while retaining variation corresponding to the biological question of interest. We used 17,282 samples from 49 human tissues in the Genotype-Tissue Expression data set (v8) to investigate patterns and causes of expression variation. Transcript expression was transformed to z-scores, and only the most variable 2% of transcripts were evaluated and clustered based on coexpression patterns. Clustered gene sets were assigned to different biological or technical causes based on histologic appearances and metadata elements. We identified 522 variable transcript clusters (median: 11 per tissue) among the samples. Of these, 63% were confidently explained, 16% were likely explained, 7% were low confidence explanations, and 14% had no clear cause. Histologic analysis annotated 46 clusters. Other common causes of variability included sex, sequencing contamination, immunoglobulin diversity, and compositional tissue differences. Less common biological causes included death interval (Hardy score), disease status, and age. Technical causes included blood draw timing and harvesting differences. Many of the causes of variation in bulk tissue expression were identifiable in the Tabula Sapiens data set of single-cell expression. This is among the largest explorations of the underlying sources of tissue expression variation. It uncovered expected and unexpected causes of variable gene expression and demonstrated the utility of matched histologic specimens. It further demonstrated the value of acquiring meaningful tissue harvesting metadata elements to use for improved normalization, batch correction, and analysis of both bulk and single-cell RNA-seq data.
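The preprocessing step described above (z-score transformation, then keeping the most variable 2% of transcripts) could be sketched roughly as follows. The abstract does not say how variability was ranked; this sketch assumes ranking by variance of the untransformed values, since every transcript has unit variance after z-scoring. The function names and toy matrix are illustrative only.

```python
import math
from statistics import mean, pstdev, pvariance


def zscores(values):
    """Standardize one transcript's expression across samples."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def most_variable(expr, fraction=0.02):
    """Return the top `fraction` of transcripts ranked by variance (assumed criterion)."""
    keep = max(1, math.ceil(len(expr) * fraction))
    ranked = sorted(expr, key=lambda t: pvariance(expr[t]), reverse=True)
    return ranked[:keep]


# Toy matrix: transcript -> expression across four samples (made-up numbers).
expr = {
    "tx_flat": [5.0, 5.0, 5.0, 5.0],
    "tx_mild": [4.0, 5.0, 6.0, 5.0],
    "tx_wild": [1.0, 9.0, 2.0, 8.0],
}
selected = most_variable(expr, fraction=0.02)
z = {t: zscores(expr[t]) for t in selected}
```

With only three toy transcripts, the 2% cutoff rounds up to a single transcript; on a real expression matrix the same call would keep roughly one transcript in fifty, and the retained z-scores would then feed a coexpression clustering step.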

  • Article type: Journal Article
    Macroautophagy/autophagy is an essential catabolic process that targets a wide variety of cellular components including proteins, organelles, and pathogens. ATG7, a protein involved in the autophagy process, plays a crucial role in maintaining cellular homeostasis and can contribute to the development of diseases such as cancer. ATG7 initiates autophagy by facilitating the lipidation of the ATG8 proteins in the growing autophagosome membrane. The noncanonical isoform ATG7(2) is unable to perform ATG8 lipidation; however, its cellular regulation and function are unknown. Here, we uncovered a distinct regulation and function of ATG7(2) in contrast with ATG7(1), the canonical isoform. First, affinity-purification mass spectrometry analysis revealed that ATG7(2) establishes direct protein-protein interactions (PPIs) with metabolic proteins, whereas ATG7(1) primarily interacts with autophagy machinery proteins. Furthermore, we identified that ATG7(2) mediates a decrease in metabolic activity, highlighting a novel splice-dependent function of this important autophagy protein. Then, we found a divergent expression pattern of ATG7(1) and ATG7(2) across human tissues. In conclusion, our work uncovers the divergent patterns of expression, protein interactions, and function of ATG7(2) in contrast to ATG7(1). These findings suggest a molecular switch between main catabolic processes through isoform-dependent expression of a key autophagy gene.

  • Article type: Journal Article
    Exercise has different effects on different tissues in the body, the sum of which may determine the response to exercise and the health benefits. In the present study, we aimed to investigate whether physical training regulates transcriptional network communities common to both skeletal muscle (SM) and subcutaneous adipose tissue (SAT). Eight such shared transcriptional communities were found in both tissues. Eighteen young overweight adults voluntarily participated in 7 weeks of combined strength and endurance training (five training sessions per week). Biopsies were taken from SM and SAT before and after training. Five of the network communities were regulated by training in SM but showed no change in SAT. One community involved in insulin-AMPK signaling and glucose utilization was upregulated in SM but downregulated in SAT. This diverging exercise regulation was confirmed in two independent studies and was also associated with BMI and diabetes in an independent cohort. Thus, the current finding is consistent with the differential responses of different tissues and suggests that body composition may influence the observed individual whole-body metabolic response to exercise training and help explain the observed attenuated whole-body insulin sensitivity after exercise training, even if it has significant effects on the exercising muscle.

  • Article type: Preprint
    Single Nucleotide Polymorphisms (SNPs) associated with traits typically explain a small part of the trait's genetic heritability, with the remainder thought to be distributed throughout the genome. Such SNPs are likely to alter expression levels of biologically relevant genes. Expression Quantitative Trait Locus (eQTL) network analysis has helped to functionally characterize such variants. We systematically analyze the distribution of SNP heritability for ten traits across 29 tissue-specific eQTL networks. We find that heritability is clustered in a small number of tissue-specific, functionally relevant SNP-gene modules and that the greatest concentration occurs in local "hubs" that are both the cornerstone of the network's modules and tissue-specific regulatory elements. The network structure could thus both amplify the genotype-phenotype connection and buffer the deleterious effect of the genetic variations on other traits. Together, these results define a conceptual framework for understanding complex trait architecture and identifying key mutations carrying most of the heritability.

  • Article type: Journal Article
    Alzheimer's disease (AD) is a progressive neurodegenerative disorder that affects over 50 million elderly individuals worldwide. Although the pathogenesis of AD is not fully understood, current research has identified potential biomarker genes and proteins that may serve as effective targets against AD. This article aims to present a comprehensive overview of recent advances in AD biomarker identification, with highlights on the use of various algorithms, the exploration of relevant biological processes, and the investigation of shared biomarkers with co-occurring diseases. Additionally, this article includes a statistical analysis of key genes reported in the research literature, and identifies the intersection with AD-related gene sets from databases such as AlzGen, GeneCard, and DisGeNet. For these gene sets, besides enrichment analysis, protein-protein interaction (PPI) networks were utilized to identify central genes among the overlapping genes. Enrichment analysis, protein interaction network analysis, and tissue-specific connectedness analysis based on the GTEx database were performed on multiple groups of overlapping genes. Our work has laid the foundation for a better understanding of the molecular mechanisms of AD and more accurate identification of key AD markers.

  • Article type: Preprint
    BACKGROUND: Gene co-expression networks (GCNs) describe relationships among expressed genes key to maintaining cellular identity and homeostasis. However, the sample size of typical RNA-seq experiments, which is several orders of magnitude smaller than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprising 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and obtain biological insight from the resulting networks.
    RESULTS: We compared alternate aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks (a universal network, a non-cancer network, and a cancer network) in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors, and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples.
    CONCLUSIONS: This study outlines best practices for GCN inference and evaluation by data aggregation. We recommend estimating and regressing out confounders in each data set before aggregation and prioritizing studies with large sample sizes for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future directions aimed at alternate methods for estimating confounders and integrating orthogonal information from modalities such as Hi-C and ChIP-seq can further improve GCN inference.
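The recommendation to regress out confounders within each data set before aggregating can be illustrated with a deliberately simplified sketch. It assumes a single known per-sample covariate per data set and ordinary least squares; the study itself estimates confounders, and its actual procedure is not described in the abstract. The data structures and names here are hypothetical.

```python
from statistics import mean


def residualize(y, x):
    """Remove the linear effect of covariate x from y (simple OLS residuals)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = 0.0 if sxx == 0 else sxy / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def aggregate(datasets):
    """Residualize each gene within each data set, then pool samples across data sets."""
    pooled = {}
    for ds in datasets:
        for gene, values in ds["expr"].items():
            pooled.setdefault(gene, []).extend(residualize(values, ds["confounder"]))
    return pooled


# Two toy data sets; "confounder" is a hypothetical per-sample covariate.
datasets = [
    {"confounder": [0.0, 1.0, 2.0],
     "expr": {"geneA": [1.0, 3.0, 5.0], "geneB": [2.0, 2.0, 5.0]}},
    {"confounder": [1.0, 3.0],
     "expr": {"geneA": [10.0, 14.0], "geneB": [7.0, 3.0]}},
]
pooled = aggregate(datasets)
```

Correcting within each data set first keeps study-specific technical effects from appearing as spurious co-expression once samples are pooled; a co-expression network would then be estimated from the pooled residuals.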