FDR

  • Article type: Journal Article
    The need to select mediators from a high-dimensional data source, such as neuroimaging or genetic data, arises in much scientific research. In this work, we formulate a multiple-hypothesis testing framework for mediator selection from a high-dimensional candidate set and propose a method that extends recent developments in false discovery rate (FDR)-controlled variable selection with knockoffs to select mediators with FDR control. We show that the proposed method and algorithm achieve finite-sample FDR control. We present extensive simulation results to demonstrate the power and finite-sample performance compared with the existing method. Lastly, we apply the method to the Adolescent Brain Cognitive Development (ABCD) study, in which it selects several resting-state functional magnetic resonance imaging connectivity markers as mediators of the relationship between adverse childhood events and the crystallized composite score in the NIH Toolbox.
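    For orientation, the generic knockoff+ selection step that knockoff-based FDR control builds on can be sketched as below. This is a minimal Python illustration under stated assumptions, not the mediator-selection procedure of the article above: the function name `knockoff_plus_select`, the input statistics `w` (one signed importance statistic per candidate, large positive values favouring the real variable over its knockoff), and the target level `q` are all hypothetical names chosen for the example.

```python
from typing import List, Sequence


def knockoff_plus_select(w: Sequence[float], q: float = 0.1) -> List[int]:
    """Return indices selected by the knockoff+ threshold at target FDR level q.

    Assumption: w[j] is a signed knockoff statistic for candidate j.
    The threshold is the smallest t among the nonzero |w[j]| such that
    (1 + #{j: w[j] <= -t}) / max(1, #{j: w[j] >= t}) <= q.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # candidates that would be selected
        if (1 + negatives) / max(positives, 1) <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold satisfies the bound, so nothing is selected
```

    The `1 +` in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the estimate conservative, so with few candidates the procedure may select nothing at all.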

  • Article type: Journal Article
    BACKGROUND: Fetal sex affects fetal and maternal health outcomes in pregnancy, but this connection remains poorly understood. As the placenta is the route of fetomaternal communication and derives from the fetal genome, placental gene expression sex differences may explain these outcomes.
    OBJECTIVE: We utilized next generation sequencing to study the normal human placenta in both sexes in first and third trimester to generate a normative transcriptome based on sex and gestation.
    METHODS: We analyzed 124 first trimester (T1, 59 female and 65 male) and 43 third trimester (T3, 18 female and 25 male) samples for sex differences within each trimester and sex-specific gestational differences.
    RESULTS: Placenta shows more significant sexual dimorphism in T1, with 94 T1 and 26 T3 differentially expressed genes (DEGs). The sex chromosomes contributed 60.6% of DEGs in T1 and 80.8% of DEGs in T3, excluding X/Y pseudoautosomal regions. There were 6 DEGs from the pseudoautosomal regions, only significant in T1 and all upregulated in males. The distribution of DEGs on the X chromosome suggests genes on Xp (the short arm) may be particularly important in placental sex differences. Dosage compensation analysis of X/Y homolog genes shows expression is primarily contributed by the X chromosome. In sex-specific analyses of first versus third trimester, there were 2815 DEGs common to both sexes upregulated in T1, and 3263 common DEGs upregulated in T3. There were 7 female-exclusive DEGs upregulated in T1, 15 female-exclusive DEGs upregulated in T3, 10 male-exclusive DEGs upregulated in T1, and 20 male-exclusive DEGs upregulated in T3.
    CONCLUSIONS: This is the largest cohort of placentas across gestation from healthy pregnancies, defining normative sex-dimorphic gene expression as well as sex-common, sex-specific, and sex-exclusive gene expression across gestation. The first trimester has the most sexually dimorphic transcripts, and the majority were upregulated in females compared to males in both trimesters. The short arm of the X chromosome and the pseudoautosomal region are particularly critical in defining sex differences in the first trimester placenta. As pregnancy is a dynamic state, sex-specific DEGs across gestation may contribute to sex-dimorphic changes in overall outcomes.

  • Article type: Journal Article
    Neuropeptides represent a unique class of signaling molecules that have garnered much attention but require special consideration when identifications are gleaned from mass spectra. With highly variable sequence lengths, neuropeptides must be analyzed in their endogenous state. Further, neuropeptides share great homology within families, differing by as little as a single amino acid residue, complicating even routine analyses and necessitating optimized computational strategies for confident and accurate identifications. We present EndoGenius, a database searching strategy designed specifically for elucidating neuropeptide identifications from mass spectra by leveraging optimized peptide-spectrum matching approaches, an expansive motif database, and a novel scoring algorithm to achieve broader representation of the neuropeptidome and minimize reidentification. This work describes an algorithm capable of reporting more neuropeptide identifications at 1% false-discovery rate than alternative software in five Callinectes sapidus neuronal tissue types.

  • Article type: Journal Article
    Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics (BUP) that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which has primarily been established for BUP. We present evidence that the TDA-based FDR estimation may not work at the proteoform-level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact protein-level FDR rather than proteoform-level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform-level FDR bias by combining TDA-based FDR and precursor deconvolution error rate.

  • Article type: Journal Article
    Phosphorylation is a post-translational modification of great interest to researchers due to its relevance in many biological processes. LC-MS/MS techniques have enabled high-throughput data acquisition, with studies claiming identification and localization of thousands of phosphosites. The identification and localization of phosphosites emerge from different analytical pipelines and scoring algorithms, with uncertainty embedded throughout the pipeline. For many pipelines and algorithms, arbitrary thresholding is used, but little is known about the actual global false localization rate in these studies. Recently, it has been suggested to use decoy amino acids to estimate global false localization rates of phosphosites, among the peptide-spectrum matches reported. Here, we describe a simple pipeline aiming to maximize the information extracted from these studies by objectively collapsing from peptide-spectrum match to the peptidoform-site level, as well as combining findings from multiple studies while maintaining track of false localization rates. We show that the approach is more effective than current processes that use a simpler mechanism for handling phosphosite identification redundancy within and across studies. In our case study using eight rice phosphoproteomics data sets, 6368 unique sites were confidently identified using our decoy approach compared to 4687 using traditional thresholding in which false localization rates are unknown.

  • Article type: Journal Article
    Neuropeptides are a class of endogenous peptides that have key regulatory roles in biochemical, physiological, and behavioral processes. Mass spectrometry analyses of neuropeptides often rely on protein informatics tools for database searching and peptide identification. As neuropeptide databases are typically experimentally built and comprised of short sequences with high sequence similarity to each other, we developed a novel database searching tool, HyPep, which utilizes sequence homology searching for peptide identification. HyPep aligns de novo sequenced peptides, generated through PEAKS software, with neuropeptide database sequences and identifies neuropeptides based on the alignment score. HyPep performance was optimized using LC-MS/MS measurements of peptide extracts from various Callinectes sapidus neuronal tissue types and compared with a commercial database searching software, PEAKS DB. HyPep identified more neuropeptides from each tissue type than PEAKS DB at 1% false discovery rate, and the false match rate from both programs was 2%. In addition to identification, this report describes how HyPep can aid in the discovery of novel neuropeptides.

  • Article type: Journal Article
    Protein methylation is a widespread post-translational modification (PTM) involved in several important biological processes including, but not limited to, RNA splicing, signal transduction, translation, and DNA repair. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is considered today the most versatile and accurate technique to profile PTMs with high precision and proteome-wide depth; however, the identification of protein methylations by MS is still prone to high false discovery rates. In this chapter, we describe the heavy methyl SILAC metabolic labeling strategy that allows high-confidence identification of in vivo methyl-peptides by MS-based proteomics. We provide a general protocol that covers the steps of heavy methyl labeling of cultured cells, protein sample preparation, LC-MS/MS analysis, and downstream computational analysis of the acquired MS data.

  • Article type: Journal Article
    BACKGROUND: Genome-wide association studies (GWAS) are limited in power to detect associations that exceed the stringent genome-wide significance threshold. This limitation can be alleviated by leveraging relevant auxiliary data, such as functional genomic data. Frameworks utilising the conditional false discovery rate have been developed for this purpose, and have been shown to increase power for GWAS discovery whilst controlling the false discovery rate. However, the methods are currently only applicable for continuous auxiliary data and cannot be used to leverage auxiliary data with a binary representation, such as whether SNPs are synonymous or non-synonymous, or whether they reside in regions of the genome with specific activity states.
    RESULTS: We describe an extension to the cFDR framework for binary auxiliary data, called "Binary cFDR". We demonstrate FDR control of our method using detailed simulations, and show that Binary cFDR performs better than a comparator method in terms of sensitivity and FDR control. We introduce an all-encompassing user-oriented CRAN R package (https://annahutch.github.io/fcfdr/; https://cran.r-project.org/web/packages/fcfdr/index.html) and demonstrate its utility in an application to type 1 diabetes, where we identify additional genetic associations.
    CONCLUSIONS: Our all-encompassing R package, fcfdr, serves as a comprehensive toolkit to unite GWAS and functional genomic data in order to increase statistical power to detect genetic associations.

  • Article type: Journal Article
    Irritable bowel syndrome with diarrhoea (IBS-D) and functional diarrhoea (FDr) are the two major functional bowel disorders characterized by diarrhoea. In spite of their high prevalence, IBS-D and FDr are associated with major uncertainties, especially regarding their optimal diagnostic work-up and management. A Delphi consensus was performed with experts from 10 European countries who conducted a literature summary and voting process on 31 statements. Quality of evidence was evaluated using the Grading of Recommendations, Assessment, Development, and Evaluation criteria. Consensus (defined as >80% agreement) was reached for all the statements. The panel agreed on the potential overlap of IBS-D and FDr. In terms of diagnosis, the consensus supports a symptom-based approach together with the exclusion of alarm symptoms, recommending the evaluation of full blood count, C-reactive protein, serology for coeliac disease, and faecal calprotectin, and consideration of diagnosing bile acid diarrhoea. Colonoscopy with random biopsies in both the right and left colon is recommended in patients older than 50 years and in the presence of alarm features. Regarding treatment, a strong consensus was achieved for the use of a diet low in fermentable oligo-, di-, monosaccharides and polyols, gut-directed psychological therapies, rifaximin, loperamide, and eluxadoline. A weak or conditional recommendation was achieved for antispasmodics, probiotics, tricyclic antidepressants, bile acid sequestrants, and 5-hydroxytryptamine-3 antagonists (i.e. alosetron, ondansetron, or ramosetron). A multinational group of European experts summarized the current state of consensus on the definition, diagnosis, and management of IBS-D and FDr.

  • Article type: Journal Article
    Proteogenomics refers to the integrated analysis of the genome and proteome that leverages mass-spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms and find sequence variants to develop novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control the error rates, proteogenomics depends on the target-decoy search strategy, the de-facto method for false discovery rate (FDR) estimation in proteomics. The proteogenomic databases constructed from three- or six-frame nucleotide database translation not only increase the search space and compute-time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies such as two-pass database search or peptide-class-specific FDR can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret the proteogenomics results appropriately and control false positives and negatives in a more informed manner. In this review, first, we briefly discuss the proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.
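    The target-decoy strategy mentioned above can be illustrated with a short sketch. It assumes a concatenated target-decoy search and the common estimate FDR ≈ (decoy hits) / (target hits) above a score threshold; the function name `target_decoy_qvalues` and the `(score, is_decoy)` input layout are hypothetical, and real pipelines may use different estimators or class-specific corrections.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]],
) -> List[Tuple[float, bool, float]]:
    """Attach q-values to peptide-spectrum matches given as (score, is_decoy).

    Assumption: higher scores are better and decoys come from a
    concatenated target-decoy database search.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the threshold were placed at this rank, capped at 1.
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the lowest estimated FDR at this rank or any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]
```

    Keeping the matches whose q-value is at most 0.01 corresponds to reporting at a 1% estimated FDR. The estimate is only as good as the assumption that decoys behave like incorrect targets, which is the equivalence the abstract notes can break down for translated nucleotide databases.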