variant calling

  • Article type: Journal Article
    BACKGROUND: Structural variants (SVs) play an important role in genetic research and precision medicine. As existing SV detection methods usually contain a substantial number of false positive calls, approaches to filter the detection results are needed.
    RESULTS: We developed a novel deep learning-based SV filtering tool, CSV-Filter, for both short and long reads. CSV-Filter uses a novel multi-level grayscale image encoding method based on CIGAR strings of the alignment results and employs image augmentation techniques to improve SV feature extraction. CSV-Filter also utilizes self-supervised learning networks for transfer as classification models, and employs mixed-precision operations to accelerate training. The experiments showed that the integration of CSV-Filter with popular SV detection tools could considerably reduce false positive SVs for short and long reads, while maintaining true positive SVs almost unchanged. Compared with DeepSVFilter, an SV filtering tool for short reads, CSV-Filter could recognize more false positive calls and support long reads as an additional feature.
    METHODS: https://github.com/xzyschumacher/CSV-Filter.
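    The CIGAR-based image encoding mentioned above can be illustrated with a minimal sketch. It is not CSV-Filter's actual encoder: the gray levels, the function name `cigar_to_row`, and the choice to draw one read as one image row over a reference window are all assumptions made for illustration.

```python
import re

# Hypothetical gray levels per CIGAR operation (assumed for illustration;
# the abstract does not state the values CSV-Filter uses).
GRAY_LEVELS = {"M": 255, "=": 255, "X": 255, "D": 128, "N": 128, "I": 64, "S": 32}

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_row(cigar, read_start, window_start, window_len):
    """Encode one read's CIGAR string as a row of gray values.

    Positions are 0-based reference coordinates. Pixels not covered by
    the read stay 0 (black).
    """
    row = [0] * window_len
    marks = []  # point events (insertions, soft clips) drawn last
    ref_pos = read_start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "MDN=X":
            # These operations consume reference bases: paint a run.
            for pos in range(ref_pos, ref_pos + length):
                idx = pos - window_start
                if 0 <= idx < window_len:
                    row[idx] = GRAY_LEVELS[op]
            ref_pos += length
        elif op in "IS":
            # Insertions and soft clips consume no reference bases:
            # remember a single-pixel mark at the current position.
            marks.append((ref_pos - window_start, GRAY_LEVELS[op]))
        # H and P are neither drawn nor advance the reference position.
    for idx, level in marks:
        if 0 <= idx < window_len:
            row[idx] = level
    return row


if __name__ == "__main__":
    print(cigar_to_row("3M2D2M", 100, 100, 8))
```

    Stacking one such row per read over a candidate SV region gives a two-dimensional grayscale image that an image classifier could consume; how CSV-Filter actually arranges rows and levels should be checked in its repository.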

  • Article type: Journal Article
    Unveiling the strategies of bacterial adaptation to stress constitutes a challenging area of research. Understanding the mechanisms governing the emergence of resistance to antimicrobials is of particular importance given the increasing threat of antibiotic resistance to public health worldwide. In recent decades, the fast democratization of sequencing technologies, along with the development of dedicated bioinformatics tools to process data, has offered new opportunities to characterize the genomic variations underlying bacterial adaptation. Research teams now have the possibility to dive deeper into deciphering bacterial adaptive mechanisms through the identification of specific genetic targets mediating survival under stress. In this chapter, we propose a step-by-step bioinformatics pipeline enabling the identification of mutational events underlying biocidal stress adaptation associated with antimicrobial resistance development, using Escherichia coli as an illustrative model.

  • Article type: Journal Article
    BACKGROUND: Circulating tumour DNA (ctDNA) is a subset of cell free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient's cancer. ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts due to low abundance, particularly in the early stages of cancer. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants.
    RESULTS: In this study, we benchmarked six variant calling tools, including two UMI-aware callers for their ability to call ctDNA variants. The standard variant callers tested included Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used both datasets with known variants spiked in at low frequencies, and datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs - an indicator of false positive variant discovery. In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs.
    CONCLUSIONS: Our results indicate UMI-aware variant callers have potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools. There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.
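    To make the role of UMIs in suppressing low-frequency artefacts concrete, here is a minimal sketch of UMI family consensus at a single genomic position. It does not reproduce the algorithm of UMI-VarCal or UMIErrorCorrect; the function names and the `min_family_size` and `min_agreement` thresholds are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def family_consensus(observations, min_family_size=2, min_agreement=0.8):
    """Collapse reads sharing a UMI into one consensus base per family.

    observations: iterable of (umi, base) pairs seen at one position.
    Families that are too small or too discordant are dropped, since
    they cannot separate PCR/sequencing errors from real variants.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[umi] = base
    return consensus


def alt_allele_fraction(consensus, alt):
    """Fraction of consensus families (original molecules) carrying alt."""
    if not consensus:
        return 0.0
    return sum(1 for base in consensus.values() if base == alt) / len(consensus)


if __name__ == "__main__":
    reads = (
        [("AAC", "A")] * 5 + [("AAC", "T")]  # one erroneous T in an A family
        + [("GGT", "T")] * 3                  # a consistent T family
        + [("CCA", "A")] * 2
        + [("TTG", "T")]                      # singleton, dropped
    )
    calls = family_consensus(reads)
    print(calls, alt_allele_fraction(calls, "T"))
```

    Counting families rather than raw reads means a single amplified error contributes at most one vote, which is the intuition behind the specificity gains the abstract attributes to UMI-aware callers.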

  • Article type: Journal Article
    Therapeutic cancer vaccines have been considered in recent decades as important immunotherapeutic strategies capable of leading to tumor regression. In the development of these vaccines, the identification of neoepitopes plays a critical role, and different computational methods have been proposed and employed to direct and accelerate this process. In this context, this review identified and systematically analyzed the most recent studies published in the literature on the computational prediction of epitopes for the development of therapeutic vaccines, outlining critical steps, along with the associated program's strengths and limitations. A scoping review was conducted following the PRISMA extension (PRISMA-ScR). Searches were performed in databases (Scopus, PubMed, Web of Science, Science Direct) using the keywords: neoepitope, epitope, vaccine, prediction, algorithm, cancer, and tumor. Forty-nine articles published from 2012 to 2024 were synthesized and analyzed. Most of the identified studies focus on the prediction of epitopes with an affinity for MHC I molecules in solid tumors, such as lung carcinoma. Predicting epitopes with class II MHC affinity has been relatively underexplored. Besides neoepitope prediction from high-throughput sequencing data, additional steps were identified, such as the prioritization of neoepitopes and validation. Mutect2 is the most used tool for variant calling, while NetMHCpan is favored for neoepitope prediction. Artificial/convolutional neural networks are the preferred methods for neoepitope prediction. For prioritizing immunogenic epitopes, the random forest algorithm is the most used for classification. The performance values related to the computational models for the prediction and prioritization of neoepitopes are high; however, a large part of the studies still use microbiome databases for training. The in vitro/in vivo validations of the predicted neoepitopes were verified in 55% of the analyzed studies. Clinical trials that led to successful tumor remission were identified, highlighting that this immunotherapeutic approach can benefit these patients. Integrating high-throughput sequencing, sophisticated bioinformatics tools, and rigorous validation methods through in vitro/in vivo assays as well as clinical trials, the tumor neoepitope-based vaccine approach holds promise for developing personalized therapeutic vaccines that target specific tumor cancers.

  • Article type: Journal Article
    Advancements in genome assembly and sequencing technology have made whole genome sequence (WGS) data and reference genomes accessible to study polyploid species. Compared to popular reduced-representation sequencing approaches, the genome-wide coverage and greater marker density provided by WGS data can greatly improve our understanding of polyploid species and polyploid biology. However, biological features that make polyploid species interesting also pose challenges in read mapping, variant identification, and genotype estimation. Accounting for characteristics in variant calling like allelic dosage uncertainty, homology between subgenomes, and variance in chromosome inheritance mode can reduce errors. Here, I discuss the challenges of variant calling in polyploid WGS data and discuss where potential solutions can be integrated into a standard variant calling pipeline.
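    Allelic dosage uncertainty can be made concrete with a small worked example. The sketch below is not taken from the article: it assumes a simple binomial read-sampling model with a symmetric per-base error rate and a flat prior over dosages, and the function name and parameters are hypothetical.

```python
from math import comb


def dosage_posteriors(ref_reads, alt_reads, ploidy=4, error_rate=0.01):
    """Posterior probability of each alternate-allele dosage 0..ploidy.

    Assumes reads are sampled binomially, each base is misread with
    probability error_rate, and all dosages are equally likely a priori.
    Very high depths may underflow; a log-space version would be needed.
    """
    depth = ref_reads + alt_reads
    likelihoods = []
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Probability that a single read shows the alternate allele.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods.append(
            comb(depth, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** ref_reads
        )
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


if __name__ == "__main__":
    # Same 25% alternate-read fraction at two depths in a tetraploid.
    for ref, alt in [(3, 1), (30, 10)]:
        post = dosage_posteriors(ref, alt)
        print(ref + alt, [round(p, 3) for p in post])
```

    Under these assumptions the most likely dosage is 1 of 4 at both depths, but the posterior is far less concentrated at 4x than at 40x coverage, which is one way to see why diploid-oriented genotype calls can be unreliable in polyploids.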

  • Article type: Journal Article
    Introduction: Structural Variants (SVs) are a type of variation that can significantly influence phenotypes and cause diseases. Thus, the accurate detection of SVs is a vital part of modern genetic analysis. The advent of long-read sequencing technology ushers in a new era of more accurate and comprehensive SV calling, and many tools have been developed to call SVs using long-read data. Haplotype-tagging is a procedure that can tag haplotype information on reads and can thus potentially improve the SV detection; nevertheless, few methods make use of this information. In this article, we introduce HapKled, a new SV detection tool that can accurately detect SVs from Oxford Nanopore Technologies (ONT) long-read alignment data. Methods: HapKled utilizes haplotype information underlying alignment data by conducting haplotype-tagging using Whatshap on the reads to improve the detection performance, with three unique calling mechanics including altering clustering conditions according to haplotype information of signatures, determination of similar SVs based on haplotype information, and slack filtering conditions based on haplotype quality. Results: In our evaluations, HapKled outperformed state-of-the-art tools and can deliver better SV detection results on both simulated and real sequencing data. The code and experiments of HapKled can be obtained from https://github.com/CoREse/HapKled. Discussion: With the superb SV detection performance that HapKled can deliver, HapKled could be useful in bioinformatics research, clinical diagnosis, and medical research and development.

  • Article type: Journal Article
    BACKGROUND: At a global scale, the SARS-CoV-2 virus did not remain in its initial genotype for a long period of time, with the first global reports of variants of concern (VOCs) in late 2020. Subsequently, genome sequencing has become an indispensable tool for characterizing the ongoing pandemic, particularly for typing SARS-CoV-2 samples obtained from patients or environmental surveillance. For such SARS-CoV-2 typing, various in vitro and in silico workflows exist, yet to date, no systematic cross-platform validation has been reported.
    RESULTS: In this work, we present the first comprehensive cross-platform evaluation and validation of in silico SARS-CoV-2 typing workflows. The evaluation relies on a dataset of 54 patient-derived samples sequenced with several different in vitro approaches on all relevant state-of-the-art sequencing platforms. Moreover, we present UnCoVar, a robust, production-grade reproducible SARS-CoV-2 typing workflow that outperforms all other tested approaches in terms of precision and recall.
    CONCLUSIONS: In many ways, the SARS-CoV-2 pandemic has accelerated the development of techniques and analytical approaches. We believe that this can serve as a blueprint for dealing with future pandemics. Accordingly, UnCoVar is easily generalizable towards other viral pathogens and future pandemics. The fully automated workflow assembles virus genomes from patient samples, identifies existing lineages, and provides high-resolution insights into individual mutations. UnCoVar includes extensive quality control and automatically generates interactive visual reports. UnCoVar is implemented as a Snakemake workflow. The open-source code is available under a BSD 2-clause license at github.com/IKIM-Essen/uncovar.

  • Article type: Journal Article
    The integration of target capture systems with next-generation sequencing has emerged as an efficient tool for exploring specific genetic regions with a high resolution and facilitating the rapid discovery of novel alleles. Despite these advancements, the application of targeted sequencing methodologies, such as the myBaits technology, in polyploid oat species remains relatively unexplored. In this study, we utilized the myBaits target capture method offered by Daicel Arbor Biosciences to detect variants and assess their reliability for variant detection in oat genomics and breeding. Ten oat genotypes were carefully chosen for targeted sequencing, focusing on specific regions on chromosome 2A to detect variants. The selected region harbors 98 genes. Precisely designed baits targeting the genes within these regions were employed for the target capture sequencing. We employed various mappers and variant callers to identify variants. After the identification of variants, we focused on the variants identified via all variant callers to assess the applicability of the myBaits sequencing methodology in oat breeding. In our efforts to validate the identified variants, we focused on two SNPs, one deletion and one insertion identified via all variant callers in the genotypes KF-318 and NOS 819111-70 but absent in the remaining eight genotypes. The Sanger sequencing of targeted SNPs failed to reproduce target capture data obtained through the myBaits technology. Similarly, the validation of deletion and insertion variants via high-resolution melting (HRM) curve analysis also failed to reproduce target capture data, again suggesting limitations in the reliability of the myBaits target capture sequencing using short-read sequencing for variant detection in the oat genome. This study sheds light on the importance of exercising caution when employing the myBaits target capture strategy for variant detection in oats. This study provides valuable insights for breeders seeking to advance oat breeding efforts and marker development using myBaits target capture sequencing, emphasizing the significance of methodological sequencing considerations in oat genomics research.

  • Article type: Journal Article
    Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short-read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to trade off precision for sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK) and predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and it culminates in variant annotation using custom scripts. A key utility of SNP-SVant is its scalability. Variant calling is a computationally expensive procedure, and thus, SNP-SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps where dependent files are available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with downstream tools to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest. Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Predicting single nucleotide polymorphisms and structural variations. Support Protocol 1: Downloading publicly available sequencing data. Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer. Support Protocol 3: Converting between VCF and aligned FASTA formats.

  • Article type: Journal Article
    In cancer genomics, variant calling has advanced, but traditional mean accuracy evaluations are inadequate for biomarkers like tumor mutation burden, which vary significantly across samples, affecting immunotherapy patient selection and threshold settings. In this study, we introduce TMBstable, an innovative method that dynamically selects optimal variant calling strategies for specific genomic regions using a meta-learning framework, distinguishing it from traditional callers with uniform sample-wide strategies. The process begins with segmenting the sample into windows and extracting meta-features for clustering, followed by using a pre-trained meta-model to select suitable algorithms for each cluster, thereby addressing strategy-sample mismatches, reducing performance fluctuations and ensuring consistent performance across various samples. We evaluated TMBstable using both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples, comparing it with advanced callers. The assessment, focusing on stability measures, such as the variance and coefficient of variation in false positive rate, false negative rate, precision and recall, involved 300 simulated and 106 real tumor samples. Benchmark results showed TMBstable's superior stability with the lowest variance and coefficient of variation across performance metrics, highlighting its effectiveness in analyzing the counting-based biomarker. The TMBstable algorithm can be accessed at https://github.com/hello-json/TMBstable for academic usage only.
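    The stability measures named in the abstract (variance and coefficient of variation of per-sample metrics) can be computed as in the following sketch. It is only an illustration of the statistics, not TMBstable's evaluation code; the function names and the example precision values are invented.

```python
from statistics import mean, pstdev, pvariance


def precision_recall(tp, fp, fn):
    """Per-sample precision and recall from variant-call counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def stability(values):
    """Mean, population variance, and coefficient of variation (CV).

    CV is the standard deviation divided by the mean, so it compares
    spread across metrics or callers that sit at different levels.
    """
    mu = mean(values)
    cv = pstdev(values) / mu if mu else float("nan")
    return {"mean": mu, "variance": pvariance(values), "cv": cv}


if __name__ == "__main__":
    # Invented per-sample precision for two hypothetical callers.
    caller_a = [0.90, 0.90, 0.90]
    caller_b = [0.99, 0.90, 0.81]
    print(stability(caller_a))
    print(stability(caller_b))
```

    Both invented callers have the same mean precision, yet the second varies far more from sample to sample, which is the kind of difference a mean-only evaluation would hide for a count-based biomarker such as tumor mutation burden.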