Keywords: Baseline region; Bioinformatic pipeline; Breast cancer; Clustering methods; Copy number alteration; Data correction; Multiple myeloma

Abbreviations: BAF, B-allele frequency; CN, Copy number; CNAs, Copy number alterations; CNVs, Copy Number Variations; CR, Correction Factor; F-CL, Final Chromosome List; FISH, Fluorescence In Situ Hybridization; HD, Hyperdiploidy; HR, High Risk; LOH, Loss of Heterozygosity; MM, Multiple Myeloma; NGS, Next Generation Sequencing; R-ISS, Revised International Staging System; S-CL, Starting Chromosome List; SNP, Single-Nucleotide Polymorphism; SR, Standard Risk; WES, Whole Exome Sequencing; WGD, Whole-genome doubling

Source: DOI:10.1016/j.csbj.2022.06.062

Abstract:
Human cancer arises from a population of cells that have acquired a wide range of genetic alterations, most of which are therapeutic targets or serve as prognostic factors for patient risk stratification. Among these, copy number alterations (CNAs) are frequent. Currently, several molecular biology technologies, such as microarrays, next-generation sequencing (NGS), and single-cell approaches, are used to define the genomic profile of tumor samples. The output data must be analyzed with bioinformatic approaches, in particular computational algorithms. Molecular biology tools estimate the baseline region by comparing either the mean probe signals or the number of reads to the reference genome. However, when tumors display complex karyotypes, this approach can misestimate the baseline region and consequently cause errors in CNA calling. To overcome this issue, we designed an R package, BoBafit, that checks and, if necessary, adjusts the baseline region according to both the tumor-specific alteration context and the sample-specific clustered genomic lesions. Several databases were chosen to set up and validate the package, demonstrating the potential of BoBafit to adjust copy number (CN) data from different tumors and analysis techniques. Notably, the analysis showed that up to 25% of samples needed a baseline region adjustment and a redefinition of CNA calls, which changed the prognostic risk classification of the patients. We support the implementation of BoBafit within CN analysis bioinformatics pipelines to ensure correct stratification of patients into risk categories, regardless of tumor type.
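To make the idea of a baseline adjustment concrete, the following is a minimal sketch in Python. It is not BoBafit's API or its actual algorithm: the function names, the tolerance value, the arm labels, and the choice of a simple one-dimensional grouping are all assumptions made for illustration. It assumes per-arm log2 copy-number ratios for one sample and a hypothetical list of chromosome arms expected to be rarely altered in the tumor type. The correction factor is taken as the median of the tightest group of those arms and subtracted from every arm.

```python
from statistics import median


def largest_cluster(values, tol=0.05):
    """Return the largest group of sorted values whose spread is within tol."""
    ordered = sorted(values)
    best = []
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until its spread fits the tolerance.
        while ordered[end] - ordered[start] > tol:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    return best


def correction_factor(arm_log2, candidate_arms, tol=0.05):
    """Estimate the baseline offset from arms assumed to be mostly unaltered."""
    values = [arm_log2[arm] for arm in candidate_arms if arm in arm_log2]
    if not values:
        raise ValueError("no candidate arms found in the sample")
    return median(largest_cluster(values, tol))


def recenter(arm_log2, factor):
    """Shift every arm so that the estimated baseline sits at log2 ratio 0."""
    return {arm: round(value - factor, 3) for arm, value in arm_log2.items()}


# Illustrative sample (made-up values): many gained arms pulled the
# mean-centered baseline upward, so unaltered arms sit near -0.25.
sample = {"1p": -0.25, "2q": -0.24, "4q": -0.26, "10p": -0.25,
          "3q": 0.33, "5p": 0.34, "8q": 0.33}
# Hypothetical candidate list; 8q is included to show that an altered
# arm falls outside the largest cluster and does not bias the estimate.
candidates = ["1p", "2q", "4q", "10p", "8q"]

factor = correction_factor(sample, candidates)
corrected = recenter(sample, factor)
```

In this toy sample the unaltered arms move back to a log2 ratio near 0 and the gained arms move to about 0.58, close to the value expected for a single-copy gain (log2(3/2) ≈ 0.585). Without the shift, the gained arms at about 0.33 could be under-called.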