genomic data

  • Article type: Journal Article
    The life cycle of genome builds spans interlocking pillars of assembly, annotation, and comparative genomics to drive biological insights. While tools exist to address each pillar separately, there is a growing need for tools to integrate different pillars of a genome project holistically. For example, comparative approaches can provide quality control of assembly or annotation; genome assembly, in turn, can help to identify artifacts that may complicate the interpretation of genome comparisons. The JCVI library is a versatile Python-based library that offers a suite of tools that excel across these pillars. Featuring a modular design, the JCVI library provides high-level utilities for tasks such as format parsing, graphics generation, and manipulation of genome assemblies and annotations. Supporting genomics algorithms like MCscan and ALLMAPS are widely employed in building genome releases, producing publication-ready figures for quality assessment and evolutionary inference. Developed and maintained collaboratively, the JCVI library emphasizes quality and reusability.

  • DOI:
    Article type: Journal Article
    This section explores the challenges involved in translating genomic research into genomic medicine. A number of priorities have been identified in the Australian National Health Genomics Framework for addressing these challenges. Responsible collection, storage, use and management of genomic data is one of these priorities, and is the primary theme of this section. The recent release of Genomical, an Australian data-sharing platform, is used as a case study to illustrate the type of assistance that can be provided to the health care sector in addressing this priority. The section first describes the National Framework and other drivers involved in the move towards genomic medicine. The section then examines key ethical, legal and social factors at play in genomics, with particular focus on privacy and consent. Finally, the section examines how Genomical is being used to help ensure that the move towards genomic medicine is ethically, legally and socially sound and that it optimises advances in both genomic and information technology.

  • Article type: Journal Article
    Non-small cell lung carcinoma (NSCLC) is a prevalent and aggressive form of lung cancer, with a poor prognosis for metastatic disease. Immunotherapy, particularly immune checkpoint inhibitors (ICIs), has revolutionized the management of NSCLC, but response rates are highly variable. Identifying reliable predictive biomarkers is crucial to optimize patient selection and treatment outcomes. This systematic review aimed to evaluate the current state of artificial intelligence (AI) and machine learning (ML) applications in predicting the response to immunotherapy in NSCLC. A comprehensive literature search identified 19 studies that met the inclusion criteria. The studies employed diverse AI/ML techniques, including deep learning, artificial neural networks, support vector machines, and gradient boosting methods, applied to various data modalities such as medical imaging, genomic data, clinical variables, and immunohistochemical markers. Several studies demonstrated the ability of AI/ML models to accurately predict immunotherapy response, progression-free survival, and overall survival in NSCLC patients. However, challenges remain in data availability, quality, and interpretability of these models. Efforts have been made to develop interpretable AI/ML techniques, but further research is needed to improve transparency and explainability. Additionally, translating AI/ML models from research settings to clinical practice poses challenges related to regulatory approval, data privacy, and integration into existing healthcare systems. Nonetheless, the successful implementation of AI/ML models could enable personalized treatment strategies, improve treatment outcomes, and reduce unnecessary toxicities and healthcare costs associated with ineffective treatments.

  • Article type: Journal Article
    Health care is at a turning point. We are shifting from protocolized medicine to precision medicine, and digital health systems are facilitating this shift. By providing clinicians with detailed information for each patient and analytic support for decision-making at the point of care, digital health technologies are enabling a new era of precision medicine. Genomic data also provide clinicians with information that can improve the accuracy and timeliness of diagnosis, optimize prescribing, and target risk reduction strategies, all of which are key elements for precision medicine. However, genomic data are predominantly seen as diagnostic information and are not routinely integrated into the clinical workflows of electronic medical records. The use of genomic data holds significant potential for precision medicine; however, as genomic data are fundamentally different from the information collected during routine practice, special considerations are needed to use this information in a digital health setting. This paper outlines the potential of genomic data integration with electronic records, and how these data can enable precision medicine.

  • Article type: Journal Article
    The process of identification and management of neurological disorder conditions faces challenges, prompting the investigation of novel methods in order to improve diagnostic accuracy. In this study, we conducted a systematic literature review to identify the significance of genetics- and molecular-pathway-based machine learning (ML) models in treating neurological disorder conditions. According to the study's objectives, search strategies were developed to extract the research studies using digital libraries. We followed rigorous study selection criteria. A total of 24 studies met the inclusion criteria and were included in the review. We classified the studies based on neurological disorders. The included studies highlighted multiple methodologies and exceptional results in treating neurological disorders. The study findings underscore the potential of the existing models, presenting personalized interventions based on the individual's conditions. The findings offer better-performing approaches that handle genetics and molecular data to generate effective outcomes. Moreover, we discuss the future research directions and challenges, emphasizing the demand for generalizing existing models in real-world clinical settings. This study contributes to advancing knowledge in the field of diagnosis and management of neurological disorders.

  • Article type: Journal Article
    BACKGROUND: Melanoma is one of the most malignant forms of skin cancer, with a high mortality rate in the advanced stages. Therefore, early and accurate detection of melanoma plays an important role in improving patients' prognosis. Biopsy is the traditional method for melanoma diagnosis, but this method lacks reliability. Therefore, it is important to apply new methods to diagnose melanoma effectively.
    OBJECTIVE: This study presents a new approach to classify melanoma using deep neural networks (DNNs) with combined multiple modal imaging and genomic data, which could potentially provide more reliable diagnosis than current medical methods for melanoma.
    METHODS: We built a dataset of dermoscopic images, histopathological slides and genomic profiles. We developed a custom framework composed of two widely established types of neural networks: Convolutional Neural Networks (CNNs) for analysing image data, and Graph Neural Networks, which can learn graph structure, for analysing genomic data. We trained and evaluated the proposed framework on this dataset.
    RESULTS: The developed multi-modal DNN achieved higher accuracy than traditional medical approaches. The mean accuracy of the proposed model was 92.5% with an area under the receiver operating characteristic curve of 0.96, suggesting that the multi-modal DNN approach can detect critical morphologic and molecular features of melanoma beyond the limitations of traditional AI and traditional machine learning approaches. The combination of cutting-edge AI may allow access to a broader range of diagnostic data, which can allow dermatologists to make more accurate decisions and refine treatment strategies. However, the application of the framework will have to be validated at a larger scale and more clinical trials need to be conducted to establish whether this novel diagnostic approach will be more effective and feasible.

  • Article type: Journal Article
    In 2024, medical researchers in the Republic of Korea were invited to amend the health and medical data utilization guidelines (Government Publications Registration Number: 11-1352000-0052828-14). This study aimed to show the overall impact of the guideline revision, with a focus on clinical genomic data.
    This study amended the pseudonymization of genomic data defined in the previous version through a joint study led by the Ministry of Health and Welfare, the Korea Health Information Service, and the Korea Genome Organization. To develop the previous version, we held three conferences with four main medical research institutes and seven academic societies. We conducted two surveys targeting special genome experts in academia, industry, and institutes.
    We found that cases of pseudonymization in the application of genome data were rare and that there was ambiguity in the terminology used in the previous version of the guidelines. Most experts (> ~90%) agreed that the 'reserved' condition should be eliminated to make genomic data available after pseudonymization. In this study, the scope of genomic data was defined as clinical next generation sequencing data, including FASTQ, BAM/SAM, VCF, and medical records. Pseudonymization targets genomic sequences and metadata, embedding specific elements, such as germline mutations, short tandem repeats, single-nucleotide polymorphisms, and identifiable data (for example, ID or environmental values). Expression data generated from multi-omics can be used without pseudonymization.
    This amendment will not only enhance the safe use of healthcare data but also promote advancements in disease prevention, diagnosis, and treatment.
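
    As a rough illustration of pseudonymizing the identifiable metadata mentioned above, the sketch below replaces the sample IDs in a VCF `#CHROM` header line with keyed hashes. It is not the procedure defined in the Korean guidelines: the use of HMAC-SHA256, the function names, and the `P` prefix are assumptions made for this example, and it covers only sample IDs, not the sequence-level elements (germline mutations, short tandem repeats, SNPs) the abstract lists.

```python
import hashlib
import hmac


def pseudonymize_id(sample_id: str, secret_key: bytes, length: int = 16) -> str:
    """Map a sample ID to a stable pseudonym using a keyed hash (HMAC-SHA256).

    Assumption for illustration: a keyed hash is used so that the mapping
    cannot be recomputed without the secret key.
    """
    digest = hmac.new(secret_key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:length]


def pseudonymize_vcf_header(header_line: str, secret_key: bytes) -> str:
    """Replace the sample columns of a VCF '#CHROM' header line with pseudonyms.

    In VCF, the first nine tab-separated columns (#CHROM ... FORMAT) are fixed;
    every column after them is a sample name.
    """
    fields = header_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]
    return "\t".join(fixed + [pseudonymize_id(s, secret_key) for s in samples])


if __name__ == "__main__":
    key = b"example-key-kept-by-the-data-custodian"  # hypothetical key
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPATIENT_001\tPATIENT_002"
    print(pseudonymize_vcf_header(header, key))
```

    The same ID always maps to the same pseudonym under a given key, so records from FASTQ, BAM/SAM, and VCF files for one individual could still be linked after pseudonymization, provided the key stays with the data custodian.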

  • Article type: Journal Article
    Data harmonization involves combining data from multiple independent sources and processing the data to produce one uniform dataset. Merging separate genotypes or whole-genome sequencing datasets has been proposed as a strategy to increase the statistical power of association tests by increasing the effective sample size. However, data harmonization is not a widely adopted strategy due to the difficulties with merging data (including confounding produced by batch effects and population stratification). Detailed data harmonization protocols are scarce and are often conflicting. Moreover, data harmonization protocols that accommodate samples of admixed ancestry are practically non-existent. Existing data harmonization procedures must be modified to ensure the heterogeneous ancestry of admixed individuals is incorporated into additional downstream analyses without confounding results. Here, we propose a set of guidelines for merging multi-platform genetic data from admixed samples that can be adopted by any investigator with elementary bioinformatics experience. We have applied these guidelines to aggregate 1544 tuberculosis (TB) case-control samples from six separate in-house datasets and conducted a genome-wide association study (GWAS) of TB susceptibility. The GWAS performed on the merged dataset had improved power over analyzing the datasets individually and produced summary statistics free from bias introduced by batch effects and population stratification. © 2024 Wiley Periodicals LLC.
    Basic Protocol 1: Processing separate datasets comprising array genotype data
    Alternate Protocol 1: Processing separate datasets comprising array genotype and whole-genome sequencing data
    Alternate Protocol 2: Performing imputation using a local reference panel
    Basic Protocol 2: Merging separate datasets
    Basic Protocol 3: Ancestry inference using ADMIXTURE and RFMix
    Basic Protocol 4: Batch effect correction using pseudo-case-control comparisons

  • Article type: Journal Article
    Data access committees (DACs) gatekeep access to secured genomic and related health datasets yet are challenged to keep pace with the rising volume and complexity of data generation. Automated decision support (ADS) systems have been shown to support consistency, compliance, and coordination of data access review decisions. However, we lack understanding of how DAC members perceive the added value of ADS, if any, for the quality and effectiveness of their reviews. In this qualitative study, we report findings from 13 semi-structured interviews with DAC members from around the world to identify relevant barriers and facilitators to implementing ADS for genomic data access management. Participants generally supported pilot studies that test ADS performance, for example in cataloging data types, verifying user credentials and tagging datasets for use terms. Concerns related to over-automation, lack of human oversight, low prioritization, and misalignment with institutional missions tempered enthusiasm for ADS among the DAC members we engaged. Tension for change in the institutional settings within which DACs operated was a powerful motivator for DAC members to consider implementing ADS in their access workflows, as were perceptions of the relative advantage of ADS over the status quo. Future research is needed to build the evidence base around the comparative effectiveness and decisional outcomes of institutions that do or do not use ADS in their workflows.

  • Article type: Journal Article
    BACKGROUND: The reconstruction of the evolutionary history of organisms has been greatly influenced by the advent of molecular techniques, leading to a significant increase in studies utilizing genomic data from different species. However, the lack of standardization in gene nomenclature poses a challenge in database searches and evolutionary analyses, impacting the accuracy of results obtained.
    RESULTS: To address this issue, a Python class for standardizing gene nomenclatures, SynGenes, has been developed. It automatically recognizes and converts different nomenclature variations into a standardized form, facilitating comprehensive and accurate searches. Additionally, SynGenes offers a web form for individual searches using different names associated with the same gene. The SynGenes database contains a total of 545 gene name variations for mitochondrial genes and 2485 for chloroplast genes, providing a valuable resource for researchers.
    CONCLUSIONS: The SynGenes platform offers a solution for standardizing gene nomenclatures of mitochondrial and chloroplast genes and providing a standardized search solution for specific markers in GenBank. Evaluation of SynGenes effectiveness through research conducted on GenBank and PubMedCentral demonstrated its ability to yield a greater number of outcomes compared to conventional searches, ensuring more comprehensive and accurate results. This tool is crucial for accurate database searches, and consequently, evolutionary analyses, addressing the challenges posed by non-standardized gene nomenclature.
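
    The idea of converting nomenclature variants into one standardized form can be sketched in a few lines of Python. This is not the SynGenes API or its database: the `standardize` function, the `_SYNONYMS` table, and the three example genes are hypothetical and chosen only to illustrate the lookup.

```python
import re

# Hypothetical synonym table for illustration only; NOT the SynGenes database.
_SYNONYMS = {
    "COI": ["COI", "CO1", "COX1", "cytochrome c oxidase subunit I",
            "cytochrome oxidase subunit 1"],
    "CYTB": ["CYTB", "cyt b", "cob", "cytochrome b"],
    "ND1": ["ND1", "NAD1", "NADH dehydrogenase subunit 1"],
}


def _key(name: str) -> str:
    """Normalize a gene name: lowercase and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Reverse index: normalized variant -> standard name.
_LOOKUP = {_key(variant): standard
           for standard, variants in _SYNONYMS.items()
           for variant in variants}


def standardize(name: str):
    """Return the standard name for a known variant, or None if it is unknown."""
    return _LOOKUP.get(_key(name))


if __name__ == "__main__":
    for raw in ["cox-1", "Cytochrome B", "NADH dehydrogenase subunit 1", "rbcL"]:
        print(raw, "->", standardize(raw))
```

    Normalizing case and punctuation before the lookup lets one table entry cover spellings such as "Cyt-b", "cyt b", and "CYTB"; names missing from the table return `None` rather than a guess.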