annotation

  • Article type: Journal Article
    We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region, annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manually annotate 22 anatomical structures in 5,246 CT volumes. Following this, a semi-automatic annotation procedure is performed for the remaining CT volumes, where radiologists revise the annotations predicted by AI, and in turn, AI improves its predictions by learning from the revised annotations. Such a large-scale, densely annotated, multi-center dataset is needed for two reasons. Firstly, AbdomenAtlas provides important resources for AI development at scale, in the form of large pre-trained models, which can alleviate the annotation workload of expert radiologists and transfer to broader clinical applications. Secondly, AbdomenAtlas establishes a large-scale benchmark for evaluating AI algorithms: the more data we use to test the algorithms, the better we can guarantee reliable performance in complex clinical scenarios. An ISBI & MICCAI challenge named BodyMaps: Towards 3D Atlas of Human Body was launched using a subset of our AbdomenAtlas, aiming to stimulate AI innovation and to benchmark segmentation accuracy, inference efficiency, and domain generalizability. We hope our AbdomenAtlas can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community. Code, models, and datasets are available at https://www.zongweiz.com/dataset.
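    The semi-automatic procedure described above alternates between AI prediction and radiologist revision. The sketch below illustrates that loop in generic Python; the model object and the radiologist_revise callback are placeholders for illustration, not part of the AbdomenAtlas codebase.

```python
# Minimal sketch of a human-in-the-loop annotation loop (hypothetical API).
# An AI model proposes masks, radiologists revise them, and the model is
# retrained on the revised annotations before moving to the next batch.

def annotate_iteratively(model, unlabeled_volumes, radiologist_revise,
                         batch_size=500, rounds=3):
    """model: any segmentation object with predict()/fit() methods (assumed).
    radiologist_revise: callable returning corrected masks for a batch."""
    labeled = []  # (volume, revised_mask) pairs accumulated across rounds
    for _ in range(rounds):
        batch = unlabeled_volumes[:batch_size]
        unlabeled_volumes = unlabeled_volumes[batch_size:]
        predictions = [model.predict(v) for v in batch]   # AI proposes masks
        revised = radiologist_revise(batch, predictions)  # experts correct them
        labeled.extend(zip(batch, revised))
        model.fit(labeled)                                # AI learns from revisions
        if not unlabeled_volumes:
            break
    return model, labeled
```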

  • Article type: Journal Article
    BACKGROUND: Analyzing the anatomy of the aorta and left ventricular outflow tract (LVOT) is crucial for risk assessment and planning of transcatheter aortic valve implantation (TAVI). A comprehensive analysis of the aortic root and LVOT requires the extraction of the patient-individual anatomy via segmentation. Deep learning has shown good performance on various segmentation tasks. If this is formulated as a supervised problem, large amounts of annotated data are required for training. Therefore, minimizing the annotation complexity is desirable.
    METHODS: We propose two-dimensional (2D) cross-sectional annotation and point cloud-based surface reconstruction to train a fully automatic 3D segmentation network for the aortic root and the LVOT. Our sparse annotation scheme enables easy and fast training data generation for tubular structures such as the aortic root. From the segmentation results, we derive clinically relevant parameters for TAVI planning.
    RESULTS: The proposed 2D cross-sectional annotation results in high inter-observer agreement [Dice similarity coefficient (DSC): 0.94]. The segmentation model achieves a DSC of 0.90 and an average surface distance of 0.96 mm. Our approach achieves an aortic annulus maximum diameter difference between prediction and annotation of 0.45 mm (inter-observer variance: 0.25 mm).
    CONCLUSIONS: The presented approach facilitates reproducible annotations. The annotations allow for training accurate segmentation models of the aortic root and LVOT. The segmentation results facilitate reproducible and quantifiable measurements for TAVI planning.
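    The Dice similarity coefficient (DSC) reported above is the standard overlap measure for segmentation masks. A minimal NumPy implementation, independent of the authors' code, looks like this:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks (any shape)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: two overlapping 3D masks (8 and 12 voxels, 8 shared -> DSC = 0.8)
a = np.zeros((4, 4, 4), dtype=bool); a[1:3, 1:3, 1:3] = True
b = np.zeros((4, 4, 4), dtype=bool); b[1:4, 1:3, 1:3] = True
print(round(dice_coefficient(a, b), 3))
```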

  • Article type: Journal Article
    Crop growth monitoring is essential for both crop and supply chain management. Conventional manual sampling is not feasible for assessing the spatial variability of crop growth within an entire field or across all fields. Meanwhile, UAV-based remote sensing enables efficient and nondestructive investigation of crop growth. A variety of crop-specific training image datasets are needed to detect crops from UAV imagery using deep learning models; in particular, training data for cabbage are limited. This data article provides annotated images of cabbages in the field for recognizing cabbages with machine learning models. The dataset contains 458 images with 17,621 annotated cabbages; images are approximately 500 to 1,000 pixels square. Because these cabbage images were collected from different cultivars throughout the growing season and across multiple years, deep learning models trained on this dataset will be able to recognize a wide variety of cabbage shapes. In the future, this dataset can be used not only with UAVs but also in land-based robot applications for crop sensing or associated plant-specific management.
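    The abstract does not specify the annotation file format, so as a purely illustrative example the sketch below assumes a COCO-style JSON (file name and keys are assumptions) and counts annotated cabbages per image:

```python
import json
from collections import Counter

def cabbages_per_image(annotation_json):
    """Count annotations per image in a hypothetical COCO-style file."""
    with open(annotation_json) as fh:
        coco = json.load(fh)
    counts = Counter(a["image_id"] for a in coco["annotations"])
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    return {id_to_name[i]: n for i, n in counts.items()}

# counts = cabbages_per_image("cabbage_annotations.json")  # assumed file name
# print(sum(counts.values()))  # total annotated cabbages (17,621 in the full set)
```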

  • Article type: Journal Article
    Comprehensive and accurate genome annotation is crucial for inferring the predicted functions of an organism. Numerous tools exist to annotate genes, gene clusters, mobile genetic elements, and other diverse features. However, these tools and pipelines can be difficult to install and run, be specialized for a particular element or feature, or lack annotations for larger elements that provide important genomic context. Integrating results across analyses is also important for understanding gene function. To address these challenges, we present the Beav annotation pipeline. Beav is a command-line tool that automates the annotation of bacterial genome sequences, mobile genetic elements, molecular systems and gene clusters, key regulatory features, and other elements. Beav uses existing tools in addition to custom models, scripts, and databases to annotate diverse elements, systems, and sequence features. Custom databases for plant-associated microbes are incorporated to improve annotation of key virulence and symbiosis genes in agriculturally important pathogens and mutualists. Beav includes an optional Agrobacterium-specific pipeline that identifies and classifies oncogenic plasmids and annotates plasmid-specific features. Following the completion of all analyses, annotations are consolidated to produce a single comprehensive output. Finally, Beav generates publication-quality genome and plasmid maps. Beav is on Bioconda and is available for download at https://github.com/weisberglab/beav.
    OBJECTIVE: Annotation of genome features, such as the presence of genes and their predicted function, or larger loci encoding secretion systems or biosynthetic gene clusters, is necessary for understanding the functions encoded by an organism. Genomes can also host diverse mobile genetic elements, such as integrative and conjugative elements and/or phages, that are often not annotated by existing pipelines. These elements can horizontally mobilize genes encoding virulence, antimicrobial resistance, or other adaptive functions and alter the phenotype of an organism. We developed a software pipeline, called Beav, that combines new and existing tools for the comprehensive annotation of these and other major features. Existing pipelines often misannotate loci important for virulence or mutualism in plant-associated bacteria. Beav includes custom databases and optional workflows for the improved annotation of plant-associated bacteria. Beav is designed to be easy to install and run, making comprehensive genome annotation broadly available to the research community.
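    Beav's final step consolidates annotations from all analyses into a single output. The sketch below is not Beav's implementation; it only illustrates the general idea of merging per-tool feature lists into one coordinate-sorted table, with made-up field names and example records.

```python
# Illustrative only: merge feature annotations from several tools into one
# sorted table, in the spirit of a consolidation step. Field names are assumptions.

def consolidate(annotation_sets):
    """annotation_sets: dict mapping tool name -> list of
    (contig, start, end, strand, feature_type, description) tuples."""
    merged = []
    for tool, features in annotation_sets.items():
        for contig, start, end, strand, ftype, desc in features:
            merged.append((contig, start, end, strand, ftype, f"{desc} [source={tool}]"))
    # sort by contig then coordinates so overlapping elements appear together
    return sorted(merged, key=lambda f: (f[0], f[1], f[2]))

example = {
    "gene_caller": [("chr1", 100, 900, "+", "CDS", "hypothetical protein")],
    "ICE_finder": [("chr1", 50, 45000, "+", "mobile_element", "integrative and conjugative element")],
}
for row in consolidate(example):
    print(*row, sep="\t")
```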

  • Article type: Journal Article
    OBJECTIVE: Ants are ecologically dominant insects in most terrestrial ecosystems, with more than 14,000 extant species in about 340 genera recorded to date. However, genomic resources are still scarce for most species, especially for species endemic in East or Southeast Asia, limiting the study of phylogeny, speciation and adaptation of this evolutionarily successful animal lineage. Here, we assemble and annotate the genomes of Odontoponera transversa and Camponotus friedae, two ant species with a natural distribution in China, to facilitate future study of ant evolution.
    METHODS: We obtained a total of 16 Gb and 51 Gb of PacBio HiFi data for O. transversa and C. friedae, respectively, which were assembled into draft genomes of 339 Mb for O. transversa and 233 Mb for C. friedae. Genome assessments by multiple metrics showed good completeness and high accuracy of the two assemblies. Gene annotations assisted by RNA-seq data yielded a comparable number of protein-coding genes in the two genomes (10,892 for O. transversa and 11,296 for C. friedae), while repeat annotations revealed a remarkable difference in repeat content between the two species (149.4 Mb for O. transversa versus 49.7 Mb for C. friedae). In addition, complete mitochondrial genomes for the two species were assembled and annotated.
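    The abstract does not list which assembly metrics were used, but contiguity is commonly summarized with N50: the contig length at which contigs of that length or longer cover at least half of the total assembly. The function below is a generic illustration, not the authors' pipeline.

```python
def n50(contig_lengths):
    """N50: smallest contig length L such that contigs >= L cover
    at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length

print(n50([100, 80, 60, 40, 20]))  # 80, since 100 + 80 >= 300 / 2
```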

  • Article type: Journal Article
    Exhaled breath volatilomics is a powerful non-invasive tool for biomarker discovery in medical applications, but compound annotation is essential for pathophysiological insights and technology transfer. This study investigated the value of a hybrid approach combining real-time proton transfer reaction-time-of-flight mass spectrometry (PTR-TOF-MS) with comprehensive thermal desorption-two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (TD-GCxGC-TOF-MS) to enhance the analysis and characterization of VOCs in clinical research, using COVID-19 as a use case. VOC biomarker candidates were selected from clinical research using PTR-TOF-MS fingerprinting in patients with COVID-19 and matched to the Human Breathomic Database. Corresponding analytical standards were analysed using both a liquid calibration unit coupled to PTR-TOF-MS and TD-GCxGC-TOF-MS, together with confirmation on new clinical samples with TD-GCxGC-TOF-MS. Of 26 potential VOC biomarkers, 23 were successfully detected with PTR-TOF-MS. All VOCs were successfully detected using TD-GCxGC-TOF-MS, which provided effective separation of highly chemically related compounds, including isomers, and enabled high-confidence annotation based on two-dimensional chromatographic separation and mass spectra. Four VOCs were identified with a level 1 annotation in the clinical samples. For future applications, combining real-time PTR-TOF-MS with comprehensive TD-GCxGC-TOF-MS, at least on a subset of samples from a whole study, would enhance the performance of VOC annotation, offering potential advancements in biomarker discovery for clinical research.
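    PTR-TOF-MS features are typically matched to candidate compounds by comparing measured mass-to-charge ratios against theoretical protonated masses within a tolerance. The sketch below is a generic illustration of that idea, not the study's matching pipeline; the reference values and tolerance are only examples.

```python
# Generic sketch: match a measured m/z to reference compounds within a ppm tolerance.

def match_mz(measured_mz, reference, tol_ppm=20.0):
    """reference: dict of compound name -> theoretical protonated m/z."""
    hits = []
    for name, ref_mz in reference.items():
        ppm = abs(measured_mz - ref_mz) / ref_mz * 1e6
        if ppm <= tol_ppm:
            hits.append((name, ref_mz, round(ppm, 1)))
    return sorted(hits, key=lambda h: h[2])

reference = {"acetone (C3H6O)H+": 59.0491, "isoprene (C5H8)H+": 69.0699}
print(match_mz(59.050, reference))  # matches acetone within ~15 ppm
```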

  • Article type: Journal Article
    BACKGROUND: Named entity recognition (NER) is a fundamental task in natural language processing. However, it is typically preceded by named entity annotation, which poses several challenges, especially in the clinical domain. For instance, determining entity boundaries is one of the most common sources of disagreements between annotators due to questions such as whether modifiers or peripheral words should be annotated. If unresolved, these can induce inconsistency in the produced corpora, yet, on the other hand, strict guidelines or adjudication sessions can further prolong an already slow and convoluted process.
    OBJECTIVE: The aim of this study is to address these challenges by evaluating 2 novel annotation methodologies, lenient span and point annotation, aiming to mitigate the difficulty of precisely determining entity boundaries.
    METHODS: We evaluate their effects through an annotation case study on a Japanese medical case report data set. We compare annotation time, annotator agreement, and the quality of the produced labeling and assess the impact on the performance of an NER system trained on the annotated corpus.
    RESULTS: We saw significant improvements in labeling efficiency, with up to a 25% reduction in overall annotation time and even a 10% improvement in annotator agreement compared with the traditional boundary-strict approach. However, even the best-performing NER model showed some drop in performance compared with the traditional annotation methodology.
    CONCLUSIONS: Our findings demonstrate a balance between annotation speed and model performance. Although disregarding boundary information affects model performance to some extent, this is counterbalanced by significant reductions in the annotators' workload and notable improvements in the speed of the annotation process. These benefits may prove valuable in various applications, offering an attractive compromise for developers and researchers.
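    Inter-annotator agreement for entity spans is often reported as pairwise exact-match F1; the study's exact agreement measure is not given here, so the following is a generic illustration of how boundary disagreements lower agreement:

```python
def span_f1(spans_a, spans_b):
    """Pairwise agreement between two annotators as exact-match F1.
    Each argument is a set of (start, end, label) tuples."""
    spans_a, spans_b = set(spans_a), set(spans_b)
    if not spans_a and not spans_b:
        return 1.0
    overlap = len(spans_a & spans_b)
    precision = overlap / len(spans_a) if spans_a else 0.0
    recall = overlap / len(spans_b) if spans_b else 0.0
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

ann1 = {(0, 5, "Disease"), (10, 14, "Drug")}
ann2 = {(0, 5, "Disease"), (10, 15, "Drug")}
print(round(span_f1(ann1, ann2), 2))  # 0.5: a one-character boundary disagreement
```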

  • Article type: Journal Article
    Nanopore direct RNA sequencing (DRS) enables the capture and full-length sequencing of native RNAs, without recoding or amplification bias. Resulting data sets may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails, on individual RNA molecules. The success of these analyses is highly dependent on the provision of high-resolution transcriptome annotations in combination with workflows that minimize misalignments and other analysis artifacts. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to the small, gene-dense genomes of viruses due to the challenge of identifying distinct transcript isoforms where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS data sets that inform resulting read alignments and developed the nanopore guided annotation of transcriptome architectures (NAGATA) software package (https://github.com/DepledgeLab/NAGATA). We demonstrate, using a combination of synthetic and original DRS data sets derived from adenoviruses, herpesviruses, coronaviruses, and human cells, that NAGATA outperforms existing transcriptome annotation software and yields a consistently high level of precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes. Finally, we apply NAGATA to generate the first high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F41 (HAdV-41), for which we identify 77 distinct transcripts encoding at least 23 different proteins.
    OBJECTIVE: The transcriptome of an organism denotes the full repertoire of encoded RNAs that may be expressed. This is critical to understanding the biology of an organism and for accurate transcriptomic and epitranscriptomic analyses. Annotating transcriptomes remains a complex task, particularly in small, gene-dense organisms such as viruses, which maximize their coding capacity through overlapping RNAs. To resolve this, we have developed a new software package, nanopore guided annotation of transcriptome architectures (NAGATA), which utilizes nanopore direct RNA sequencing (DRS) datasets to rapidly produce high-resolution transcriptome annotations for diverse viruses and other organisms.
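    The abstract does not detail NAGATA's algorithm, but a common strategy for annotating transcripts from full-length DRS reads is to group read alignments by their 5' and 3' end coordinates. The sketch below illustrates that general idea only and is not NAGATA's implementation; the window size and read-count threshold are arbitrary.

```python
from collections import Counter

def nominate_transcripts(alignments, window=25, min_reads=3):
    """Group (start, end) read-alignment coordinates into rounded bins and
    keep bins supported by at least min_reads reads as candidate transcripts."""
    binned = Counter(((s // window) * window, (e // window) * window)
                     for s, e in alignments)
    return [(s, e, n) for (s, e), n in binned.items() if n >= min_reads]

reads = [(102, 980), (110, 975), (104, 990),
         (2005, 3001), (2010, 3004), (2003, 3010),
         (5000, 5200)]
print(nominate_transcripts(reads))  # two supported clusters; the singleton is dropped
```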

  • Article type: News
    This dataset involves a collection of soybean market news through web scraping from a Brazilian website. The news articles gathered span from January 2015 to June 2023 and have undergone a labeling process to categorize them as relevant or non-relevant. The news labeling process was conducted under the guidance of an agricultural economics expert, who collaborated with a group of nine individuals. Ten parameters were considered to assist participants in the labeling process. The dataset comprises approximately 11,000 news articles and serves as a valuable resource for researchers interested in exploring trends in the soybean market. Importantly, this dataset can be utilized for tasks such as classification and natural language processing. It provides insights into labeled soybean market news and supports open science initiatives, facilitating further analysis within the research community.
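    As an example of the classification task the authors mention, a relevant/non-relevant classifier could be trained along the following lines. The file name and column names are assumptions for illustration, not the dataset's documented layout.

```python
# Sketch of a relevance classifier for the labeled news (hypothetical file
# name and columns; the actual dataset layout may differ).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("soybean_news.csv")  # assumed columns: "text", "relevant"
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["relevant"], test_size=0.2, random_state=42, stratify=df["relevant"])

model = make_pipeline(TfidfVectorizer(max_features=20000, ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```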

  • Article type: Journal Article
    This study describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on de novo transcriptome assembly using Nextflow in an interactive format that uses appropriate cloud resources for data access and analysis. Cloud computing is a powerful new means by which biomedical researchers can access resources and capacity that were previously either unattainable or prohibitively expensive. To take advantage of these resources, however, the biomedical research community needs new skills and knowledge. We present here a cloud-based training module, developed in conjunction with Google Cloud, Deloitte Consulting, and the NIH STRIDES Program, that uses the biological problem of de novo transcriptome assembly to demonstrate and teach the concepts of computational workflows (using Nextflow) and cost- and resource-efficient use of Cloud services (using Google Cloud Platform). Our work highlights the reduced necessity of on-site computing resources and the accessibility of cloud-based infrastructure for bioinformatics applications.