annotation

  • Article type: Journal Article
    BACKGROUND: The cabbage webworm, Hellula undalis (Fabricius) (Lepidoptera: Pyralidae), is a significant pest of brassicas and other cruciferous plants in warm regions worldwide. Transcriptome analysis is valuable for investigating the molecular mechanisms underlying insect development and reproduction. De novo assembly is particularly useful for acquiring complete transcriptome information for insect species when no reference genome is available. For Hellula undalis, only 17 nucleotide records are currently available in the NCBI nucleotide database. Genes associated with metabolic processes, general development, reproduction, defense and functional genomics have not previously been predicted in Hellula undalis at the genomic level.
    RESULTS: To address this issue, we constructed the Hellula undalis transcriptome using Illumina NovaSeq6000 technology. Approximately 48 million 150 bp paired-end reads were obtained from sequencing. A total of 30,451 contigs were generated by de novo assembly of the sample and were compared with the sequences in the NCBI non-redundant protein database (Nr). In total, 71% of contigs matched known proteins in public databases including Nr, Gene Ontology (GO), and the Clusters of Orthologous Genes database (COG); contigs were then mapped to 123 pathways via functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG). In addition, we compared the orthologous gene families of the Hellula undalis transcriptome with those of Spodoptera frugiperda, Spodoptera litura and Spodoptera littoralis and found that 391 orthologous gene families are specific to Hellula undalis. A total of 1,913 potential SSRs were discovered in Hellula undalis contigs.
    CONCLUSIONS: This study provides the first transcriptome data for Hellula undalis. It also serves as a valuable resource for identifying target genes and developing effective and environmentally friendly strategies for pest control.
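The abstract reports potential SSRs (simple sequence repeats) in the contigs but does not say which tool or thresholds were used. As a rough illustration of what an SSR scan does, the following minimal sketch looks for tandem repeats of short motifs with a regular expression. The function name `find_ssrs`, the minimum of 4 repeats, the motif lengths of 2 to 6 bases, and the example contig are all assumptions made for this sketch, not values from the study.

```python
import re

def find_ssrs(sequence, min_repeats=4, max_motif_len=6):
    """Return (start, motif, repeat_count) for tandem repeats of short motifs.

    Hypothetical thresholds: motifs of 2..max_motif_len bases repeated at
    least min_repeats times. Mononucleotide runs are ignored in this sketch.
    """
    sequence = sequence.upper()
    hits = []
    for k in range(2, max_motif_len + 1):
        # A k-base motif followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "ATAT" = "AT" x 2).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

# Made-up contig containing an (AT)5 and an (AGC)4 repeat.
contig = "GGCATATATATATCCGTTAGCAGCAGCAGCAGTT"
ssrs = find_ssrs(contig)
print(ssrs)
```

Dedicated SSR tools typically apply different minimum repeat counts per motif length; this sketch uses one threshold for all lengths to stay short.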

  • Article type: Journal Article
    BACKGROUND: Integrating information from data sources representing different study designs has the potential to strengthen evidence in population health research. However, this concept of evidence "triangulation" presents a number of challenges for systematically identifying and integrating relevant information. These include the harmonization of heterogeneous evidence with common semantic concepts and properties, as well as the prioritization of the retrieved evidence for triangulation with the question of interest.
    RESULTS: We present Annotated Semantic Queries (ASQ), a natural language query interface to the integrated biomedical entities and epidemiological evidence in EpiGraphDB, which enables users to extract "claims" from a piece of unstructured text and then investigate the evidence that could support or contradict the claims, or offer additional information to the query. This approach has the potential to support the rapid review of preprints, grant applications, conference abstracts, and articles submitted for peer review. ASQ implements strategies to harmonize biomedical entities in different taxonomies and evidence from different sources, to facilitate evidence triangulation and interpretation.
    METHODS: ASQ is openly available at https://asq.epigraphdb.org and its source code is available at https://github.com/mrcieu/epigraphdb-asq under the GPL-3.0 license. Further information can be found in the supplementary materials and on the ASQ platform at https://asq.epigraphdb.org/docs.

  • Article type: Journal Article
    Analyzing the anatomy of the aorta and left ventricular outflow tract (LVOT) is crucial for risk assessment and planning of transcatheter aortic valve implantation (TAVI). A comprehensive analysis of the aortic root and LVOT requires the extraction of the patient-individual anatomy via segmentation. Deep learning has shown good performance on various segmentation tasks. If this is formulated as a supervised problem, large amounts of annotated data are required for training. Therefore, minimizing the annotation complexity is desirable.
    We propose two-dimensional (2D) cross-sectional annotation and point cloud-based surface reconstruction to train a fully automatic 3D segmentation network for the aortic root and the LVOT. Our sparse annotation scheme enables easy and fast training data generation for tubular structures such as the aortic root. From the segmentation results, we derive clinically relevant parameters for TAVI planning.
    The proposed 2D cross-sectional annotation results in high inter-observer agreement [Dice similarity coefficient (DSC): 0.94]. The segmentation model achieves a DSC of 0.90 and an average surface distance of 0.96 mm. Our approach achieves an aortic annulus maximum diameter difference between prediction and annotation of 0.45 mm (inter-observer variance: 0.25 mm).
    The presented approach facilitates reproducible annotations. The annotations allow for training accurate segmentation models of the aortic root and LVOT. The segmentation results facilitate reproducible and quantifiable measurements for TAVI planning.
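The Dice similarity coefficient quoted above is defined as twice the overlap of two masks divided by the sum of their sizes. The following minimal sketch computes it for two flattened binary masks. The function name `dice_coefficient` and the toy masks are illustrative assumptions, and returning 1.0 for two empty masks is a convention chosen here, not something stated in the abstract.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)

# Toy flattened masks: 3 foreground voxels each, 2 of them shared.
prediction = [0, 1, 1, 1, 0, 0]
annotation = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(prediction, annotation)
print(round(score, 3))
```

For real 3D volumes the same formula applies after flattening the voxel arrays; an array library would make this faster but is not needed to show the definition.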

  • Article type: Journal Article
    Crop growth monitoring is essential for both crop and supply chain management. Conventional manual sampling is not feasible for assessing the spatial variability of crop growth within an entire field or across all fields. Meanwhile, UAV-based remote sensing enables the efficient and nondestructive investigation of crop growth. A variety of crop-specific training image datasets are needed to detect crops from UAV imagery using a deep learning model. In particular, training datasets for cabbage are limited. This data article includes annotated field images of cabbages for recognizing cabbages with machine learning models. The dataset contains 458 images with 17,621 annotated cabbages. Images are approximately 500 to 1000 pixels square. Since these cabbage images were collected from different cultivars throughout the growing season over several years, deep learning models trained with this dataset will be able to recognize a wide variety of cabbage shapes. In the future, this dataset can be used not only with UAVs but also in land-based robot applications for crop sensing or associated plant-specific management.

  • Article type: Journal Article
    Comprehensive and accurate genome annotation is crucial for inferring the predicted functions of an organism. Numerous tools exist to annotate genes, gene clusters, mobile genetic elements, and other diverse features. However, these tools and pipelines can be difficult to install and run, be specialized for a particular element or feature, or lack annotations for larger elements that provide important genomic context. Integrating results across analyses is also important for understanding gene function. To address these challenges, we present the Beav annotation pipeline. Beav is a command-line tool that automates the annotation of bacterial genome sequences, mobile genetic elements, molecular systems and gene clusters, key regulatory features, and other elements. Beav uses existing tools in addition to custom models, scripts, and databases to annotate diverse elements, systems, and sequence features. Custom databases for plant-associated microbes are incorporated to improve annotation of key virulence and symbiosis genes in agriculturally important pathogens and mutualists. Beav includes an optional Agrobacterium-specific pipeline that identifies and classifies oncogenic plasmids and annotates plasmid-specific features. Following the completion of all analyses, annotations are consolidated to produce a single comprehensive output. Finally, Beav generates publication-quality genome and plasmid maps. Beav is on Bioconda and is available for download at https://github.com/weisberglab/beav.
    OBJECTIVE: Annotation of genome features, such as the presence of genes and their predicted function, or larger loci encoding secretion systems or biosynthetic gene clusters, is necessary for understanding the functions encoded by an organism. Genomes can also host diverse mobile genetic elements, such as integrative and conjugative elements and/or phages, that are often not annotated by existing pipelines. These elements can horizontally mobilize genes encoding virulence, antimicrobial resistance, or other adaptive functions and alter the phenotype of an organism. We developed a software pipeline, called Beav, that combines new and existing tools for the comprehensive annotation of these and other major features. Existing pipelines often misannotate loci important for virulence or mutualism in plant-associated bacteria. Beav includes custom databases and optional workflows for the improved annotation of plant-associated bacteria. Beav is designed to be easy to install and run, making comprehensive genome annotation broadly available to the research community.

  • Article type: Journal Article
    OBJECTIVE: Ants are ecologically dominant insects in most terrestrial ecosystems, with more than 14,000 extant species in about 340 genera recorded to date. However, genomic resources are still scarce for most species, especially for species endemic to East or Southeast Asia, limiting the study of phylogeny, speciation and adaptation of this evolutionarily successful animal lineage. Here, we assemble and annotate the genomes of Odontoponera transversa and Camponotus friedae, two ant species with a natural distribution in China, to facilitate future study of ant evolution.
    METHODS: We obtained 16 Gb and 51 Gb of PacBio HiFi data for O. transversa and C. friedae, respectively, which were assembled into draft genomes of 339 Mb for O. transversa and 233 Mb for C. friedae. Genome assessments by multiple metrics showed good completeness and high accuracy for the two assemblies. Gene annotations assisted by RNA-seq data yielded a comparable number of protein-coding genes in the two genomes (10,892 for O. transversa and 11,296 for C. friedae), while repeat annotations revealed a remarkable difference in repeat content between these two ant species (149.4 Mb for O. transversa versus 49.7 Mb for C. friedae). In addition, complete mitochondrial genomes for the two species were assembled and annotated.

  • Article type: Journal Article
    BACKGROUND: Named entity recognition (NER) is a fundamental task in natural language processing. However, it is typically preceded by named entity annotation, which poses several challenges, especially in the clinical domain. For instance, determining entity boundaries is one of the most common sources of disagreements between annotators due to questions such as whether modifiers or peripheral words should be annotated. If unresolved, these can induce inconsistency in the produced corpora, yet, on the other hand, strict guidelines or adjudication sessions can further prolong an already slow and convoluted process.
    OBJECTIVE: The aim of this study is to address these challenges by evaluating 2 novel annotation methodologies, lenient span and point annotation, aiming to mitigate the difficulty of precisely determining entity boundaries.
    METHODS: We evaluate their effects through an annotation case study on a Japanese medical case report data set. We compare annotation time, annotator agreement, and the quality of the produced labeling and assess the impact on the performance of an NER system trained on the annotated corpus.
    RESULTS: We saw significant improvements in the labeling process efficiency, with up to a 25% reduction in overall annotation time and even a 10% improvement in annotator agreement compared to the traditional boundary-strict approach. However, even the best-achieved NER model presented some drop in performance compared to the traditional annotation methodology.
    CONCLUSIONS: Our findings demonstrate a balance between annotation speed and model performance. Although disregarding boundary information affects model performance to some extent, this is counterbalanced by significant reductions in the annotator's workload and notable improvements in the speed of the annotation process. These benefits may prove valuable in various applications, offering an attractive compromise for developers and researchers.
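The abstract does not specify how annotator agreement was scored. As one plausible illustration of why relaxing entity boundaries can raise measured agreement, the sketch below compares a strict criterion (identical boundaries and label) with a lenient one (any overlap with the same label) using a pairwise F1 score. The function names, the `(start, end, label)` span format with exclusive ends, and the example spans are all hypothetical.

```python
def spans_match(a, b, lenient):
    """Compare two (start, end, label) spans; end offsets are exclusive."""
    (start_a, end_a, label_a), (start_b, end_b, label_b) = a, b
    if label_a != label_b:
        return False
    if lenient:
        # Lenient: any character overlap counts as agreement.
        return start_a < end_b and start_b < end_a
    # Strict: boundaries must be identical.
    return start_a == start_b and end_a == end_b

def agreement_f1(spans_1, spans_2, lenient=False):
    """Pairwise F1 agreement, treating annotator 1 as the reference."""
    if not spans_1 and not spans_2:
        return 1.0
    matched_1 = sum(any(spans_match(a, b, lenient) for b in spans_2) for a in spans_1)
    matched_2 = sum(any(spans_match(b, a, lenient) for a in spans_1) for b in spans_2)
    precision = matched_2 / len(spans_2) if spans_2 else 0.0
    recall = matched_1 / len(spans_1) if spans_1 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The annotators disagree only on whether a modifier belongs to the first entity.
annotator_1 = [(0, 12, "disease"), (20, 27, "drug")]
annotator_2 = [(7, 12, "disease"), (20, 27, "drug")]
strict = agreement_f1(annotator_1, annotator_2)
lenient = agreement_f1(annotator_1, annotator_2, lenient=True)
print(strict, lenient)
```

In this toy case the boundary disagreement halves the strict score while the lenient score is unaffected, which is the kind of effect a boundary-relaxed scheme is meant to absorb.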

  • Article type: Journal Article
    Nanopore direct RNA sequencing (DRS) enables the capture and full-length sequencing of native RNAs, without recoding or amplification bias. Resulting data sets may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails, on individual RNA molecules. The success of these analyses is highly dependent on the provision of high-resolution transcriptome annotations in combination with workflows that minimize misalignments and other analysis artifacts. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to the small gene-dense genomes of viruses due to the challenge of identifying distinct transcript isoforms where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS data sets that inform resulting read alignments and developed the nanopore guided annotation of transcriptome architectures (NAGATA) software package (https://github.com/DepledgeLab/NAGATA). We demonstrate, using a combination of synthetic and original DRS data sets derived from adenoviruses, herpesviruses, coronaviruses, and human cells, that NAGATA outperforms existing transcriptome annotation software and yields a consistently high level of precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes. Finally, we apply NAGATA to generate the first high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F41 (HAdV-41), for which we identify 77 distinct transcripts encoding at least 23 different proteins.
    OBJECTIVE: The transcriptome of an organism denotes the full repertoire of encoded RNAs that may be expressed. This is critical to understanding the biology of an organism and for accurate transcriptomic and epitranscriptomic-based analyses. Annotating transcriptomes remains a complex task, particularly in small gene-dense organisms such as viruses, which maximize their coding capacity through overlapping RNAs. To resolve this, we have developed a new software package, nanopore guided annotation of transcriptome architectures (NAGATA), which utilizes nanopore direct RNA sequencing (DRS) data sets to rapidly produce high-resolution transcriptome annotations for diverse viruses and other organisms.

  • Article type: News
    This dataset involves a collection of soybean market news through web scraping from a Brazilian website. The news articles gathered span from January 2015 to June 2023 and have undergone a labeling process to categorize them as relevant or non-relevant. The news labeling process was conducted under the guidance of an agricultural economics expert, who collaborated with a group of nine individuals. Ten parameters were considered to assist participants in the labeling process. The dataset comprises approximately 11,000 news articles and serves as a valuable resource for researchers interested in exploring trends in the soybean market. Importantly, this dataset can be utilized for tasks such as classification and natural language processing. It provides insights into labeled soybean market news and supports open science initiatives, facilitating further analysis within the research community.

  • Article type: Journal Article
    This study describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on de novo transcriptome assembly using Nextflow in an interactive format that uses appropriate cloud resources for data access and analysis. Cloud computing is a powerful new means by which biomedical researchers can access resources and capacity that were previously either unattainable or prohibitively expensive. To take advantage of these resources, however, the biomedical research community needs new skills and knowledge. We present here a cloud-based training module, developed in conjunction with Google Cloud, Deloitte Consulting, and the NIH STRIDES Program, that uses the biological problem of de novo transcriptome assembly to demonstrate and teach the concepts of computational workflows (using Nextflow) and cost- and resource-efficient use of Cloud services (using Google Cloud Platform). Our work highlights the reduced necessity of on-site computing resources and the accessibility of cloud-based infrastructure for bioinformatics applications.