
  • Article type: Journal Article
    The development of reliable artificial intelligence (AI) algorithms in pathology often depends on ground truth provided by annotation of whole slide images (WSI), a time-consuming and operator-dependent process. A comparative analysis of different annotation approaches is performed to streamline this process. Two pathologists annotated renal tissue using a semi-automated method (Segment Anything Model, SAM) and manual devices (touchpad vs. mouse). The methods and operators were compared in terms of working time, reproducibility (overlap fraction), and precision (accuracy rated from 0 to 10 by two expert nephropathologists). The impact of different displays on mouse performance was also evaluated. Annotations focused on three tissue compartments: tubules (57 annotations), glomeruli (53 annotations), and arteries (58 annotations). The semi-automatic approach was the fastest and had the least inter-observer variability, averaging 13.6 ± 0.2 min with a difference (Δ) of 2%, followed by the mouse (29.9 ± 10.2 min, Δ = 24%) and the touchpad (47.5 ± 19.6 min, Δ = 45%). The highest reproducibility in tubules and glomeruli was achieved with SAM (overlap values of 1 and 0.99, compared to 0.97 for the mouse and 0.94 and 0.93 for the touchpad), though SAM had lower reproducibility in arteries (overlap value of 0.89, compared to 0.94 for both the mouse and touchpad). No precision differences were observed between operators (p = 0.59). Using non-medical monitors increased annotation times by 6.1%. The future use of semi-automated and AI-assisted approaches can significantly speed up the annotation process, improving the ground truth for AI tool development.






  • Article type: Journal Article
    We report the whole-genome sequence of Diaporthe australafricana Crous & J.M. van Niekerk, obtained using Oxford Nanopore long-read sequencing and Illumina short-read sequencing. The hybrid genome assembly consists of 11 contigs with a total length of 53.509 Mb and a GC content of 52.40%.
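    As an illustration of how a GC content figure like the one above is typically derived, the following is a minimal sketch rather than the authors' pipeline. The function names and the toy contigs are hypothetical, and it assumes ambiguous bases such as N are excluded from the denominator.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


def assembly_gc_content(contigs: dict) -> float:
    """Length-weighted GC percentage over all contigs of an assembly."""
    # Concatenating contigs weights each one by its length.
    return gc_content("".join(contigs.values()))


# Toy example (hypothetical sequences, not real assembly data).
toy_contigs = {"contig_1": "ATGCGC", "contig_2": "GGNNAT"}
print(round(assembly_gc_content(toy_contigs), 2))  # prints 60.0
```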






  • Article type: Journal Article
    BACKGROUND: The cabbage webworm, Hellula undalis (Fabricius) (Lepidoptera: Pyralidae), is a significant pest of brassicas and other cruciferous plants in warm regions worldwide. Transcriptome analysis is valuable for investigating the molecular mechanisms underlying insect development and reproduction. De novo assembly is particularly useful for acquiring complete transcriptome information for insect species when no reference genome is available. For Hellula undalis, only 17 nucleotide records are currently available in the NCBI nucleotide database. Genes associated with metabolic processes, general development, reproduction, defense and functional genomics have not previously been predicted in Hellula undalis at the genomic level.
    RESULTS: To address this issue, we constructed a Hellula undalis transcriptome using Illumina NovaSeq 6000 technology. Approximately 48 million 150 bp paired-end reads were obtained from sequencing. A total of 30,451 contigs were generated by de novo assembly of the sample and were compared with the sequences in the NCBI non-redundant protein database (Nr). In total, 71% of contigs matched known proteins in public databases including Nr, Gene Ontology (GO), and the Clusters of Orthologous Genes database (COG); the contigs were then mapped to 123 pathways via functional annotation against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. In addition, we compared the orthologous gene families of the Hellula undalis transcriptome with those of Spodoptera frugiperda, Spodoptera litura, and Spodoptera littoralis and found that 391 orthologous gene families are specific to Hellula undalis. A total of 1,913 potential simple sequence repeats (SSRs) were discovered in the Hellula undalis contigs.
    CONCLUSIONS: This study provides the first transcriptome data for Hellula undalis. It also serves as a valuable resource for identifying target genes and developing effective and environmentally friendly strategies for pest control.
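    To make the SSR (simple sequence repeat) screening step concrete, here is a minimal regex-based sketch of how perfect microsatellites can be located in assembled contigs. It is not the tool used in the study, which the abstract does not name. The function name and the minimum-repeat thresholds are assumptions chosen for illustration, and the sketch deliberately ignores imperfect and compound repeats.

```python
import re
from typing import Dict, List, Optional, Tuple


def find_ssrs(sequence: str,
              min_repeats: Optional[Dict[int, int]] = None) -> List[Tuple[int, str, int]]:
    """Find perfect SSRs; return (start, motif, repeat_count) tuples.

    min_repeats maps motif length to the minimum number of tandem copies.
    The defaults below are hypothetical thresholds, not values from the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT");
            # a string is such a repeat iff it occurs inside its doubled self.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Toy contig (hypothetical): an (AT)7 and a (CAG)5 repeat.
contig = "GG" + "AT" * 7 + "CC" + "CAG" * 5 + "T"
for start, motif, count in find_ssrs(contig):
    print(start, motif, count)
```

    Applying such a function to every contig and summing the hits would give a count comparable in kind to the one reported, although the exact number depends on the thresholds chosen.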






  • Article type: Journal Article
    BACKGROUND: Integrating information from data sources representing different study designs has the potential to strengthen evidence in population health research. However, this concept of evidence "triangulation" presents a number of challenges for systematically identifying and integrating relevant information. These include the harmonization of heterogeneous evidence with common semantic concepts and properties, as well as the prioritization of the retrieved evidence for triangulation with the question of interest.
    RESULTS: We present Annotated Semantic Queries (ASQ), a natural language query interface to the integrated biomedical entities and epidemiological evidence in EpiGraphDB. ASQ enables users to extract "claims" from a piece of unstructured text and then investigate the evidence that could support or contradict the claims, or offer additional information relevant to the query. This approach has the potential to support the rapid review of preprints, grant applications, conference abstracts, and articles submitted for peer review. ASQ implements strategies to harmonize biomedical entities in different taxonomies and evidence from different sources, to facilitate evidence triangulation and interpretation.
    METHODS: ASQ is openly available at and its source code is available at under GPL-3.0 license.






  • Article type: Journal Article
    Xenia2 is a DV cluster actinobacteriophage that infects Gordonia rubripertincta NRRL B-16540. The genome is 68,135 bp long, with a GC content of 57.9% and 98 predicted protein-coding genes, 33 of which have a predicted function. Xenia2 has a lysis cassette with an endolysin (lysin A) and four different holin-like transmembrane proteins.






  • Article type: Journal Article
    Field-collected specimens were used to obtain nine high-quality genome assemblies from a total of 10 insect species native to prairies and savannas of central Illinois (USA): Mellilla xanthometata (Lepidoptera: Geometridae), Stenolophus ochropezus (Coleoptera: Carabidae), Forcipata loca (Hemiptera: Cicadellidae), Coelinius sp. (Hymenoptera: Braconidae), Thaumatomyia glabra (Diptera: Chloropidae), Brachynemurus abdominalus (Neuroptera: Myrmeleontidae), Catonia carolina (Hemiptera: Achilidae), Oncometopia orbona (Hemiptera: Cicadellidae), Flexamia atlantica (Hemiptera: Cicadellidae) and Stictocephala bisonia (Hemiptera: Membracidae). Sequencing library preparation from single specimens was successful despite extremely small DNA yields (<0.1 μg) for some samples. Additional sequencing and assembly workflows were adapted to each sample depending on the initial DNA yield. PacBio circular consensus (CCS/HiFi) or continuous long read (CLR) libraries were used to sequence DNA fragments up to 50 kb in length, with Illumina-sequenced linked reads (TellSeq libraries) and Omni-C libraries used for scaffolding and gap-filling. Assembled genome sizes ranged from 135 Mb to 3.2 Gb. The number of assembled scaffolds ranged from 47 to >13,000, with the longest scaffold per assembly ranging from ~23 to 439 Mb. Genome completeness was high, with BUSCO scores ranging from 85.5% completeness for the largest genome (Stictocephala bisonia) to 98.8% completeness for the smallest genome (Coelinius sp.). Unique sequence content, estimated using RepeatMasker and GenomeScope2, ranged from 50.7% to 75.8% and roughly decreased with increasing genome size. Structural annotation predicted a range of 19,281-72,469 protein models for the sequenced species. Per genome, sequencing costs at the time ranged from US$3,000 to US$5,000, and samples with PacBio HiFi data averaged ~1600 CPU-hours on a high-performance cluster and required approximately 14 h of bioinformatics analyses. Most assemblies would benefit from further manual curation to correct possible scaffold misjoins and translocations suggested by off-diagonal or depleted signals in Omni-C contact maps.






  • Article type: Journal Article
    We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region, annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manually annotate 22 anatomical structures in 5,246 CT volumes. Following this, a semi-automatic annotation procedure is performed for the remaining CT volumes, where radiologists revise the annotations predicted by AI and, in turn, AI improves its predictions by learning from the revised annotations. Such a large-scale, multi-center dataset with detailed annotations is needed for two reasons. Firstly, AbdomenAtlas provides important resources for AI development at scale, in the form of large pre-trained models, which can reduce the annotation workload of expert radiologists and transfer to broader clinical applications. Secondly, AbdomenAtlas establishes a large-scale benchmark for evaluating AI algorithms: the more data we use to test the algorithms, the better we can guarantee reliable performance in complex clinical scenarios. An ISBI & MICCAI challenge named BodyMaps: Towards 3D Atlas of Human Body was launched using a subset of our AbdomenAtlas, aiming to stimulate AI innovation and to benchmark segmentation accuracy, inference efficiency, and domain generalizability. We hope our AbdomenAtlas can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community. Codes, models, and datasets are available at






  • Article type: Journal Article
    Analyzing the anatomy of the aorta and left ventricular outflow tract (LVOT) is crucial for risk assessment and planning of transcatheter aortic valve implantation (TAVI). A comprehensive analysis of the aortic root and LVOT requires the extraction of the patient-individual anatomy via segmentation. Deep learning has shown good performance on various segmentation tasks. If segmentation is formulated as a supervised problem, large amounts of annotated data are required for training. Therefore, minimizing the annotation complexity is desirable.
    We propose two-dimensional (2D) cross-sectional annotation and point cloud-based surface reconstruction to train a fully automatic 3D segmentation network for the aortic root and the LVOT. Our sparse annotation scheme enables easy and fast training data generation for tubular structures such as the aortic root. From the segmentation results, we derive clinically relevant parameters for TAVI planning.
    The proposed 2D cross-sectional annotation results in high inter-observer agreement [Dice similarity coefficient (DSC): 0.94]. The segmentation model achieves a DSC of 0.90 and an average surface distance of 0.96 mm. Our approach achieves an aortic annulus maximum diameter difference between prediction and annotation of 0.45 mm (inter-observer variance: 0.25 mm).
    The presented approach facilitates reproducible annotations. The annotations allow for training accurate segmentation models of the aortic root and LVOT. The segmentation results facilitate reproducible and quantifiable measurements for TAVI planning.
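    For readers unfamiliar with the Dice similarity coefficient (DSC) reported above, the standard definition is DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. The sketch below is an illustration only, not the authors' evaluation code. It assumes each segmentation is given as a collection of foreground voxel coordinates, and the function name and toy voxels are hypothetical.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(seg_a: Iterable[Voxel], seg_b: Iterable[Voxel]) -> float:
    """Dice similarity coefficient between two sets of foreground voxels."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        # Convention assumed here: two empty segmentations agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy example (hypothetical voxels): two 3-voxel segmentations sharing 2 voxels.
observer_1 = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
observer_2 = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(round(dice_coefficient(observer_1, observer_2), 3))  # prints 0.667
```

    A DSC of 1 means identical segmentations and 0 means no overlap, so the reported inter-observer value of 0.94 indicates close agreement between annotators.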






  • Article type: Journal Article
    Crop growth monitoring is essential for both crop and supply chain management. Conventional manual sampling is not feasible for assessing the spatial variability of crop growth within an entire field or across all fields. By contrast, UAV-based remote sensing enables the efficient and nondestructive investigation of crop growth. A variety of crop-specific training image datasets are needed to detect crops from UAV imagery using a deep learning model. In particular, training data for cabbage are limited. This data article provides annotated images of cabbages in fields, intended for recognizing cabbages with machine learning models. The dataset contains 458 images with 17,621 annotated cabbages. Images are approximately 500 to 1000 pixels square. Since these cabbage images were collected from different cultivars during the whole growing season over several years, deep learning models trained with this dataset will be able to recognize a wide variety of cabbage shapes. In the future, this dataset can be used not only with UAVs but also in land-based robot applications for crop sensing or associated plant-specific management.






  • Article type: Journal Article
    Comprehensive and accurate genome annotation is crucial for inferring the predicted functions of an organism. Numerous tools exist to annotate genes, gene clusters, mobile genetic elements, and other diverse features. However, these tools and pipelines can be difficult to install and run, be specialized for a particular element or feature, or lack annotations for larger elements that provide important genomic context. Integrating results across analyses is also important for understanding gene function. To address these challenges, we present the Beav annotation pipeline. Beav is a command-line tool that automates the annotation of bacterial genome sequences, mobile genetic elements, molecular systems and gene clusters, key regulatory features, and other elements. Beav uses existing tools in addition to custom models, scripts, and databases to annotate diverse elements, systems, and sequence features. Custom databases for plant-associated microbes are incorporated to improve annotation of key virulence and symbiosis genes in agriculturally important pathogens and mutualists. Beav includes an optional Agrobacterium-specific pipeline that identifies and classifies oncogenic plasmids and annotates plasmid-specific features. Following the completion of all analyses, annotations are consolidated to produce a single comprehensive output. Finally, Beav generates publication-quality genome and plasmid maps. Beav is on Bioconda and is available for download at
    OBJECTIVE: Annotation of genome features, such as the presence of genes and their predicted function, or larger loci encoding secretion systems or biosynthetic gene clusters, is necessary for understanding the functions encoded by an organism. Genomes can also host diverse mobile genetic elements, such as integrative and conjugative elements and/or phages, that are often not annotated by existing pipelines. These elements can horizontally mobilize genes encoding for virulence, antimicrobial resistance, or other adaptive functions and alter the phenotype of an organism. We developed a software pipeline, called Beav, that combines new and existing tools for the comprehensive annotation of these and other major features. Existing pipelines often misannotate loci important for virulence or mutualism in plant-associated bacteria. Beav includes custom databases and optional workflows for the improved annotation of plant-associated bacteria. Beav is designed to be easy to install and run, making comprehensive genome annotation broadly available to the research community.





