Multiple instance learning

  • Article type: Multicenter Study
    Deep learning (DL) can accelerate the prediction of prognostic biomarkers from routine pathology slides in colorectal cancer (CRC). However, current approaches rely on convolutional neural networks (CNNs) and have mostly been validated on small patient cohorts. Here, we develop a new transformer-based pipeline for end-to-end biomarker prediction from pathology slides by combining a pre-trained transformer encoder with a transformer network for patch aggregation. Our transformer-based approach substantially improves the performance, generalizability, data efficiency, and interpretability as compared with current state-of-the-art algorithms. After training and evaluating on a large multicenter cohort of over 13,000 patients from 16 colorectal cancer cohorts, we achieve a sensitivity of 0.99 with a negative predictive value of over 0.99 for prediction of microsatellite instability (MSI) on surgical resection specimens. We demonstrate that resection specimen-only training reaches clinical-grade performance on endoscopic biopsy tissue, solving a long-standing diagnostic problem.

  • Article type: Journal Article
    We design a framework for studying prelinguistic child voice from 3 to 24 months based on state-of-the-art diarization algorithms. Our system consists of a time-invariant feature extractor, a context-dependent embedding generator, and a classifier. We study the effect of swapping out different components of the system, as well as changing the loss function, to find the best-performing configuration. We also present a multiple-instance learning technique that allows us to pre-train our parameters on larger datasets with coarser segment boundary labels. Our best system achieved a diarization error rate (DER) of 43.8% on the test dataset, compared with the 55.4% DER achieved by the LENA software. We also found that using a convolutional feature extractor instead of logmel features significantly improves the performance of neural diarization.
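The DER figures above can be read with the commonly used definition of diarization error rate: the sum of false-alarm, missed, and speaker-confusion time divided by the total reference speech time. The sketch below assumes that definition; the abstract does not state how its DER was computed, and the durations in the example are hypothetical values chosen only to reproduce a 43.8% rate.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_reference):
    """DER under the common definition (an assumption here, not taken from the paper).

    All arguments are durations in the same unit, e.g. seconds.
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (false_alarm + missed + confusion) / total_reference


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the error relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical error durations (seconds) over 100 s of reference speech.
der = diarization_error_rate(false_alarm=10.0, missed=25.0, confusion=8.8,
                             total_reference=100.0)

# Relative DER reduction implied by the two figures quoted in the abstract.
reduction = relative_reduction(baseline=0.554, improved=0.438)
print(f"DER = {der:.3f}, relative reduction vs. baseline = {reduction:.3f}")
```

Under these assumptions, moving from 55.4% to 43.8% DER corresponds to a relative error reduction of roughly one fifth.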

  • Article type: Journal Article
    Tumor purity is the percentage of cancer cells within a tissue section. Pathologists estimate tumor purity to select samples for genomic analysis by manually reading hematoxylin-eosin (H&E)-stained slides, which is tedious, time-consuming, and prone to inter-observer variability. Moreover, pathologists' estimates do not correlate well with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning model that predicts tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity in eight The Cancer Genome Atlas (TCGA) cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values. Thus, our model can be used to select samples for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Furthermore, our model provided tumor purity maps showing the spatial variation within sections, which can help to better understand the tumor microenvironment.

  • Article type: Journal Article
    BACKGROUND: To reduce the high incidence and mortality of gastric cancer (GC), we aimed to develop deep learning-based models to assist in predicting the diagnosis and overall survival (OS) of GC patients using pathological images.
    METHODS: To develop our algorithms, 2333 hematoxylin and eosin-stained pathological images of 1037 GC patients were collected from two cohorts: Renmin Hospital of Wuhan University (RHWU) and The Cancer Genome Atlas (TCGA). Additionally, we obtained 175 digital images of 91 GC patients from the National Human Genetic Resources Sharing Service Platform (NHGRP), which served as the independent external validation set. Two models were developed using artificial intelligence (AI): one named GastroMIL for diagnosing GC, and the other named MIL-GC for predicting the outcome of GC.
    RESULTS: GastroMIL achieved an accuracy of 0.920 in the external validation set, superior to that of the junior pathologist and comparable to that of expert pathologists. In the prognostic model, the C-indices for survival prediction in the internal and external validation sets were 0.671 and 0.657, respectively. Moreover, the risk score output by MIL-GC in the external validation set proved to be a strong predictor of OS in both the univariate (HR = 2.414, P < 0.0001) and multivariable (HR = 1.803, P = 0.043) analyses. The prediction process is available at an online website (https://baigao.github.io/Pathologic-Prognostic-Analysis/).
    CONCLUSIONS: Our study developed AI models that help predict the diagnosis and prognosis of GC patients, which will assist in choosing appropriate treatment to improve the survival of GC patients.
    BACKGROUND: Not applicable.
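The C-indices reported above measure how well a risk score orders patients by survival time. As a rough illustration, the following is a minimal sketch of Harrell's concordance index for right-censored data; it is not the authors' implementation, and the toy survival times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, earlier event
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.671 and 0.657 in context.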

  • Article type: Journal Article
    OBJECTIVE: Ultrasound imaging is routinely used in prostate biopsy, which involves obtaining prostate tissue samples using a systematic yet non-targeted approach. This approach is blind to individual patient intraprostatic pathology and, unfortunately, has a high rate of false negatives.
    METHODS: In this paper, we propose a deep network for improved detection of prostate cancer in systematic biopsy. We address several challenges associated with training such a network: (1) Statistical labels: Since a biopsy core's pathology report only represents a statistical distribution of cancer within the core, we use multiple instance learning (MIL) networks to enable learning from ultrasound image regions associated with those data; (2) Limited labels: The number of biopsy cores is limited to at most 12 per patient. As a result, the number of samples available for training a deep network is limited. We alleviate this issue by effectively combining Independent Conditional Variational Auto Encoders (ICVAE) with MIL. We train ICVAE to learn label-invariant features of RF data, which are subsequently used to generate synthetic data for improved training of the MIL network.
    RESULTS: Our in vivo study includes data from 339 prostate biopsy cores of 70 patients. We achieve an area under the curve, sensitivity, specificity, and balanced accuracy of 0.68, 0.77, 0.55 and 0.66, respectively.
    CONCLUSIONS: The proposed approach is generic and can be applied to several other scenarios where unlabeled data and noisy labels in training samples are present.
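The reported sensitivity, specificity, and balanced accuracy are linked by a simple identity: balanced accuracy is the mean of sensitivity and specificity. The short sketch below checks that the quoted figures are mutually consistent; the confusion-matrix counts are hypothetical and were chosen only to yield the same rates, since the abstract does not give per-class counts.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of cancerous cores detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of benign cores correctly cleared."""
    return tn / (tn + fp)


def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity."""
    return (sens + spec) / 2


# Hypothetical counts that reproduce the reported rates.
sens = sensitivity(tp=77, fn=23)
spec = specificity(tn=55, fp=45)
bal_acc = balanced_accuracy(sens, spec)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced accuracy={bal_acc:.2f}")
```

With sensitivity 0.77 and specificity 0.55, the mean is 0.66, matching the balanced accuracy stated in the results.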

  • Article type: Evaluation Study
    Supervised machine learning is a powerful tool frequently used in computer-aided diagnosis (CAD) applications. The bottleneck of this technique is its demand for fine-grained expert annotations, which are tedious to produce for medical image analysis applications. Furthermore, information is typically localized in diagnostic images, which makes representation of an entire image by a single feature set problematic. The multiple instance learning framework serves as a remedy to these two problems by allowing labels to be provided for groups of observations, called bags, and assuming the group label to be the maximum of the instance labels within the bag. This setup can effectively be applied to CAD by splitting a given diagnostic image into a Cartesian grid, treating each grid element (patch) as an instance by representing it with a feature set, and grouping instances belonging to the same image into a bag. We quantify the power of existing multiple instance learning methods by evaluating their performance on two distinct CAD applications: (i) Barrett's cancer diagnosis and (ii) diabetic retinopathy screening. In the experiments, mi-Graph appears as the best-performing method in bag-level prediction (i.e. diagnosis) for both of these applications, which have drastically different visual characteristics. For instance-level prediction (i.e. disease localization), mi-SVM ranks as the most accurate method.
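The bag construction described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not any of the evaluated methods (such as mi-Graph or mi-SVM): it splits an image into a Cartesian grid of patches, represents each patch with a feature set, and applies the max assumption to obtain the bag label. The feature set (mean and standard deviation of intensity) and the threshold-based instance classifier are hypothetical stand-ins.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split a 2-D image into non-overlapping square patches on a Cartesian grid."""
    height, width = image.shape
    patches = []
    for row in range(0, height - patch_size + 1, patch_size):
        for col in range(0, width - patch_size + 1, patch_size):
            patches.append(image[row:row + patch_size, col:col + patch_size])
    return patches


def patch_features(patch):
    """Hypothetical feature set: mean and standard deviation of pixel intensity."""
    return np.array([patch.mean(), patch.std()])


def instance_label(features, threshold=0.5):
    """Hypothetical instance classifier: positive if mean intensity exceeds threshold."""
    return int(features[0] > threshold)


def bag_label(instance_labels):
    """MIL max assumption: a bag is positive if any of its instances is positive."""
    return max(instance_labels)


# Toy 8x8 "image" with a single bright region in the bottom-right quadrant.
image = np.zeros((8, 8))
image[4:8, 4:8] = 1.0

bag = [patch_features(p) for p in split_into_patches(image, patch_size=4)]
labels = [instance_label(f) for f in bag]
print(labels, bag_label(labels))
```

Here the instance labels localize the positive patch (instance-level prediction), while their maximum gives the image-level diagnosis (bag-level prediction).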