Data imbalance

  • Article Type: Journal Article
    OBJECTIVE: This study aims to address the challenges of imbalanced heartbeat classification using the electrocardiogram (ECG). In this proposed novel deep-learning method, the focus is on accurately identifying minority classes in conditions characterized by significant imbalances in ECG data. Approach: We propose a Feature Fusion Neural Network enhanced by a Dynamic Minority-Biased Batch Weighting Loss Function. This network comprises three specialized branches: the Complete ECG Data Branch for a comprehensive view of ECG signals, the Local QRS Wave Branch for detailed features of the QRS complex, and the R Wave Information Branch to analyze R wave characteristics. This structure is designed to extract diverse aspects of ECG data. The dynamic loss function prioritizes minority classes while maintaining the recognition of majority classes, adjusting the network's learning focus without altering the original data distribution. Together, this fusion structure and adaptive loss function significantly improve the network's ability to distinguish between various heartbeat classes, enhancing the accuracy of minority class identification. Main Results: The proposed method demonstrated balanced performance on the MIT-BIH dataset, especially for minority classes. Under the intra-patient paradigm, the accuracy, sensitivity, specificity, and positive predictive value (PPV) for supraventricular ectopic beats were 99.63%, 93.62%, 99.81%, and 92.98%, respectively, and for fusion beats were 99.76%, 85.56%, 99.87%, and 84.16%, respectively. Under the inter-patient paradigm, these metrics were 96.56%, 89.16%, 96.84%, and 51.99% for supraventricular ectopic beats, and 96.10%, 77.06%, 96.25%, and 13.92% for fusion beats, respectively. Significance: This method effectively addresses the class imbalance in ECG datasets. By leveraging diverse ECG signal information and a novel loss function, this approach offers a promising tool for aiding in the diagnosis and treatment of cardiac conditions.
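
    The abstract does not give the exact form of the dynamic minority-biased batch weighting loss, so the sketch below only illustrates the underlying idea in PyTorch: reweighting the cross-entropy by inverse in-batch class frequency, recomputed every mini-batch so the emphasis adapts without resampling the data. The smoothing term and normalization are assumptions, not the authors' formula.

    ```python
    import torch
    import torch.nn.functional as F

    def batch_weighted_ce(logits, targets, n_classes, smooth=1.0):
        # Count how often each class appears in the current mini-batch.
        counts = torch.bincount(targets, minlength=n_classes).float()
        # Classes that are rare in this batch receive larger weights; `smooth`
        # avoids division by zero and tempers the bias toward extremely rare classes.
        weights = (counts.sum() + smooth) / (counts + smooth)
        weights = weights / weights.mean()
        # Standard cross-entropy, but with the per-class weights of this batch.
        return F.cross_entropy(logits, targets, weight=weights)
    ```

    Because the weights are recomputed from each batch, minority beats are emphasized dynamically while the original data distribution is left untouched, which is the property the abstract highlights.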

  • Article Type: Journal Article
    Accurately identifying potential off-target sites in the CRISPR/Cas9 system is crucial for improving the efficiency and safety of editing. However, the imbalance of available off-target datasets has posed a major obstacle to enhancing prediction performance. Although several prediction models have been developed to address this issue, there remains a lack of systematic research on handling data imbalance in off-target prediction. This article systematically investigates the data imbalance issue in off-target datasets and explores numerous methods to process data imbalance from a novel perspective. First, we highlight the impact of the imbalance problem on off-target prediction tasks by determining the imbalance ratios present in these datasets. Then, we provide a comprehensive review of various sampling techniques and cost-sensitive methods to mitigate class imbalance in off-target datasets. Finally, systematic experiments are conducted on several state-of-the-art prediction models to illustrate the impact of applying data imbalance solutions. The results show that class imbalance processing methods significantly improve the off-target prediction capabilities of the models across multiple testing datasets. The code and datasets used in this study are available at https://github.com/gzrgzx/CRISPR_Data_Imbalance.
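
    As a rough illustration of the two remedy families the review covers, the sketch below contrasts a sampling technique with a cost-sensitive classifier using imbalanced-learn and scikit-learn. The feature encoding and model choices are placeholders, not the benchmarked off-target pipelines.

    ```python
    from collections import Counter
    from imblearn.over_sampling import RandomOverSampler
    from sklearn.ensemble import RandomForestClassifier

    # X: encoded sgRNA-target pairs, y: 0/1 off-target labels (hypothetical inputs).
    def imbalance_ratio(y):
        counts = Counter(y)
        return max(counts.values()) / min(counts.values())

    # Option 1: sampling technique -- rebalance the training data itself.
    def fit_oversampled(X, y):
        X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
        return RandomForestClassifier(n_estimators=200).fit(X_res, y_res)

    # Option 2: cost-sensitive learning -- keep the data, reweight the errors.
    def fit_cost_sensitive(X, y):
        return RandomForestClassifier(n_estimators=200, class_weight="balanced").fit(X, y)
    ```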

  • Article Type: Journal Article
    Goal: Augment a small, imbalanced wound dataset by using semi-supervised learning with a secondary dataset, then utilize the augmented wound dataset for deep learning-based wound assessment. Methods: The clinically validated Photographic Wound Assessment Tool (PWAT) scores eight wound attributes: Size, Depth, Necrotic Tissue Type, Necrotic Tissue Amount, Granulation Tissue Type, Granulation Tissue Amount, Edges, and Periulcer Skin Viability, to comprehensively assess chronic wound images. A small corpus of 1639 wound images labeled with ground-truth PWAT scores was used as a reference. Semi-supervised learning and a Progressive Multi-Granularity training mechanism were used to leverage a secondary corpus of 9870 unlabeled wound images. Wound scoring utilized the EfficientNet convolutional neural network on the augmented wound corpus. Results: Our proposed Semi-Supervised PMG EfficientNet (SS-PMG-EfficientNet) approach estimated all 8 PWAT sub-scores with classification accuracies and F1 scores of about 90% on average, outperformed a comprehensive list of baseline models, and achieved a 7% improvement over the prior state of the art (without data augmentation). We also demonstrate that synthetic wound image generation using generative adversarial networks (GANs) did not improve wound assessment. Conclusions: Semi-supervised learning on unlabeled wound images in a secondary dataset achieved impressive performance for deep learning-based wound grading.
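
    The SS-PMG-EfficientNet pipeline is not spelled out in the abstract; the sketch below shows only a generic pseudo-labeling step of the kind semi-supervised training relies on, with the model, data loader, device, and confidence threshold all assumed placeholders.

    ```python
    import torch

    @torch.no_grad()
    def pseudo_label(model, unlabeled_loader, threshold=0.9, device="cuda"):
        """Assign provisional labels to unlabeled wound images the model is confident about."""
        model.eval()
        images, labels = [], []
        for x in unlabeled_loader:                   # loader yields image batches only
            probs = torch.softmax(model(x.to(device)), dim=1)
            conf, pred = probs.max(dim=1)
            keep = conf >= threshold                 # keep only high-confidence predictions
            images.append(x[keep.cpu()])
            labels.append(pred[keep].cpu())
        return torch.cat(images), torch.cat(labels)
    ```

    The pseudo-labeled pairs would then be merged with the 1639 labeled images and the classifier retrained; repeating this cycle is the core of self-training, onto which the paper layers progressive multi-granularity training.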

  • Article Type: Journal Article
    Despite the widespread use of ionizable lipid nanoparticles (LNPs) in clinical applications for messenger RNA (mRNA) delivery, mRNA drug delivery systems face the challenge of efficiently screening LNPs. Traditional screening methods often require a substantial amount of experimental time and incur high research and development costs. To accelerate the early development stage of LNPs, we propose TransLNP, a transformer-based transfection prediction model designed to aid in the selection of LNPs for mRNA drug delivery systems. TransLNP uses two types of molecular information to perceive the relationship between structure and transfection efficiency: coarse-grained atomic sequence information and fine-grained atomic spatial relationship information. Due to the scarcity of existing LNP experimental data, we find that pretraining the molecular model is crucial for better understanding the task of predicting LNP properties, which is achieved through reconstructing atomic 3D coordinates and masked atom prediction. In addition, the issue of data imbalance is particularly prominent in the real-world exploration of LNPs. We introduce the BalMol block to solve this problem by smoothing the distribution of labels and molecular features. Our approach outperforms state-of-the-art works in transfection property prediction under both random and scaffold data splitting. Additionally, we establish a relationship between molecular structural similarity and transfection differences, selecting 4267 pairs of molecular transfection cliffs, which are pairs of molecules that exhibit high structural similarity but significant differences in transfection efficiency, thereby revealing the primary source of prediction errors. The code, model, and data are made publicly available at https://github.com/wklix/TransLNP.
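
    The BalMol block is not described in detail in the abstract; as one assumed illustration of "smoothing the distribution of labels", the sketch below applies generic label-distribution smoothing to a continuous target such as transfection efficiency and derives inverse-density sample weights.

    ```python
    import numpy as np

    def lds_weights(labels, n_bins=50, sigma=2.0):
        # Empirical histogram of the continuous labels (e.g., transfection efficiency).
        counts, edges = np.histogram(labels, bins=n_bins)
        # Gaussian kernel used to smooth the empirical label density.
        radius = int(3 * sigma)
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-0.5 * (x / sigma) ** 2)
        kernel /= kernel.sum()
        smoothed = np.convolve(counts, kernel, mode="same")
        # Weight each sample by the inverse of its smoothed label density.
        bin_idx = np.clip(np.digitize(labels, edges[1:-1]), 0, n_bins - 1)
        weights = 1.0 / np.maximum(smoothed[bin_idx], 1e-8)
        return weights / weights.mean()              # average weight normalized to 1
    ```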

  • Article Type: Journal Article
    BACKGROUND: DNA-binding proteins (DNA-BPs) are the proteins that bind and interact with DNA. DNA-BPs regulate and affect numerous biological processes, such as transcription and DNA replication, repair, and organization of the chromosomal DNA. Very few proteins, however, are DNA-binding in nature. Therefore, it is necessary to develop an efficient predictor for identifying DNA-BPs.
    RESULTS: In this work, we have proposed new benchmark datasets for the DNA-binding protein prediction problem. We discovered several quality concerns with the widely used benchmark datasets, PDB1075 (for training) and PDB186 (for independent testing), which necessitated the preparation of new benchmark datasets. Our proposed datasets UNIPROT1424 and UNIPROT356 can be used for model training and independent testing, respectively. We have retrained selected state-of-the-art DNA-BP predictors on the new dataset and reported their performance results. We also trained a novel predictor using the new benchmark dataset. We extracted features from various feature categories, then used a Random Forest classifier and Recursive Feature Elimination with Cross-Validation (RFECV) to select the optimal set of 452 features. We then proposed a stacking ensemble architecture as our final prediction model. Named the Stacking Ensemble Model for DNA-binding Protein Prediction, or StackDPP for short, our model achieved 0.92, 0.92, and 0.93 accuracy in 10-fold cross-validation, jackknife, and independent testing, respectively.
    CONCLUSIONS: StackDPP has performed very well in cross-validation testing and has outperformed all the state-of-the-art prediction models in independent testing. Its performance scores in cross-validation testing generalized very well to the independent test set. The source code of the model is publicly available at https://github.com/HasibAhmed1624/StackDPP. Therefore, we expect that this generalized model can be adopted by researchers and practitioners to identify novel DNA-binding proteins.
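
    A minimal scikit-learn sketch of the RFECV-plus-stacking recipe described above follows; the base learners, meta-learner, and elimination step size are placeholders rather than StackDPP's actual configuration.

    ```python
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC

    # X: protein feature matrix, y: DNA-binding labels (hypothetical inputs).
    def build_stacking_predictor(X, y):
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        # Recursive feature elimination with cross-validation, ranked by a Random Forest.
        selector = RFECV(RandomForestClassifier(n_estimators=200, random_state=0),
                         step=50, cv=cv, scoring="accuracy", n_jobs=-1)
        X_sel = selector.fit_transform(X, y)
        # Stacking ensemble; base learners and meta-learner here are illustrative only.
        stack = StackingClassifier(
            estimators=[("rf", RandomForestClassifier(n_estimators=200)),
                        ("svm", SVC(probability=True))],
            final_estimator=LogisticRegression(max_iter=1000), cv=cv)
        return selector, stack.fit(X_sel, y)
    ```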

  • Article Type: Journal Article
    Deep learning techniques have recently yielded remarkable results across various fields. However, the quality of these results depends heavily on the quality and quantity of data used during the training phase. One common issue in multi-class and multi-label classification is class imbalance, where one or several classes make up a substantial portion of the total instances. This imbalance causes the neural network to prioritize features of the majority classes during training, as their detection leads to higher scores. In the context of object detection, two types of imbalance can be identified: (1) an imbalance between the space occupied by the foreground and background and (2) an imbalance in the number of instances for each class. This paper aims to address the second type of imbalance without exacerbating the first. To achieve this, we propose a modification of the copy-paste data augmentation technique, combined with weight-balancing methods in the loss function. This strategy was specifically tailored to improve the performance in datasets with a high instance density, where instance overlap could be detrimental. To validate our methodology, we applied it to a highly unbalanced dataset focused on nuclei detection. The results show that this hybrid approach improves the classification of minority classes without significantly compromising the performance of majority classes.
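
    The paper pairs a modified copy-paste augmentation with weight balancing in the loss function; the sketch below covers only the loss-weighting half, building a PyTorch cross-entropy weighted by inverse class frequency, with the class counts made up for illustration.

    ```python
    import torch
    import torch.nn as nn

    def class_balanced_ce(class_counts):
        """Cross-entropy weighted by inverse class frequency (one common weighting scheme)."""
        counts = torch.tensor(class_counts, dtype=torch.float)
        weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
        return nn.CrossEntropyLoss(weight=weights)

    # Example: three nucleus classes with a heavy majority class (counts are hypothetical).
    criterion = class_balanced_ce([12000, 800, 150])
    ```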

  • Article Type: Journal Article
    BACKGROUND: In the last decade, long-tail learning has become a popular research focus in deep learning applications in medicine. However, no scientometric reports have provided a systematic overview of this scientific field. We utilized bibliometric techniques to identify and analyze the literature on long-tailed learning in deep learning applications in medicine and investigate research trends, core authors, and core journals. We expanded our understanding of the primary components and principal methodologies of long-tail learning research in the medical field.
    METHODS: Web of Science was utilized to collect all articles on long-tailed learning in medicine published until December 2023. The suitability of all retrieved titles and abstracts was evaluated. For bibliometric analysis, all numerical data were extracted. CiteSpace was used to create clustered and visual knowledge graphs based on keywords.
    RESULTS: A total of 579 articles met the evaluation criteria. Over the last decade, the annual number of publications and citation frequency both showed significant growth, following a power-law and exponential trend, respectively. Noteworthy contributors to this field include Husanbir Singh Pannu, Fadi Thabtah, and Talha Mahboob Alam, while leading journals such as IEEE ACCESS, COMPUTERS IN BIOLOGY AND MEDICINE, IEEE TRANSACTIONS ON MEDICAL IMAGING, and COMPUTERIZED MEDICAL IMAGING AND GRAPHICS have emerged as pivotal platforms for disseminating research in this area. The core of long-tailed learning research within the medical domain is encapsulated in six principal themes: deep learning for imbalanced data, model optimization, neural networks in image analysis, data imbalance in health records, CNN in diagnostics and risk assessment, and genetic information in disease mechanisms.
    CONCLUSIONS: This study summarizes recent advancements in applying long-tail learning to deep learning in medicine through bibliometric analysis and visual knowledge graphs. It explains new trends, sources, core authors, journals, and research hotspots. This field has shown great promise in medical deep learning research, and our findings will provide pertinent and valuable insights for future research and clinical practice.

  • Article Type: Journal Article
    BACKGROUND: Medical image registration plays an important role in several applications. Existing approaches using unsupervised learning encounter issues due to the data imbalance problem, as their target is usually a continuous variable.
    OBJECTIVE: In this study, we introduce a novel approach known as Unsupervised Imbalanced Registration, to address the challenge of data imbalance and prevent overconfidence while increasing the accuracy and stability of 4D image registration.
    METHODS: Our approach involves performing unsupervised image mixtures to smooth the input space, followed by unsupervised image registration to learn the continuous target. We evaluated our method on 4D-Lung using two widely used unsupervised methods, namely VoxelMorph and ViT-V-Net.
    RESULTS: Our findings demonstrate that our proposed method significantly enhances the mean accuracy of registration by 3%-10% on a small dataset while also reducing the accuracy variance by 10%.
    CONCLUSIONS: Unsupervised Imbalanced Registration is a promising approach that is compatible with current unsupervised image registration methods applied to 4D images.
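
    The abstract does not define the unsupervised image mixture precisely; a mixup-style convex combination of two inputs, sketched below in PyTorch, is one common way to smooth the input space and is offered here only as an assumed illustration of that step.

    ```python
    import torch

    def mix_images(x1, x2, alpha=0.4):
        # Beta-distributed mixing coefficient, as in mixup-style input smoothing.
        lam = torch.distributions.Beta(alpha, alpha).sample()
        # Convex combination of the two volumes; lam is returned so any paired
        # targets or deformation fields can be mixed consistently downstream.
        return lam * x1 + (1 - lam) * x2, lam
    ```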

  • Article Type: Journal Article
    BACKGROUND: After the COVID-19 pandemic, the conflict between limited mental health care resources and the rapidly growing number of patients has become more pronounced. It is necessary for psychologists to draw on artificial intelligence (AI)-based methods to analyze the satisfaction with drug treatment of patients undergoing mental illness treatment.
    OBJECTIVE: Our goal was to construct highly accurate and transferable models for predicting the satisfaction of patients with mental illness with medication by analyzing their own experiences and comments related to medication intake.
    METHODS: We extracted 41,851 reviews in 20 categories of disorders related to mental illnesses from a large public data set of 161,297 reviews in 16,950 illness categories. To discover a more optimal structure for the natural language processing models, we proposed the Unified Interchangeable Model Fusion to decompose the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT), support vector machine, and random forest (RF) models into 2 modules, the encoder and the classifier, and then reconstruct fused "encoder + classifier" models to accurately evaluate patients' satisfaction. The fused models were divided into 2 categories in terms of model structures, traditional machine learning-based models and neural network-based models. A new loss function was proposed for those neural network-based models to overcome overfitting and data imbalance. Finally, we fine-tuned the fused models and evaluated their performance comprehensively in terms of F1-score, accuracy, κ coefficient, and training time using 10-fold cross-validation.
    RESULTS: Through extensive experiments, the transformer bidirectional encoder+RF model outperformed the state-of-the-art BERT, MentalBERT, and other fused models. It became the optimal model for predicting the patients\' satisfaction with drug treatment. It achieved an average graded F1-score of 0.872, an accuracy of 0.873, and a κ coefficient of 0.806. This model is suitable for high-standard users with sufficient computing resources. Alternatively, it turned out that the word-embedding encoder+RF model showed relatively good performance with an average graded F1-score of 0.801, an accuracy of 0.812, and a κ coefficient of 0.695 but with much less training time. It can be deployed in environments with limited computing resources.
    CONCLUSIONS: We analyzed the performance of support vector machine, RF, BERT, MentalBERT, and all fused models and identified the optimal models for different clinical scenarios. The findings can serve as evidence to support that the natural language processing methods can effectively assist psychologists in evaluating the satisfaction of patients with drug treatment programs and provide precise and standardized solutions. The Unified Interchangeable Model Fusion provides a different perspective on building AI models in mental health and has the potential to fuse the strengths of different components of the models into a single model, which may contribute to the development of AI in mental health.
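
    One fused "encoder + classifier" pairing reported above is a transformer bidirectional encoder feeding a random forest; the sketch below shows a generic version of that idea with Hugging Face Transformers and scikit-learn, where the checkpoint, [CLS] pooling, and hyperparameters are assumptions rather than the authors' exact setup.

    ```python
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.ensemble import RandomForestClassifier

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    @torch.no_grad()
    def encode(texts, batch_size=32):
        """Turn raw drug reviews into fixed-length [CLS] embeddings from the BERT encoder."""
        feats = []
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, max_length=256, return_tensors="pt")
            feats.append(encoder(**batch).last_hidden_state[:, 0])  # [CLS] embedding
        return torch.cat(feats).numpy()

    # reviews: list of review strings, labels: satisfaction classes (hypothetical names).
    # clf = RandomForestClassifier(n_estimators=300).fit(encode(reviews), labels)
    ```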

  • Article Type: Journal Article
    To address the scarcity and class imbalance of abnormal electrocardiogram (ECG) databases, which are crucial in AI-driven diagnostic tools for potential cardiovascular disease detection, this study proposes a novel quantum conditional generative adversarial algorithm (QCGAN-ECG) for generating abnormal ECG signals. The QCGAN-ECG constructs a quantum generator based on the patch method. In this method, each sub-generator generates distinct features of abnormal heartbeats in different segments. This patch-based generative algorithm conserves quantum resources and makes QCGAN-ECG practical for near-term quantum devices. Additionally, QCGAN-ECG introduces quantum registers as control conditions. It encodes information about the types and probability distributions of abnormal heartbeats into quantum registers, rendering the entire generative process controllable. Simulation experiments on PennyLane demonstrated that the QCGAN-ECG could generate completely abnormal heartbeats with an average accuracy of 88.8%. Moreover, the QCGAN-ECG can accurately fit the probability distribution of various abnormal ECG data. In the anti-noise experiments, the QCGAN-ECG showcased outstanding robustness across various levels of quantum noise interference. These results demonstrate the effectiveness and potential applicability of the QCGAN-ECG for generating abnormal ECG signals, which will further promote the development of AI-driven cardiac disease diagnosis systems. The source code is available at github.com/VanSWK/QCGAN_ECG.
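
    The patch method can be pictured as several small variational circuits, each emitting one segment of the synthetic beat; the PennyLane sketch below is a generic illustration with an assumed qubit count and ansatz, not the circuit actually used by QCGAN-ECG.

    ```python
    import pennylane as qml
    from pennylane import numpy as np

    n_qubits = 5                       # qubits per sub-generator (assumed size)
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def sub_generator(noise, weights):
        # Encode the latent noise vector, then apply a shallow variational circuit.
        for i in range(n_qubits):
            qml.RY(noise[i], wires=i)
        for layer in weights:
            for i in range(n_qubits):
                qml.RY(layer[i], wires=i)
            for i in range(n_qubits - 1):
                qml.CNOT(wires=[i, i + 1])
        # The measured probabilities form one segment (patch) of the synthetic beat.
        return qml.probs(wires=range(n_qubits))

    def generate_beat(noise, all_weights):
        # Stitch the patches from several sub-generators (e.g., 8 patches of 32 samples).
        return np.concatenate([sub_generator(noise, w) for w in all_weights])
    ```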
