HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images

Abstract:

The continuous release of image databases with fully or partially identical inner categories severely hampers the development of autonomous Computer-Aided Diagnostics (CAD) systems for truly comprehensive medical diagnostics. The first challenge is the frequent bulk release of medical image databases, which often suffer from two common drawbacks: image duplication and corruption. Successive releases of the same data with the same classes or categories offer no clear evidence that those identical classes can be successfully concatenated across image databases. This issue is a stumbling block for hypothesis-based experiments that aim to produce a single learning model capable of classifying all of them correctly. Removing redundant data, enhancing performance, and optimizing energy resources are among the most challenging aspects. In this article, we propose a global data aggregation scale model that incorporates six image databases selected from specific global resources. The proposed learner is trained on all the unique patterns within any given data release, thereby hypothetically creating a unique dataset. The MD5 hash algorithm generates a hash value for each image, so that identical images map to the same value, which makes it suitable for duplicate removal. T-Distributed Stochastic Neighbor Embedding (t-SNE), with a tunable perplexity parameter, can represent high-dimensional data in a low-dimensional space. Both MD5 and t-SNE are applied recursively, producing a balanced and uniform database containing equal numbers of samples per category: normal, pneumonia, and Coronavirus Disease 2019 (COVID-19). We evaluated the performance of all proposed data and the new automated version using the Inception V3 pre-trained model with various evaluation metrics. The proposed scale model outperformed traditional data aggregation, achieving an accuracy of 98.48%, along with high precision, recall, and F1-score.
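The duplicate-removal step described above can be illustrated with a minimal sketch. It assumes that duplicates are byte-identical files and that the first file seen for each hash is the one kept; the function names `md5_of_file` and `unique_images` are hypothetical and are not taken from the article, which does not describe its implementation at this level.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large images are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_images(paths):
    """Keep the first file seen for each distinct MD5 digest (assumed policy)."""
    seen = {}
    for path in paths:
        path = Path(path)
        # setdefault only stores the path if this digest is new.
        seen.setdefault(md5_of_file(path), path)
    return list(seen.values())
```

Note that a hash of the file contents only matches exact copies: a resized or re-encoded version of the same X-ray has a different digest and would not be removed by this step alone.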
The results were supported by a statistical t-test, which yielded t-values and p-values. All t-values were statistically significant, and the p-values provide strong evidence against the null hypothesis. Furthermore, the Final dataset outperformed all other datasets on every metric when diagnosing various lung infections under the same factors.
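The abstract does not state which variant of the t-test was used. As an illustration only, the sketch below computes the statistic for a paired t-test, under the assumption that two datasets are compared on matched scores (for example, the same metric measured over the same runs). The function name `paired_t_statistic` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # The paired test works on the per-pair differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean(diffs) / (spread / math.sqrt(n)), n - 1
```

The p-value follows from the t distribution with the returned degrees of freedom; in practice a statistics library such as SciPy (`scipy.stats.ttest_rel`) returns both the statistic and the p-value in one call.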
