Support Vector Machines

  • Article type: Journal Article
    OBJECTIVE: The objective of this research is to explore the applicability of machine learning and fully homomorphic encryption (FHE) to private pathological assessment, with a focus on the inference phase of support vector machines (SVM) for the classification of confidential medical data.
    METHODS: A framework is introduced that utilizes the Cheon-Kim-Kim-Song (CKKS) FHE scheme, facilitating the execution of SVM inference on encrypted datasets. This framework ensures the privacy of patient data and negates the necessity of decryption during the analytical process. Additionally, an efficient feature extraction technique is presented for the transformation of medical imagery into vectorial representations.
    RESULTS: The system's evaluation across various datasets substantiates its practicality and efficacy. The proposed method delivers classification accuracy and performance on par with traditional, non-encrypted SVM inference, while upholding a 128-bit security level against established cryptographic attacks targeting the CKKS scheme. The secure inference process completes within seconds.
    CONCLUSIONS: The findings of this study underscore the viability of FHE in enhancing the security and efficiency of bioinformatics analyses, potentially benefiting fields such as cardiology, oncology, and medical imagery. The implications of this research are significant for the future of privacy-preserving machine learning, promoting progress in diagnostic procedures, tailored medical treatments, and clinical investigations.
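
One way to see why SVM inference is a natural fit for CKKS: assuming a linear kernel (the abstract does not say which kernel was used), the decision score is a dot product plus a bias, so it needs only additions and multiplications, which are the operations CKKS evaluates on encrypted real-valued vectors. The sketch below is ordinary, unencrypted Python; the weights, bias, and feature values are made up for illustration, and no FHE library is involved.

```python
def linear_svm_score(weights, bias, features):
    """Return w . x + b using only additions and multiplications."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    score = bias
    for w, x in zip(weights, features):
        score += w * x
    return score


def classify(score):
    """Map a decision score to a class label (+1 or -1)."""
    return 1 if score >= 0 else -1


# Hypothetical model parameters and feature vector, for illustration only.
weights = [0.5, -1.0, 2.0]
bias = -0.25
features = [1.0, 0.5, 0.25]

score = linear_svm_score(weights, bias, features)
label = classify(score)
```

In an encrypted deployment, one common design is to compute only the score homomorphically and leave the final sign step to the data owner after decryption, because comparison is not a native CKKS operation. Whether the paper's framework does this is not stated in the abstract.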

  • Article type: Journal Article
    BACKGROUND: Depression is very prevalent in middle-aged and older smokers. Therefore, we aimed to identify the risk of depression among middle-aged and older adults with frequent and infrequent nicotine use, as this is quite necessary for supporting their well-being.
    METHODS: This study included a total of 10,821 participants drawn from the China Health and Retirement Longitudinal Study Wave 5, 2020 (CHARLS-5). Five machine learning (ML) algorithms were employed. Several metrics were used to evaluate model performance, including area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), specificity, and accuracy.
    RESULTS: 10,821 participants (6472 males, 4349 females) had a mean age of 60.47 ± 8.98, with a score of 8.90 ± 6.53 on the depression scale. For middle-aged and older adults with frequent nicotine use, random forest (RF) achieved the highest AUC value, PPV and specificity (0.75, 0.74 and 0.88, respectively). For the other group, support vector machines (SVM) showed the highest PPV (0.74), and relatively high accuracy and specificity (0.72 and 0.87, respectively). Feature importance analysis indicated that "dissatisfaction with life" was the most important variable for identifying the risk of depression in the SVM model, while "attitude towards expected life span" was the most important one in the RF model.
    LIMITATIONS: CHARLS-5 was collected during the COVID-19 pandemic, so our results may be influenced by the pandemic.
    CONCLUSIONS: This study indicated that certain ML models can ideally identify the risk of depression in middle-aged and older adults, which holds significant value for their health management.
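
Three of the metrics named in this abstract (PPV, specificity, and accuracy) follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study. AUC is omitted because it requires ranked scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute PPV, specificity, and accuracy from confusion-matrix counts."""
    # PPV (precision): share of positive predictions that are correct.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # Specificity: share of actual negatives that are predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"ppv": ppv, "specificity": specificity, "accuracy": accuracy}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=30, fp=10, tn=50, fn=10)
```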

  • Article type: Journal Article
    The objective of the study was to develop a computational model with which predictions regarding the level of prevalence of mastitis in dairy sheep farms could be performed. Data for the construction of the model were obtained from a large Greece-wide field study with 111 farms. Unsupervised learning methodology was applied for clustering data into two clusters based on 18 variables (17 independent variables related to health management practices applied in farms, climatological data at the locations of the farms, and the level of prevalence of subclinical mastitis as the target value). The K-means tool showed the highest significance for the classification of farms into two clusters for the construction of the computational model: median (interquartile range) prevalence of subclinical mastitis among farms was 20.0% (interquartile range: 15.8%) and 30.0% (16.0%) (p = 0.002). Supervised learning tools were subsequently used to predict the level of prevalence of the infection: decision trees, k-NN, neural networks, and support vector machines. For each of these, combinations of hyperparameters were employed; 83 models were produced, and 4150 assessments were made in total. A computational model obtained by means of support vector machines (kernel: 'linear', regularization parameter C = 3) was selected. Thereafter, the model was assessed through the results of the prevalence of subclinical mastitis in 373 records from sheep flocks unrelated to the ones employed for the selection of the model; the model was used for evaluation of the correct classification of the data in each of 373 sets, each of which included a test (prediction) subset with one record that referred to the farm under assessment. The median prevalence of the infection in farms classified by the model in each of the two categories was 10.4% (5.5%) and 36.3% (9.7%) (p < 0.0001). The overall accuracy of the model for the results presented by the K-means tool was 94.1%; for the estimation of the level of prevalence (<25.0%/≥25.0%) in the farms, it was 96.3%. The findings of this study indicate that machine learning algorithms can be usefully employed in predicting the level of subclinical mastitis in dairy sheep farms. This can facilitate setting up appropriate health management measures for interventions in the farms.

  • Article type: Journal Article
    Despite the growing body of evidence highlighting the individuality in movement techniques, predominant models of motor learning, particularly during the acquisition phase, continue to emphasise generalised, person-independent approaches. Biomechanical studies, coupled with machine learning approaches, have demonstrated the uniqueness of movement techniques exhibited by individuals. However, this evidence predominantly pertains to already stabilised movement techniques, particularly evident in cyclic daily activities such as walking, running, or cycling, as well as in expert-level sports movements. This study aims to evaluate the hypothesis of individuality in whole-body movements necessitating intricate coordination and strength among novice participants at the very beginning of an acquisition phase.
    In a within-subject design, sixteen highly active male participants (mean age: 23.1 ± 2.1 years), all absolute novices in the learning task (i.e., power snatch of Olympic weightlifting), participated in randomised snatch learning bouts. These bouts comprised 36 trials across various motor learning models: differential learning, contextual interference (serial, sCIL; and blocked, bCIL), and repetitive learning. Kinematic and kinetic data were collected from three standardised snatch trials performed following each motor learning model bout. The time-continuous data were input to a linear Support Vector Machine (SVM). We conducted analyses on two classification tasks: participant and motor learning model.
    The Support Vector Machine classification revealed a notably superior participant classification compared to the motor learning model classification, with an averaged prediction accuracy of 78% (on average ≈35 out of 45 test trials across the folds) versus 27.3% (on average ≈9 out of 36 test trials across the folds). In specific fold and input combinations, accuracies of 91% versus 38% were respectively achieved.
    Methodologically, the crucial role of selecting appropriate data pre-processing methods and identifying the optimal combinations of SVM data inputs is discussed in the context of future research. Our findings provide initial support for a dominance of individuality over motor learning models in movement techniques during the early phase of acquisition in Olympic weightlifting power snatch.

  • Article type: Journal Article
    Tremor, defined as an "involuntary, rhythmic, oscillatory movement of a body part", is a key feature of many neurological conditions including Parkinson's disease and essential tremor. Clinical assessment continues to be performed by visual observation with quantification on clinical scales. Methodologies for objectively quantifying tremor are promising but remain non-standardized across centers. Our center performs full-body behavioral testing with 3D motion capture for clinical and research purposes in patients with Parkinson's disease, essential tremor, and other conditions. The objective of this study was to assess the ability of several candidate processing pipelines to identify the presence or absence of tremor in kinematic data from patients with confirmed movement disorders and compare them to expert ratings from movement disorders specialists. We curated a database of 2272 separate kinematic data recordings from our center, each of which was contemporaneously annotated as tremor present or absent by a movement physician. We compared the ability of six separate processing pipelines to recreate clinician ratings based on F1 score, in addition to accuracy, precision, and recall. The performance across algorithms was generally comparable. The average F1 score was 0.84±0.02 (mean ± SD; range 0.81-0.87). The second highest performing algorithm (cross-validated F1=0.87) was a hybrid that used engineered features adapted from an algorithm in longstanding clinical use with a modern Support Vector Machine classifier. Taken together, our results suggest the potential to update legacy clinical decision support systems to incorporate modern machine learning classifiers to create better-performing tools.

  • Article type: Journal Article
    Tampered multimedia content is being increasingly used in a broad range of cybercrime activities. The spread of fake news, misinformation, digital kidnapping, and ransomware-related crimes are amongst the most recurrent crimes in which manipulated digital photos and videos are the perpetrating and disseminating medium. Criminal investigation has been challenged in applying machine learning techniques to automatically distinguish between fake and genuine seized photos and videos. Despite the pertinent need for manual validation, easy-to-use platforms for digital forensics are essential to automate and facilitate the detection of tampered content and to help criminal investigators with their work. This paper presents a machine learning Support Vector Machines (SVM) based method to distinguish between genuine and fake multimedia files, namely digital photos and videos, which may indicate the presence of deepfake content. The method was implemented in Python and integrated as new modules in the widely used digital forensics application Autopsy. The implemented approach extracts a set of simple features resulting from the application of a Discrete Fourier Transform (DFT) to digital photos and video frames. The model was evaluated with a large dataset of classified multimedia files containing both legitimate and fake photos and frames extracted from videos. Regarding deepfake detection in videos, the Celeb-DFv1 dataset was used, featuring 590 original videos collected from YouTube, and covering different subjects. The results obtained with the 5-fold cross-validation outperformed those SVM-based methods documented in the literature, by achieving an average F1-score of 99.53%, 79.55%, and 89.10%, respectively for photos, videos, and a mixture of both types of content. A benchmark with state-of-the-art methods was also done, by comparing the proposed SVM method with deep learning approaches, namely Convolutional Neural Networks (CNN). 
    Although CNNs outperformed the proposed DFT-SVM compound method, the competitiveness of the results attained by DFT-SVM and its substantially reduced processing time make it appropriate to be implemented and embedded into Autopsy modules, by predicting the level of fakeness calculated for each analyzed multimedia file.

  • Article type: Journal Article
    The fight against bacterial antibiotic resistance must be given critical attention to avert the current and emerging crisis of treating bacterial infections due to the inefficacy of clinically relevant antibiotics. Intrinsic genetic mutations and transferrable antibiotic resistance genes (ARGs) are at the core of the development of antibiotic resistance. However, traditional alignment methods for detecting ARGs have limitations. Artificial intelligence (AI) methods and approaches can potentially augment the detection of ARGs and identify antibiotic targets and antagonistic bactericidal and bacteriostatic molecules that are or can be developed as antibiotics. This review delves into the literature regarding the various AI methods and approaches for identifying and annotating ARGs, highlighting their potential and limitations. Specifically, we discuss methods for (1) direct identification and classification of ARGs from genome DNA sequences, (2) direct identification and classification from plasmid sequences, and (3) identification of putative ARGs from feature selection.

  • Article type: Journal Article
    Precision medicine is a framework for developing evidence-based medical recommendations that seeks to determine the optimal sequence of treatments tailored to all of the relevant patient-level characteristics which are observable. Because precision medicine relies on highly sensitive, patient-level data, ensuring the privacy of participants is of great importance. Dynamic treatment regimes (DTRs) provide one formalization of precision medicine in a longitudinal setting. Outcome-Weighted Learning (OWL) is a family of techniques for estimating optimal DTRs based on observational data. OWL techniques leverage support vector machine (SVM) classifiers in order to perform estimation. SVMs perform classification based on a set of influential points in the data known as support vectors. The classification rule produced by SVMs often requires direct access to the support vectors. Thus, releasing a treatment policy estimated with OWL requires the release of patient data for a subset of patients in the sample. As a result, the classification rules from SVMs constitute a severe privacy violation for those individuals whose data comprise the support vectors. This privacy violation is a major concern, particularly in light of the potentially highly sensitive medical data which are used in DTR estimation. Differential privacy has emerged as a mathematical framework for ensuring the privacy of individual-level data, with provable guarantees on the likelihood that individual characteristics can be determined by an adversary. We provide the first investigation of differential privacy in the context of DTRs and provide a differentially private OWL estimator, with theoretical results allowing us to quantify the cost of privacy in terms of the accuracy of the private estimators.
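
The privacy concern described here can be read off the standard form of the kernel SVM decision rule. The notation below is generic textbook notation, not taken from the paper:

```latex
\hat{y}(x) = \operatorname{sign}\!\left( \sum_{i \in \mathcal{S}} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

Here $\mathcal{S}$ indexes the support vectors, $\alpha_i > 0$ are the dual coefficients, $y_i \in \{-1, +1\}$ are the training labels, $K$ is the kernel, and $b$ is the bias. Evaluating the rule at a new point $x$ requires each $x_i$ with $i \in \mathcal{S}$, which are raw training records. With a linear kernel the sum can be collapsed into a single weight vector, which is presumably why the abstract says direct access is "often", rather than always, required.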

  • Article type: Journal Article
    BACKGROUND: The early detection of Non-Melanoma Skin Cancer (NMSC) is essential to ensure patients receive the most effective treatment. Diagnostic screening tools for NMSC are crucial due to high confusion rates with other types of skin lesions, such as Actinic Keratosis. Nevertheless, current means of diagnosing and screening patients rely either on visual criteria, which are often conditioned by subjectivity and experience, or on highly invasive, slow, and costly methods, such as histological diagnosis. The objective of the present study is therefore to test whether classification accuracy improves in the Near-Infrared region of the electromagnetic spectrum, as opposed to the shorter wavelengths used in previous research.
    METHODS: This study utilizes near-infrared hyperspectral imaging, within the range of 900.6 and 1454.8 nm. Images were captured for a total of 125 patients, including 66 patients with Basal Cell Carcinoma, 42 with cutaneous Squamous Cell Carcinoma, and 17 with Actinic Keratosis, to differentiate between healthy and unhealthy skin lesions. A combination of hybrid convolutional neural networks (for feature extraction) and support vector machine algorithms (as a final activation layer) was employed for analysis. In addition, we test whether transfer learning is feasible from networks trained on shorter wavelengths of the electromagnetic spectrum.
    RESULTS: The implemented method achieved a general accuracy of over 80%, with some tasks reaching over 90%. F1 scores were also found to generally be over the optimal threshold of 0.8. The best results were obtained when detecting Actinic Keratosis; however, differentiation between the two types of malignant lesions was often noted to be more difficult. These results demonstrate the potential of near-infrared hyperspectral imaging combined with advanced machine learning techniques in distinguishing NMSC from other skin lesions. Transfer learning was unsuccessful in improving the training of these algorithms.
    CONCLUSIONS: We have shown that the Near-Infrared region of the electromagnetic spectrum is highly useful for the identification and study of non-melanoma type skin lesions. While the results are promising, further research is required to develop more robust algorithms that can minimize the impact of noise in these datasets before clinical application is feasible.

  • Article type: Journal Article
    BACKGROUND: Spastic cerebral palsy, the most common pediatric-onset disabling condition with an estimated prevalence of 0.2% in children, is a complex condition characterized by stiff movement, muscle contractures, and abnormal gait that can diminish quality of life. Spastic CP accounts for approximately 83% of all CP cases and frequently co-occurs with other complex conditions, like epilepsy. An estimated 42% of spastic CP cases have co-occurring epilepsy. Unfortunately, CP is often difficult to diagnose. Although most children with CP are born with it or acquire it immediately after birth, many are not identified until after 19 months of age, with CP diagnosis often not confirmed until 5 years of age. New bioinformatic approaches to identify CP earlier are needed. Recent studies indicate that altered DNA methylation patterns associated with CP may have diagnostic value. The potential confounding effects of co-occurring epilepsy on these patterns are not known. We evaluated machine learning classification of CP patients with or without co-occurring epilepsy.
    RESULTS: Whole blood samples were collected from 30 study participants diagnosed with epilepsy (n=4), spastic CP (n=10), both (n=8), or neither (n=8). A novel Support-Vector-Machine learning algorithm was developed to identify methylation loci that have the ability to classify CP from controls in the presence or absence of epilepsy. This algorithm was also employed to measure the classification ability of identified methylation loci. After preprocessing of data, isolation of important methylation loci was performed in a binary comparison between CP and controls, as well as in a 4-way scheme, encapsulating epilepsy diagnoses. The classification ability was similarly assessed. CP classification performance was evaluated with and without inclusion of epilepsy as a feature. Median F1 scores were 0.67 in the 4-class comparison and 1.0 in the binary classification, outperforming Linear-Discriminant-Analysis (0.57 and 0.86, respectively).
    CONCLUSIONS: This novel algorithm was able to classify study participants with spastic CP and/or epilepsy from controls with significant performance. The algorithm shows promise for rapid identification in methylation data of diagnostic methylation loci. In this model, Support Vector Machines outperformed Linear Discriminant Analysis in classification. In the evaluation of epigenetics-based diagnostics for CP, epilepsy may not be a significant confounding factor.