SVM, Support vector machines

  • Article type: Journal Article
    Post-translational modifications (PTMs) are closely linked to numerous diseases and play a significant role in regulating protein structure, activity, and function. Identifying PTMs is therefore crucial for understanding the mechanisms of cell biology and disease therapy. Compared with traditional machine learning methods, deep learning approaches to PTM prediction provide accurate and rapid screening, which guides downstream wet-lab experiments toward focused studies of the screened candidates. In this paper, we review recent work on deep learning for identifying phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarize PTM databases and discuss future directions with critical insights.

  • Article type: Journal Article
    Optimizing the fermentation process for recombinant protein production (RPP) is often resource-intensive. Machine learning (ML) approaches help minimize experimentation and are widely applied in RPP. However, existing ML-based tools focus primarily on features derived from the amino acid sequence and leave out the influence of fermentation process conditions. The present study combines features derived from fermentation process conditions with features derived from the amino acid sequence to construct an ML-based model that predicts the maximal protein yield and the corresponding fermentation conditions for expressing a target recombinant protein in the Escherichia coli periplasm. In the first stage, two sets of XGBoost classifiers classify the expression level of the target protein as high (>50 mg/L), medium (between 0.5 and 50 mg/L), or low (<0.5 mg/L). The second stage consists of three regression models, involving support vector machines and random forest, that predict the expression yield for each expression-level class. Independent tests showed that the predictor achieved an overall average accuracy of 75% and a Pearson correlation coefficient of 0.91 for the correctly classified instances. Our model therefore offers a reliable substitute for numerous trial-and-error experiments in identifying the optimal fermentation conditions and yield for RPP. It is also implemented as an open-access web server, PERISCOPE-Opt (http://periscope-opt.erc.monash.edu).
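The classify-then-regress design described in this abstract can be sketched roughly as follows. This is only an illustration of the routing idea, not the PERISCOPE-Opt implementation: the names `expression_class`, `two_stage_predict`, `toy_classifier`, and `toy_regressors` are hypothetical, the stand-in models are toy functions rather than XGBoost, SVM, or random forest models, and assigning the exact boundary values 0.5 and 50 mg/L to the medium class is an assumption the abstract does not settle.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]

# Class boundaries quoted in the abstract (mg/L). Treating the exact boundary
# values as "medium" is an assumption; the abstract does not specify this.
HIGH_THRESHOLD = 50.0
LOW_THRESHOLD = 0.5


def expression_class(yield_mg_per_l: float) -> str:
    """Map a measured yield to one of the three expression-level classes."""
    if yield_mg_per_l > HIGH_THRESHOLD:
        return "high"
    if yield_mg_per_l < LOW_THRESHOLD:
        return "low"
    return "medium"


def two_stage_predict(
    features: Features,
    classifier: Callable[[Features], str],
    regressors: Dict[str, Callable[[Features], float]],
) -> Tuple[str, float]:
    """Stage 1 picks a class; stage 2 applies that class's own regressor."""
    label = classifier(features)
    return label, regressors[label](features)


# Toy stand-ins (hypothetical) for the trained stage-1 and stage-2 models.
def toy_classifier(features: Features) -> str:
    return "high" if features[0] > 0.5 else "low"


toy_regressors: Dict[str, Callable[[Features], float]] = {
    "high": lambda f: 60.0 + f[0],
    "medium": lambda f: 5.0,
    "low": lambda f: 0.1,
}

label, predicted = two_stage_predict([0.9], toy_classifier, toy_regressors)
print(label, predicted)
```

Routing each sample to a class-specific regressor lets every regression model fit a narrower yield range, which is presumably why the abstract reports the correlation only for correctly classified instances.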

  • Article type: Journal Article
    Tumor delineation is required both for radiotherapy planning and for quantitative imaging biomarker purposes. It is a manual, time- and labor-intensive process prone to inter- and intraobserver variation. Semi- or fully automatic segmentation could provide better efficiency and consistency. This study aimed to investigate how including functional magnetic resonance imaging (MRI) sequences, and combining them with anatomical sequences, influences the quality of automatic segmentations.
    T2-weighted (T2w), diffusion-weighted, multi-echo T2*-weighted, and contrast-enhanced dynamic multi-echo (DME) MR images of eighty-one patients with rectal cancer were used in the analysis. Four classical machine learning algorithms (adaptive boosting (ADA), linear discriminant analysis, quadratic discriminant analysis, and support vector machines) were trained for automatic segmentation of tumor and normal tissue using different combinations of the MR images as input, followed by semi-automatic morphological post-processing. Manual delineations from two experts served as ground truth. The Sørensen-Dice similarity coefficient (DICE) and the mean symmetric surface distance (MSD) were used as performance metrics in leave-one-out cross-validation.
    Using T2w images alone, ADA outperformed the other algorithms, yielding a median per-patient DICE of 0.67 and MSD of 3.6 mm. Performance improved when functional images were added and was highest for models based on either T2w and DME images (DICE: 0.72, MSD: 2.7 mm) or all four MRI sequences (DICE: 0.72, MSD: 2.5 mm).
    Machine learning models using functional MRI, in particular DME, have the potential to improve automatic segmentation of rectal cancer relative to models using T2w MRI alone.
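As a rough illustration of the DICE metric reported above, the Sørensen-Dice coefficient of two binary masks A and B is 2|A∩B| / (|A| + |B|). The sketch below computes it on flattened toy masks and then takes the median across patients, as the abstract does. The function name `dice`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; none of this is the study's code or data.

```python
from statistics import median
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Sørensen-Dice coefficient of two flattened binary masks (1 = tumor)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2 * overlap / total


# Toy (made-up) masks for two patients: (predicted, expert delineation).
patients = [
    ([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]),
    ([1, 1, 1, 0], [1, 1, 0, 0]),
]
scores = [dice(pred, truth) for pred, truth in patients]
median_dice = median(scores)
print(scores, median_dice)
```

A per-patient median is less sensitive than a mean to a few badly segmented cases, which matters when scores come from leave-one-out cross-validation over a small cohort.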

  • Article type: Journal Article
    Metabolomics is an expanding field of medical diagnostics, since many diseases cause metabolic reprogramming. The metabolic point of view also offers insight into the molecular mechanisms of disease. Because metabolite assignment based on 1D NMR spectra is complicated by limited spectral resolution, 2D NMR techniques are preferred. In this work, we therefore introduce automated metabolite identification and assignment from 1H-1H TOCSY (total correlation spectroscopy) spectra of real breast cancer tissue. The new approach is based on customized and extended semi-supervised classifiers: KNFST, SVM, and third-degree (PC3) and fourth-degree (PC4) polynomial classifiers. In our approach, metabolite assignment is based only on the vertical and horizontal frequencies of the metabolites in the 1H-1H TOCSY spectrum. KNFST and SVM show high performance (high accuracy and a low mislabeling rate) with a relatively small amount of initially labeled training data. The PC3 and PC4 classifiers showed lower accuracy and high mislabeling rates, and both failed to reach acceptable accuracy when the initial training data were extremely small (≤9% of the entire dataset). Additionally, the semi-supervised classifiers were implemented to obtain a fully automatic procedure for signal assignment and deconvolution of TOCSY spectra, which is a big step forward in NMR metabolic profiling. A set of 27 metabolites was deduced from the TOCSY spectrum, and their assignments agreed with the metabolites deduced from a 1D NMR spectrum of the same sample analyzed by conventional human-based methodology.
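The semi-supervised idea in this abstract (start from a small labeled set and let the classifier label the remaining peaks from their two frequency coordinates) can be sketched with a simple self-training loop. Note the substitution: this sketch uses a plain nearest-neighbor rule with a distance cutoff, not the KNFST, SVM, or polynomial classifiers of the paper, and the function name `self_train`, the cutoff `max_dist`, the labels "A" and "B", and all coordinates are hypothetical toy values.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (horizontal, vertical) frequency of a peak


def self_train(
    labeled: List[Tuple[Point, str]],
    unlabeled: List[Point],
    max_dist: float,
) -> Tuple[List[Tuple[Point, str]], List[Point]]:
    """Repeatedly label peaks lying within max_dist of an already labeled peak.

    Returns the grown labeled set and the peaks that stayed unlabeled.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    progress = True
    while pending and progress:
        progress = False
        still_pending = []
        for point in pending:
            # Nearest already labeled peak (including newly labeled ones).
            nearest, name = min(labeled, key=lambda item: math.dist(item[0], point))
            if math.dist(nearest, point) <= max_dist:
                labeled.append((point, name))
                progress = True
            else:
                still_pending.append(point)
        pending = still_pending
    return labeled, pending


# Toy data: two labeled peaks and three unlabeled ones (made-up coordinates).
seed = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
peaks = [(1.2, 1.0), (1.4, 1.0), (9.0, 9.0)]
grown, leftover = self_train(seed, peaks, max_dist=0.3)
print(grown, leftover)
```

In this toy run the peak at (1.4, 1.0) is too far from the seed peak but becomes reachable once (1.2, 1.0) has been labeled, which is the mechanism that lets a small initial training set grow; the distant peak at (9.0, 9.0) stays unlabeled rather than being forced into a class.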

  • Article type: Journal Article
    Gliomas are among the most common types of primary tumors in the central nervous system. Previous studies have found that macrophages actively participate in tumor growth.
    Weighted gene co-expression network analysis was used to identify meaningful macrophage-related genes for clustering. Pamr, SVM, and a neural network were applied to validate the clustering results. Somatic mutation and methylation were used to define the features of the identified clusters. Differentially expressed genes (DEGs) between the stratified groups, selected by elastic regression and principal component analysis (PCA), were used to construct the MScore. The expression of macrophage-specific genes in the tumor microenvironment was evaluated based on single-cell sequencing analysis. A total of 2365 samples from 15 glioma datasets and 5842 pan-cancer samples were used for external validation of the MScore.
    Macrophages were found to be negatively associated with the survival of glioma patients. Twenty-six macrophage-specific DEGs obtained by elastic regression and PCA were highly expressed in macrophages at the single-cell level. The prognostic value of the MScore in glioma was validated by the active proinflammatory and metabolic profile of the infiltrating microenvironment and by the response to immunotherapies of samples with this signature. The MScore stratified patient survival probabilities in 15 external glioma datasets and in pan-cancer datasets, where it predicted worse survival outcomes. Sequencing data and immunohistochemistry of the Xiangya glioma cohort confirmed the prognostic value of the MScore. A prognostic model based on the MScore demonstrated a high accuracy rate.
    Our findings strongly support a modulatory role of macrophages, especially M2 macrophages, in glioma progression and warrant further experimental studies.

  • Article type: Journal Article
    Drug discovery aims to find new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has come to rely heavily on computer science, with the use of machine learning techniques surging as they have become widely accessible. Given the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard, and reproducible computational methodologies to achieve them. Predictive models based on machine learning have gained great importance in the step prior to preclinical studies. This stage drastically reduces the cost and research time of discovering new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents, and the positive results it has achieved. The review focuses mainly on the methods used to model molecular data, as well as the biological problems addressed and the machine learning algorithms used for drug discovery in recent years.

  • Article type: Journal Article
    Advances in nucleic acid sequencing technology have expanded our ability to profile microbial diversity. These large datasets of taxonomic and functional diversity are key to better understanding microbial ecology. Machine learning has proven to be a useful approach for analyzing microbial community data and making predictions about outcomes, including human and environmental health. Machine learning applied to microbial community profiles has been used to predict disease states in human health, to predict environmental quality and the presence of contamination in the environment, and as trace evidence in forensics. Machine learning has appeal as a powerful tool that can provide deep insights into microbial communities and identify patterns in microbial community data. However, machine learning models are often used as black boxes to predict a specific outcome, with little understanding of how the models arrived at their predictions. Complex machine learning algorithms often favor higher accuracy and performance at the expense of interpretability. To bring machine learning into more translational research related to the microbiome and to strengthen our ability to extract meaningful biological information, it is important for models to be interpretable. Here we review current trends in machine learning applications in microbial ecology, as well as some of the important challenges and opportunities for applying machine learning more broadly to understanding microbial communities.

  • Article type: Journal Article
    (123)I-ioflupane single photon emission computed tomography (SPECT) is a sensitive and well-established imaging tool in Parkinson's disease (PD) and atypical parkinsonian syndromes (APS), yet discrimination between PD and APS has been considered inconsistent, at least when based on visual inspection or simple region-of-interest analyses. Here we reappraise this issue by applying advanced image analysis techniques to separate PD from the various APS. This study included 392 consecutive patients with degenerative parkinsonism who underwent (123)I-ioflupane SPECT at our institution over the last decade: 306 PD, 24 multiple system atrophy (MSA), 32 progressive supranuclear palsy (PSP), and 30 corticobasal degeneration (CBD) patients. Data analysis included voxel-wise univariate statistical parametric mapping and multivariate pattern recognition using linear discriminant classifiers. MSA and PSP showed less ioflupane uptake in the head of the caudate nucleus relative to PD and CBD, yet there was no difference between MSA and PSP. CBD had higher uptake in both putamina relative to PD, MSA, and PSP. Classification was significant for PD versus APS (AUC 0.69, p < 0.05) and between APS subtypes (MSA vs CBD: AUC 0.80, p < 0.05; MSA vs PSP: AUC 0.69, p < 0.05; CBD vs PSP: AUC 0.69, p < 0.05). Both striatal and extra-striatal regions contain classification information, yet combining the two regions does not significantly improve classification accuracy. PD, MSA, PSP, and CBD have distinct patterns of dopaminergic depletion on (123)I-ioflupane SPECT. The high specificity of 84-90% for PD versus APS indicates that the classifier is particularly useful for confirming APS cases.
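The two figures of merit quoted in this abstract, AUC and specificity, can be computed as in the sketch below. AUC is taken in its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function names, the scores, and the labels are toy assumptions; the abstract does not state which group was treated as the positive class, and none of these numbers come from the study.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve as a pairwise win rate (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def specificity(truth: Sequence[int], pred: Sequence[int]) -> float:
    """True negatives divided by all actual negatives (label 0 = negative)."""
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    return tn / (tn + fp)


# Toy classifier scores (made up) for positive and negative cases.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
# Toy labels: four actual negatives, one of them wrongly flagged as positive.
toy_spec = specificity([0, 0, 0, 0, 1], [0, 0, 0, 1, 1])
print(toy_auc, toy_spec)
```

An AUC near 0.69 therefore means a randomly chosen pair is ranked correctly about 69% of the time, while specificity depends on a chosen decision threshold, so the two numbers in the abstract describe different aspects of the classifier.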