t-SNE, t-distributed stochastic neighbor embedding

  • Article type: Journal Article
    N6-methyladenine (6mA) plays a critical role in various epigenetic processes, including DNA replication, DNA repair, silencing, and transcription, and in diseases such as cancer. To understand these epigenetic mechanisms, 6mA has been detected on a genome-wide scale at single-base resolution by high-throughput technologies, together with conventional methods such as immunoprecipitation, mass spectrometry, and capillary electrophoresis, but these experimental approaches are time-consuming and laborious. To complement these approaches, we have developed a CNN-based 6mA site predictor, named CNN6mA, which introduces two new architectures: a position-specific 1-D convolutional layer and a cross-interactive network. In the position-specific 1-D convolutional layer, position-specific filters with different window sizes are applied to a query sequence, instead of sharing the same filters over all positions, in order to extract position-specific features at different levels. The cross-interactive network explores the relationships between all the nucleotide patterns within the query sequence. Consequently, CNN6mA outperformed the existing state-of-the-art models in many species and produced a contribution score vector that intelligibly interprets the prediction mechanism. The source code and web application of CNN6mA are freely accessible at https://github.com/kuratahiroyuki/CNN6mA.git and http://kurata35.bio.kyutech.ac.jp/CNN6mA/, respectively.
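
The difference between a shared filter and position-specific filters can be illustrated with a minimal pure-Python sketch. It is not the CNN6mA implementation: the function names, the random kernels, the example sequence, and the window sizes 3 and 5 are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq]


def window_score(x, start, kernel):
    """Dot product between a (window x 4) kernel and the window of x starting at `start`."""
    return sum(x[start + j][c] * kernel[j][c]
               for j in range(len(kernel)) for c in range(4))


def shared_conv1d(x, kernel):
    """Ordinary 1-D convolution: the same kernel is reused at every position."""
    n_out = len(x) - len(kernel) + 1
    return [window_score(x, i, kernel) for i in range(n_out)]


def position_specific_conv1d(x, kernels):
    """Position-specific variant: output position i uses its own kernel, kernels[i]."""
    window = len(kernels[0])
    if len(kernels) != len(x) - window + 1:
        raise ValueError("need exactly one kernel per output position")
    return [window_score(x, i, kernels[i]) for i in range(len(kernels))]


def random_kernel(window, rng):
    """Hypothetical untrained kernel; a real model would learn these weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(window)]


rng = random.Random(0)
x = one_hot("GATTACAGATC")  # illustrative 11-nucleotide query sequence

# One feature track per window size, each with its own kernel at every position.
features = []
for window in (3, 5):
    kernels = [random_kernel(window, rng) for _ in range(len(x) - window + 1)]
    features.append(position_specific_conv1d(x, kernels))
```

When every position is given the same kernel, `position_specific_conv1d` reduces to `shared_conv1d`, which is one way to see that the position-specific layer is a strict generalization with more parameters.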

  • Article type: Journal Article
    Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spreads, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein-protein interactions (PPIs) play critical roles. Therefore, the identification of PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections and for discovering effective drugs. Experimental methods, including mass spectrometry-based proteomics and yeast two-hybrid assays, are widely used to identify human-virus PPIs, but they are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, by implementing two key technologies: a cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanism was very effective in enhancing prediction and generalization abilities. Applying the 1D-CNN to the word2vec-generated feature matrices reduced computational costs, thus extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models on a benchmark dataset and accurately predicted PPIs for unknown viruses. Cross-attention PHV also predicted human-SARS-CoV-2 PPIs with area under the curve values >0.95. The Cross-attention PHV web server and source code are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.
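
As a rough illustration of what a cross-attention mechanism computes, the following pure-Python sketch applies scaled dot-product attention with queries from one sequence and keys and values from another. It is a simplification under stated assumptions, not the Cross-attention PHV architecture: there are no learned projection matrices, and the two-dimensional feature vectors standing in for the human and viral proteins are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows.

    Returns the attended features (one row per query) and the attention weights.
    """
    d = len(keys[0])
    attended, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        attended.append([sum(wj * v[c] for wj, v in zip(w, values))
                         for c in range(len(values[0]))])
    return attended, weights


# Hypothetical per-residue features (in the paper these come from word2vec plus a 1D-CNN).
human = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
virus = [[1.0, 0.0], [0.0, 1.0]]

# Human positions query the viral sequence; swapping the arguments gives the other direction.
attended, weights = cross_attention(human, virus, virus)
```

Each row of `weights` sums to 1 and indicates how strongly one position of the first sequence attends to each position of the second.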

  • Article type: Journal Article
    Transcriptome-level expression data connected to the spatial organization of cells and molecules would allow a comprehensive understanding of how gene expression is connected to structure and function in biological systems. Spatial transcriptomics (ST) platforms may soon provide such information. However, the current platforms still lack spatial resolution, capture only a fraction of the transcriptome heterogeneity, or lack the throughput for large-scale studies. The strengths and weaknesses of current ST platforms and computational solutions need to be taken into account when planning spatial transcriptomics studies. Computational ST analysis builds on the solutions developed for single-cell RNA-sequencing (scRNA-seq) data, with advancements that take into account the spatial connectedness of the transcriptomes. Either scRNA-seq tools are modified for spatial transcriptomics, or new solutions, such as deep learning-based joint analysis of expression, spatial, and image data, are developed to extract biological information from the spatially resolved transcriptomes. Computational ST analysis can reveal remarkable biological insights into spatial patterns of gene expression, cell signaling, and cell type variations in connection with cell type-specific signaling and organization in complex tissues. This review covers the topics that help in choosing the platform and computational solutions for spatial transcriptomics research. We focus on the currently available ST methods and platforms and their strengths and limitations. Of the computational solutions, we provide an overview of the analysis steps and tools used in ST data analysis. The compatibility with the data types and the tools provided by the current ST analysis frameworks is summarized.

  • Article type: Journal Article
    The genome-scale metabolic model (GEM) has been established as an important tool for studying cellular metabolism at a systems level by predicting intracellular fluxes. With the advent of generic human GEMs, they have been increasingly applied to a range of diseases, often with the objective of predicting effective metabolic drug targets. Cancer is a representative disease where the use of GEMs has proved to be effective, partly due to the massive availability of patient-specific RNA-seq data. When using a human GEM, a so-called context-specific GEM needs to be developed first by using cell-specific RNA-seq data. The biological validity of a context-specific GEM depends highly on both the model extraction method (MEM) and the model simulation method (MSM). However, while MEMs have been thoroughly examined, MSMs have not been systematically examined, especially when studying cancer metabolism. In this study, the effects of pairwise combinations of three MEMs and five MSMs were evaluated by examining biological features of the resulting cancer patient-specific GEMs. For this, a total of 1,562 patient-specific GEMs were reconstructed and subjected to machine learning-guided and biological evaluations to draw robust conclusions. Noteworthy observations were made from the evaluation, including the high performance of two MEMs, namely rank-based 'task-driven Integrative Network Inference for Tissues' (tINIT) or 'Gene Inactivity Moderated by Metabolism and Expression' (GIMME), paired with least absolute deviation (LAD) as an MSM, and the relatively poorer performance of flux balance analysis (FBA) and parsimonious FBA (pFBA). Insights from this study can be considered as a reference when studying cancer metabolism using patient-specific GEMs.
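
For orientation, the three simulation methods named above are usually posed as optimization problems over the flux vector. The abstract does not give the formulations used in the study, so the following is a sketch of the standard textbook forms. The symbols are assumptions introduced here: $S$ is the stoichiometric matrix, $v$ the flux vector with bounds $v^{\mathrm{lb}}$ and $v^{\mathrm{ub}}$, $c$ the objective coefficients, $Z^{*}$ the optimal FBA objective value, and $r$ a reference flux vector (for example, one derived from expression data).

```latex
\begin{aligned}
\text{FBA:}\quad & \max_{v}\; c^{\top} v
  && \text{s.t. } S v = 0,\;\; v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}} \\
\text{pFBA:}\quad & \min_{v}\; \sum_{i} |v_i|
  && \text{s.t. } S v = 0,\;\; c^{\top} v = Z^{*},\;\; v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}} \\
\text{LAD (generic form):}\quad & \min_{v}\; \sum_{i} |v_i - r_i|
  && \text{s.t. } S v = 0,\;\; v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

All three are linear programs once the absolute values are split into auxiliary variables. FBA and pFBA depend on a chosen objective $c$, whereas the generic LAD form instead fits fluxes to a reference vector.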

  • Article type: Journal Article
    Colorectal cancer (CRC), a malignant tumor found worldwide, comprises microsatellite instability (MSI) and microsatellite stable (MSS) phenotypes. Although SHP2 is a promising target for cancer therapy, its relationship with innate immunosuppression remains elusive. To address this, single-cell RNA sequencing was performed to explore the role of SHP2 in all cell types of the tumor microenvironment (TME) from murine MC38 xenografts. Intratumoral cells were found to be functionally heterogeneous and responded significantly to SHP099, an SHP2 allosteric inhibitor. The malignant evolution of tumor cells was remarkably arrested by SHP099. Mechanistically, STING-TBK1-IRF3-mediated type I interferon signaling was highly activated by SHP099 in infiltrated myeloid cells. Notably, CRC patients with the MSS phenotype exhibited greater macrophage infiltration and more potent SHP2 phosphorylation in CD68+ macrophages than those with the MSI-high phenotype, suggesting a potential role of macrophagic SHP2 in the TME. Collectively, our data reveal a mechanism of innate immunosuppression mediated by SHP2, suggesting that SHP2 is a promising target for colon cancer immunotherapy.

  • Article type: Journal Article
    Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, the approach used in this search has acquired an important computer science component, with the use of machine learning techniques skyrocketing as they have become widely accessible. Given the objectives set by the Precision Medicine initiative and the new challenges it has generated, it is necessary to establish robust, standard, and reproducible computational methodologies to achieve those objectives. Currently, predictive models based on machine learning have gained great importance in the step prior to preclinical studies. This stage drastically reduces the cost and research time involved in discovering new drugs. This review article focuses on how these new methodologies have been used in recent research. Analyzing the state of the art in this field gives an idea of where cheminformatics is heading in the short term, the limitations it presents, and the positive results it has achieved. This review focuses mainly on the methods used to model molecular data, as well as the biological problems addressed and the machine learning algorithms used for drug discovery in recent years.

  • Article type: Journal Article
    BACKGROUND: Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women.
    OBJECTIVE: To compare and evaluate the performance and accuracy of key supervised and semi-supervised machine learning algorithms for breast cancer prediction.
    METHODS: We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine; 4) RBF support vector machine; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) KNN. The Wisconsin Diagnosis Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized the hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves.
    RESULTS: The results of all models are encouraging for both SL and SSL. SSL achieves high accuracy (90%-98%) with just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy of 98%.
    CONCLUSIONS: The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91-98%. SSL is a promising and competitive approach to the problem. Using a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type.
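
The evaluation metrics and the K-fold splitting mentioned in the methods can be sketched in a few lines of pure Python. This is an illustrative sketch, not the study's code: the function names, the interleaved fold assignment, and the toy labels are assumptions, and the ROC curve is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Toy labels (1 = malignant, 0 = benign); not taken from the Wisconsin dataset.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
splits = list(k_fold_indices(10, 5))
```

In a full comparison, each model would be trained on the `train` indices and scored on the `test` indices of every fold, with the per-fold metrics averaged.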

  • Article type: Journal Article
    The increasing application of single-cell RNA sequencing (scRNA-seq) technology in life science and biomedical research has significantly increased our understanding of cellular heterogeneity in immunology, oncology, and developmental biology. This review summarizes the development of various scRNA-seq technologies, primarily discusses the application of scRNA-seq to infectious diseases, and explores the current development, challenges, and potential future applications of scRNA-seq technology.

  • Article type: Case Reports
    The expression of some microRNAs (miRNAs) is modulated in response to cigarette smoke (CS), which is a leading cause of major preventable diseases. However, whether miRNA expression is also modulated by the aerosol/extract from potentially reduced-risk products is not well studied. The present work is a meta-analysis of 12 in vitro studies in human organotypic epithelial cultures of the aerodigestive tract (buccal, gingival, bronchial, nasal, and small airway epithelia). These studies compared the effects on miRNA expression of exposure to aerosols from electronic vapor (e-vapor) products and heated tobacco products, and to extracts from Swedish snus products (referred to in the present work as reduced-risk products [RRPs]), with the effects of exposure to CS or its total particulate matter fraction. This meta-analysis evaluated 12 datasets comprising a total of 736 detected miRNAs and 2775 exposed culture inserts. The t-distributed stochastic neighbor embedding method was used to find similarities across the diversity of miRNA responses characterized by tissue type, exposure type, and product concentration. The CS-induced changes in miRNA expression in gingival cultures were close to those in buccal cultures; similarly, the alterations in miRNA expression in small airway, bronchial, and nasal tissues resembled each other. A supervised clustering was performed to identify miRNAs exhibiting particular response patterns. The analysis identified a set of miRNAs whose expression was altered in specific tissues upon exposure to CS (e.g., miR-125b-5p, miR-132-3p, miR-99a-5p, and 146a-5p). Finally, we investigated the impact of RRPs on miRNA expression in relation to that of CS by calculating the response ratio r between the RRP- and CS-induced alterations at an individual miRNA level, showing reduced alterations in miRNA expression following RRP exposure relative to CS exposure (94% relative reduction). No specific miRNA response pattern indicating exposure to aerosols from heated tobacco products and e-vapor products, or extracts from Swedish snus, was identifiable.
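
The core quantities behind t-distributed stochastic neighbor embedding can be written down compactly. The sketch below computes the pairwise affinities in the original space and in the embedding, and the Kullback-Leibler divergence that t-SNE minimizes. It is a simplified illustration, not the analysis pipeline of the study: it uses one fixed Gaussian bandwidth `sigma` for all points (real t-SNE tunes a bandwidth per point to match a target perplexity), it omits the gradient-descent optimization, and the toy points are made up.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def high_dim_affinities(points, sigma=1.0):
    """Symmetrized Gaussian affinities p_ij (fixed sigma is a simplifying assumption)."""
    n = len(points)
    cond = []
    for i in range(n):
        row = [math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
               if j != i else 0.0 for j in range(n)]
        total = sum(row)
        cond.append([r / total for r in row])  # conditional p_{j|i}
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]


def low_dim_affinities(embedding):
    """Student-t (one degree of freedom) affinities q_ij in the embedding."""
    n = len(embedding)
    kernel = [[1.0 / (1.0 + sq_dist(embedding[i], embedding[j])) if i != j else 0.0
               for j in range(n)] for i in range(n)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def kl_divergence(p, q):
    """KL(P || Q), the cost function that t-SNE minimizes over the embedding."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0.0)


# Two tight pairs of points in 3-D (toy data).
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
p = high_dim_affinities(points)

# An embedding that keeps neighbors together versus one that separates them.
good = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0)]
bad = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.0), (5.5, 5.0)]
kl_good = kl_divergence(p, low_dim_affinities(good))
kl_bad = kl_divergence(p, low_dim_affinities(bad))
```

The embedding that preserves the neighbor pairs yields the lower divergence, which is the sense in which nearby points in a t-SNE map are read as having similar response profiles.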

  • Article type: Journal Article
    Function at the organ level manifests itself from a heterogeneous collection of cell types. Cellular heterogeneity emerges from developmental processes by which multipotent progenitor cells make fate decisions and transition to specific cell types through intermediate cell states. Although genetic experimental strategies such as lineage tracing have provided insights into cell lineages, recent developments in single-cell technologies have greatly increased our ability to interrogate distinct cell types, as well as transitional cell states in tissue systems. From single-cell data that describe these intermediate cell states, computational tools have been developed to reconstruct cell-state transition trajectories that model cell developmental processes. These algorithms, although powerful, are still in their infancy, and attention must be paid to their strengths and weaknesses when they are used. Here, we review some of these tools, also referred to as pseudotemporal ordering algorithms, and their associated assumptions and caveats. We hope to provide a rational and generalizable workflow for single-cell trajectory analysis that is intuitive for experimental biologists.