lasso regression model

  • Article type: Journal Article
    OBJECTIVE: This study aimed to identify hub genes in glaucoma using multiple machine learning algorithms.
    BACKGROUND: Glaucoma has afflicted many patients for many years: excessive intraocular pressure continuously damages the nervous system and can lead to severe blindness. An effective molecular diagnostic method is currently lacking.
    OBJECTIVE: The present study attempted to reveal the molecular mechanism and gene regulatory network of hub genes in glaucoma, and then to reveal the drug-gene-disease network regulated by those hub genes.
    METHODS: A microarray dataset (GSE9944) was obtained from the Gene Expression Omnibus database, and the differentially expressed genes in glaucoma were identified. Based on these genes, we constructed three machine learning models for feature training: a random forest (RF) model, a least absolute shrinkage and selection operator (LASSO) regression model, and a support vector machine (SVM) model. In parallel, weighted gene co-expression network analysis (WGCNA) was performed on the GSE9944 expression profiles to identify glaucoma-related genes. The genes common to all four gene sets were considered hub genes of glaucoma. Based on these genes, we also constructed a molecular diagnostic model of glaucoma. In addition, we performed molecular docking analysis to explore the gene-drug network targeting the hub genes, and we evaluated the immune cell infiltration landscape in glaucoma and normal samples using the CIBERSORT method.
    RESULTS: Eight hub genes were identified: ATP6V0D1, PLEC, SLC25A1, HRSP12, PKN1, RHOD, TMEM158, and GSN. The diagnostic model showed excellent diagnostic performance (area under the curve = 1). GSN might positively regulate naïve CD4 T cells and negatively regulate regulatory T cells (Tregs). In addition, we constructed gene-drug networks in an attempt to explore novel therapeutic agents for glaucoma.
    CONCLUSIONS: Our results systematically identified eight hub genes and established a molecular diagnostic model for glaucoma. Our study provides a basis for future systematic studies of glaucoma pathogenesis.
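The overlap step described in METHODS (keeping only genes selected by RF, LASSO, SVM, and WGCNA alike) can be sketched as a plain set intersection. This is a minimal illustration, not the authors' pipeline: the per-method gene lists are hypothetical, built from the eight reported hub genes plus made-up `PLACEHOLDER_*` entries, and the function name `overlapping_genes` is an assumption.

```python
# Illustrative only: the per-method lists below are NOT from the study.
# They contain the eight reported hub genes plus hypothetical placeholders.
REPORTED_HUB_GENES = {
    "ATP6V0D1", "PLEC", "SLC25A1", "HRSP12",
    "PKN1", "RHOD", "TMEM158", "GSN",
}

selected_by_method = {
    "RF": REPORTED_HUB_GENES | {"PLACEHOLDER_1", "PLACEHOLDER_2"},
    "LASSO": REPORTED_HUB_GENES | {"PLACEHOLDER_2", "PLACEHOLDER_3"},
    "SVM": REPORTED_HUB_GENES | {"PLACEHOLDER_1", "PLACEHOLDER_4"},
    "WGCNA": REPORTED_HUB_GENES | {"PLACEHOLDER_3", "PLACEHOLDER_5"},
}


def overlapping_genes(gene_sets):
    """Return the genes present in every set in `gene_sets`."""
    sets = [set(genes) for genes in gene_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


hub_genes = overlapping_genes(selected_by_method.values())
print(sorted(hub_genes))
```

A strict intersection is the simplest reading of "overlapping genes in the four groups"; a looser rule (for example, selected by at least three methods) would need a vote count instead.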

  • Article type: Journal Article
    BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that commonly affects children, and its etiology remains unknown. Growing evidence suggests that immune-mediated inflammation and immune cells in the peripheral blood play crucial roles in the pathophysiology of KD. This study aimed to find important biomarkers and immune-related mechanisms implicated in KD, along with their correlation with immune cells in the peripheral blood.
    METHODS: Gene microarray data from the Gene Expression Omnibus (GEO) were used in this study. Three datasets were obtained: GSE63881 (341 samples), GSE73463 (233 samples), and GSE73461 (279 samples). To find intersecting genes, we combined differentially expressed gene (DEG) analysis with weighted gene co-expression network analysis (WGCNA). Functional annotation, construction of protein-protein interaction (PPI) networks, and least absolute shrinkage and selection operator (LASSO) regression were then performed to identify hub genes. The accuracy of these hub genes in identifying KD was evaluated using receiver operating characteristic (ROC) curves. Furthermore, gene set variation analysis (GSVA) was used to explore the composition of circulating immune cells within the assessed datasets and their relationship with the hub gene markers.
    RESULTS: WGCNA yielded eight co-expression modules, with one hub module (the MEblue module) showing the strongest association with acute KD. A total of 425 distinct genes were identified. Integrating WGCNA and DEGs yielded 277 intersecting genes. LASSO analysis identified five hub genes (S100A12, MMP9, TLR2, NLRC4, and ARG1) as potential biomarkers for KD. ROC curve analysis demonstrated the diagnostic value of these five hub genes, indicating high accuracy in diagnosing KD. Analysis of the circulating immune cell composition within the assessed datasets revealed a significant association between KD and various immune cell types, including activated dendritic cells, neutrophils, immature dendritic cells, macrophages, and activated CD8 T cells. Importantly, all five hub genes correlated strongly with immune cells.
    CONCLUSIONS: Activated dendritic cells, neutrophils, and macrophages were closely associated with the pathogenesis of KD. Furthermore, the hub genes (S100A12, MMP9, TLR2, NLRC4, and ARG1) are likely to participate in the pathogenic mechanisms of KD through immune-related signaling pathways.
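The ROC evaluation mentioned in METHODS reduces to one number, the area under the curve (AUC), which equals the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below computes it by direct pair counting. It is a minimal illustration under stated assumptions: the function name `roc_auc` and the toy scores are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC by pair counting: the fraction of (positive, negative) pairs in
    which the positive sample has the higher score; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = KD, 0 = control.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(roc_auc(toy_scores, toy_labels))  # 0.75
```

Pair counting is quadratic in the number of samples, which is fine for cohorts of a few hundred; a rank-based formula gives the same value faster on large datasets.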

  • Article type: Journal Article
    The aim of this study was to investigate the geographical spatial distribution of creatine kinase isoenzyme (CK-MB) in order to provide a scientific basis for clinical examination. Reference values of CK-MB for 8697 healthy adults in 137 cities in China were collected from a large body of literature. Moran's index was used to determine the spatial relationship, and 24 factors were selected, covering terrain, climate, and soil indexes. Correlation analysis between CK-MB and the geographical factors was conducted to determine significance, and nine significant factors were extracted. After the degree of multicollinearity was evaluated in R, ridge, lasso, and principal component analysis (PCA) models of CK-MB were built. The PCA model was chosen as the best model by comparing relative errors, the normality of its predicted values was tested, and disjunctive kriging interpolation was used to map the geographical distribution. The results show that CK-MB reference values of healthy adults were generally correlated with latitude, annual sunshine duration, annual mean relative humidity, annual precipitation amount, and annual range of air temperature, and significantly correlated with annual mean air temperature, topsoil gravel content, topsoil cation exchange capacity in clay, and topsoil cation exchange capacity in silt. The geospatial distribution map shows that, on the whole, CK-MB is higher in the north and lower in the south, and gradually increases from the southeast coastal area to the northwest inland area. If the geographical factors of a location are known, the CK-MB model can be used to predict the CK-MB of healthy adults in that region, which provides a reference for considering regional differences in clinical diagnosis.
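Choosing the best of several candidate models by relative error, as the abstract describes, can be sketched as follows. The abstract does not define the exact error measure, so the mean absolute relative error used here is an assumption, and all numbers are made-up toy values rather than CK-MB data.

```python
def mean_relative_error(predicted, observed):
    """Mean of |predicted - observed| / |observed| over all samples.
    Assumes every observed value is non-zero."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    total = sum(abs(p - o) / abs(o) for p, o in zip(predicted, observed))
    return total / len(observed)


# Hypothetical held-out reference values and model predictions.
observed = [10.0, 20.0, 40.0]
predictions = {
    "ridge": [12.0, 18.0, 44.0],
    "lasso": [11.0, 22.0, 36.0],
    "pca": [10.5, 19.0, 42.0],
}

errors = {name: mean_relative_error(pred, observed)
          for name, pred in predictions.items()}
best_model = min(errors, key=errors.get)
print(errors, best_model)
```

The comparison is only meaningful when every model is scored on the same held-out observations.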

  • Article type: Journal Article
    Corporate carbon performance is a key driver of corporate sustainability. Identifying the factors that influence corporate carbon emissions is fundamental to improving carbon performance. Based on the Carbon Disclosure Project (CDP) database, we integrate the least absolute shrinkage and selection operator (LASSO) regression model and the fixed effects model to identify the determinants of carbon emissions. Furthermore, we rank the determinants according to their importance. We find that Capx enters the models under all carbon contexts. For Scope 1 and Scope 2, financial-level factors play a greater role. For Scope 3, corporate internal incentive policies and emission reduction behaviors are important. Unlike for absolute carbon emissions, for relative carbon emissions the financial-level factor of debt-paying ability is a vital reference indicator of the impact of corporate carbon emissions.
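LASSO selects determinants by shrinking some coefficients exactly to zero, so the surviving coefficients can be ranked by magnitude. The sketch below shows the mechanism with cyclic coordinate descent and soft-thresholding on the objective (1/(2n))·‖y − Xβ‖² + λ·‖β‖₁. It is a minimal illustration, not the authors' implementation: it assumes centered data with no intercept, and the feature names and toy values are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent. X is a list of rows; no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data (made up): y depends only on the first feature.
feature_names = ["feature_a", "feature_b"]
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]

beta = lasso_coordinate_descent(X, y, lam=0.5)
ranking = sorted(zip(feature_names, beta), key=lambda t: abs(t[1]), reverse=True)
print(ranking)
```

Ranking by absolute coefficient is only meaningful when the features are on a common scale, so standardizing the predictors first is the usual practice.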

  • Article type: Journal Article
    Kawasaki disease (KD) is a febrile systemic vasculitis involving children younger than five years old. However, the specific biomarkers and precise mechanisms of this disease are not fully understood, which can delay the best treatment time. This study therefore aimed to detect the potential biomarkers and pathophysiological processes of KD through bioinformatic analysis.
    The Gene Expression Omnibus (GEO) database was the source of the RNA sequencing data from KD patients. Differentially expressed genes (DEGs) between KD patients and healthy controls (HCs) were screened with the "limma" R package. Weighted gene correlation network analysis (WGCNA) was performed to discover the module and hub genes most closely associated with KD. The node genes were obtained by combining the least absolute shrinkage and selection operator (LASSO) regression model with the top 5 genes from five algorithms in CytoHubba, and were further validated with receiver operating characteristic (ROC) curves. CIBERSORTx was used to determine the composition of immune cells in KD patients and HCs. Functional enrichment analysis was performed to understand the biological implications of the modular genes. Finally, competing endogenous RNA (ceRNA) networks of the node genes were predicted using online databases.
    A total of 267 DEGs were analyzed between 153 KD patients and 92 HCs in the training set, spanning two modules according to WGCNA. The turquoise module was identified as the hub module. It was mainly enriched in the biological processes of cell activation involved in immune response, myeloid leukocyte activation, myeloid leukocyte mediated immunity, secretion, and leukocyte mediated immunity; the enriched pathways included type II diabetes mellitus, nicotinate and nicotinamide metabolism, O-glycan biosynthesis, glycerolipid metabolism, and glutathione metabolism. The node genes were ADM, ALPL, HK3, MMP9, and S100A12, and they performed well in the validation studies. Immune cell infiltration analysis revealed that gamma delta T cells, monocytes, M0 macrophages, activated dendritic cells, activated mast cells, and neutrophils were elevated in KD patients. Regarding the ceRNA networks, three intact networks were constructed: NEAT1/NORAD/XIST-hsa-miR-524-5p-ADM, NEAT1/NORAD/XIST-hsa-miR-204-5p-ALPL, and NEAT1/NORAD/XIST-hsa-miR-524-5p/hsa-miR-204-5p-MMP9.
    To conclude, the five-gene signature and three ceRNA networks constructed in our study are of great value in the early diagnosis of KD and might help to advance our understanding of KD at the RNA regulatory level.

  • Article type: Journal Article
    Lead poisoning is often considered a traditional disease; however, the specific mechanism of toxicity remains unclear. Studying Pb-induced alterations in cellular metabolic pathways is important for understanding the biological response and disorders associated with environmental exposure to lead. Metabolomics has recently attracted considerable attention as a way to understand in detail the biological response to lead exposure and the associated toxicity mechanisms. In the present study, wild rodents collected from an area contaminated with lead (N = 18) and a control area (N = 10) were investigated. This was the first experimental metabolomic study of wildlife exposed to lead in the field. Plasma phenylalanine and isoleucine levels were significantly higher in the lead-contaminated area than in the control area, and hydroxybutyric acid was marginally significantly higher in the contaminated area, suggesting possible enhancement of lipid metabolism. In the interregional least absolute shrinkage and selection operator (lasso) regression model analysis, phenylalanine and isoleucine were identified as possible biomarkers, in agreement with the random forest model. The random forest model additionally selected glutaric acid, glutamine, and hydroxybutyric acid. In agreement with previous studies, enrichment analysis showed alterations in the urea cycle and ATP-binding cassette transporter pathways. Although regional rodent species bias was observed in this study, and the relatively small sample size should be taken into account, the present results are to some extent consistent with those of previous studies on humans and laboratory animals.

  • Article type: Journal Article
    BACKGROUND: Competing endogenous RNAs (ceRNAs) have emerged as pivotal players in the regulation of gene expression and play a crucial role in the physiology and development of various cancers. Nevertheless, the function and underlying mechanisms of ceRNAs in esophageal cancer (EC) are still largely unknown.
    METHODS: In this study, profiles of DEmRNAs, DElncRNAs, and DEmiRNAs between normal and EC tumor tissue samples were obtained from the Cancer Genome Atlas database using the DESeq package in R, with adjusted P<0.05 and |log2(fold change)|>2 as the cutoff. A ceRNA network (ceRNet) was first constructed from miRcode, miRDB, miRTarBase, and TargetScan to reveal how these ceRNAs interact during carcinogenesis. Independent microarray data (GSE6188, GSE89102, and GSE92396) and correlation analysis were then used to validate molecular biomarkers in the initial ceRNet. Finally, a least absolute shrinkage and selection operator logistic regression model was built on the oncogenic ceRNet to diagnose EC more accurately.
    RESULTS: We successfully constructed an oncogenic ceRNet of EC consisting of the crosstalk of hsa-miR372-centered CADM2-ADAMTS9-AS2 and hsa-miR145-centered SERPINE1-PVT1. In addition, the risk-score model -0.0053*log2(CADM2) + 0.0168*log2(SERPINE1) - 0.0073*log2(ADAMTS9-AS2) + 0.0905*log2(PVT1) + 0.0047*log2(hsa-miR372) - 0.0193*log2(hsa-miR145), where each term is the log2 of the gene count, could improve the diagnosis of EC, with an AUC of 0.988.
    CONCLUSIONS: We identified two novel pairs of ceRNAs in EC and their diagnostic role. The pairs of hsa-miR372-centered CADM2-ADAMTS9-AS2 and hsa-miR145-centered SERPINE1-PVT1 are likely potential carcinogenic mechanisms of EC, and their joint detection could improve diagnostic accuracy.
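The risk-score model reported in RESULTS is a weighted sum of log2-transformed counts, so it can be evaluated in a few lines. The coefficients below are copied from the abstract; everything else is an assumption made for illustration: the function name `risk_score`, the requirement that counts be strictly positive (no pseudocount), and the toy counts. The abstract reports no intercept or decision threshold, so none is applied here.

```python
import math

# Coefficients as reported in the abstract.
COEFFICIENTS = {
    "CADM2": -0.0053,
    "SERPINE1": 0.0168,
    "ADAMTS9-AS2": -0.0073,
    "PVT1": 0.0905,
    "hsa-miR372": 0.0047,
    "hsa-miR145": -0.0193,
}


def risk_score(counts):
    """Weighted sum of log2(count) over the six model genes.
    `counts` maps gene name to a strictly positive expression count."""
    score = 0.0
    for gene, coefficient in COEFFICIENTS.items():
        count = counts[gene]
        if count <= 0:
            raise ValueError(f"count for {gene} must be positive")
        score += coefficient * math.log2(count)
    return score


# Toy input (made up): every gene has a count of 2, so each log2 term is 1
# and the score is simply the sum of the coefficients.
toy_counts = {gene: 2 for gene in COEFFICIENTS}
print(round(risk_score(toy_counts), 4))  # 0.0801
```

Because PVT1 carries by far the largest coefficient, changes in its count dominate the score under this model.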