random forest classifier

  • Article type: Journal Article
    Identifying microbe-drug associations can greatly facilitate drug research and development. Traditional methods for screening microbe-drug associations are time-consuming, labor-intensive, and costly, so computational methods are a good alternative. However, most of them ignore the combination of abundant sequence information, structural information, and microbe-drug network topology.
    In this study, we developed a computational framework based on a modified graph attention variational autoencoder (MGAVAEMDA) to infer potential microbe-drug associations by combining biological information with the variational autoencoder. In MGAVAEMDA, we first used multiple databases, including microbial sequence, drug structure, and microbe-drug association databases, to establish two comprehensive feature matrices for microbes and drugs after multiple similarity computations, fusion, smoothing, and thresholding. Then, we employed a combination of a variational autoencoder and graph attention to extract low-dimensional feature representations of microbes and drugs. Finally, the low-dimensional feature representations and the graph adjacency matrix were input into a random forest classifier to obtain microbe-drug association scores and thereby identify potential microbe-drug associations. Moreover, to reduce model complexity and redundant computation and thereby improve efficiency, we introduced a modified graph convolutional neural network embedded into the variational autoencoder for computing the low-dimensional features.
    The experimental results demonstrate that the prediction performance of MGAVAEMDA is better than that of five state-of-the-art methods. For the major measurements (AUC = 0.9357, AUPR = 0.9378), the relative improvements of MGAVAEMDA over the next-best method are 1.76% and 1.47%, respectively.
    We conducted case studies on two drugs and found that more than 85% of the predicted associations have been reported in PubMed. The comprehensive experimental results validated the reliability of our model in accurately inferring potential microbe-drug associations.
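The relative improvements quoted in this abstract are presumably computed as (new − baseline) / baseline. A minimal sketch under that assumption follows. The abstract does not report the baseline scores, so the example only back-solves the baseline AUC that the reported figures would imply; that value is an inference, not a result from the paper, and both function names are illustrative.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, improvement_pct: float) -> float:
    """Baseline score implied by a score and its relative improvement (percent)."""
    return new / (1.0 + improvement_pct / 100.0)


# Reported in the abstract: AUC = 0.9357 with a 1.76% relative improvement.
# The baseline below is inferred from those two numbers, not reported.
baseline_auc = implied_baseline(0.9357, 1.76)
print(f"implied baseline AUC: {baseline_auc:.4f}")
print(f"round trip: {relative_improvement(0.9357, baseline_auc):.2f}%")
```

If the authors instead meant an absolute difference in percentage points, the implied baseline would differ, so the output should be read only as a consistency check.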

  • Article type: Journal Article
    BACKGROUND: Previous models for predicting delirium after cardiac surgery remain inadequate. This study aimed to develop and validate a machine learning-based prediction model for postoperative delirium (POD) in cardiac valve surgery patients.
    METHODS: Electronic medical information from the cardiac surgical intensive care unit (CSICU) of a tertiary, major referral hospital in southern China was extracted over 1 year, from June 2019 to June 2020. A total of 507 patients admitted to the CSICU after cardiac valve surgery were included in this study. Seven classical machine learning algorithms (Random Forest Classifier, Logistic Regression, Support Vector Machine Classifier, K-nearest Neighbors Classifier, Gaussian Naive Bayes, Gradient Boosting Decision Tree, and Perceptron) were used to develop delirium prediction models under the full (q = 31) and selected (q = 19) feature sets, respectively.
    RESULTS: The Random Forest classifier performed exceptionally well on both feature sets, with an Area Under the Curve (AUC) of 0.92 for the full feature set and 0.86 for the selected feature set. It also achieved a relatively low Expected Calibration Error (ECE) and the highest Average Precision (AP), with an AP of 0.80 for the full feature set and 0.73 for the selected feature set. To further evaluate the best-performing Random Forest classifier, SHAP (Shapley Additive Explanations) was used, and an importance matrix plot, scatter plots, and summary plots were generated.
    CONCLUSIONS: We established machine learning-based prediction models to predict POD in patients undergoing cardiac valve surgery. The random forest model has the best predictive performance and can help improve the prognosis of patients with POD.
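The Expected Calibration Error mentioned in the results can be sketched as follows. This is one common binary formulation (equal-width probability bins, comparing the mean predicted probability with the observed positive rate in each bin). The abstract does not state which variant or how many bins the authors used, so `n_bins=10` and the function name are assumptions.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE for binary labels using equal-width bins over [0, 1].

    Each bin contributes |observed positive rate - mean predicted probability|,
    weighted by the fraction of samples that fall into the bin.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")

    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))

    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        observed = sum(label for label, _ in members) / len(members)
        predicted = sum(prob for _, prob in members) / len(members)
        ece += len(members) / total * abs(observed - predicted)
    return ece


# Toy data (not from the study): four predictions of 0.9, half of them wrong.
print(expected_calibration_error([1, 1, 0, 0], [0.9, 0.9, 0.9, 0.9]))
```

A lower value means the predicted probabilities track the observed delirium rate more closely, which is a separate property from discrimination as measured by AUC or AP.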

  • Article type: Journal Article
    COVID-19 mortality prediction.
    BACKGROUND: COVID-19 has become a major global public health problem despite prevention efforts. The daily number of COVID-19 cases is increasing rapidly, and the time and financial costs associated with testing procedures are burdensome.
    METHODS: To overcome this, we aimed to identify immunological and metabolic biomarkers to predict COVID-19 mortality using a machine learning model. We included inpatients from Hong Kong's public hospitals between January 1 and September 30, 2020, who were diagnosed with COVID-19 using RT-PCR. We developed three machine learning models to predict the mortality of COVID-19 patients based on data in their electronic medical records. We performed statistical analysis to compare the trained models, a deep neural network (DNN), a random forest classifier (RF), and a support vector machine (SVM), using data from a cohort of 5,059 patients (median age = 46 years; 49.3% male) who had tested positive for COVID-19 based on electronic health records and data from 532,427 patients as controls.
    RESULTS: We identified the top 20 immunological and metabolic biomarkers that can accurately predict the risk of mortality from COVID-19, with a ROC-AUC of 0.98 (95% CI 0.96-0.98). Of the three models used, the random forest (RF) model achieved the most accurate prediction of mortality among COVID-19 patients, with age, glomerular filtration, albumin, urea, procalcitonin, C-reactive protein, oxygen, bicarbonate, carbon dioxide, ferritin, glucose, erythrocytes, creatinine, lymphocytes, blood pH, and leukocytes among the most important biomarkers identified. A cohort from Kwong Wah Hospital (131 patients) was used for model validation, with a ROC-AUC of 0.90 (95% CI 0.84-0.92).
    CONCLUSIONS: We recommend that physicians closely monitor hematological, coagulation, cardiac, hepatic, renal, and inflammatory factors for potential progression to severe conditions among COVID-19 patients. To the best of our knowledge, no previous research has identified important immunological and metabolic biomarkers to the extent demonstrated in our study.
    The online version contains supplementary material available at 10.1186/s44247-022-00001-0.

  • Article type: Journal Article
    Cerebral microbleeds (CMBs) are important indicators of critical brain disorders such as dementia and ischemic stroke. Generally, CMBs are detected manually by experts, which is a laborious task with limited throughput. Because CMBs have a complex morphology, manual detection is prone to errors. This paper presents a machine learning-based automated technique for detecting CMBs in brain susceptibility-weighted imaging (SWI) scans, based on statistical feature extraction and classification. The proposed method consists of three steps: (1) removal of the skull and extraction of the brain; (2) thresholding to extract initial candidates; and (3) feature extraction and application of classification models, such as random forest and naïve Bayes classifiers, to detect true positive CMBs. The proposed technique is validated on a dataset of 20 subjects, divided into training data comprising 14 subjects with 104 microbleeds and testing data comprising 6 subjects with 63 microbleeds. The random forest classifier achieved 85.7% sensitivity with 4.2 false positives per CMB, and the naïve Bayes classifier achieved 90.5% sensitivity with 5.5 false positives per CMB. The proposed technique outperformed many state-of-the-art methods proposed in previous studies.
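The reported sensitivities can be checked against the 63 test-set microbleeds with a short sketch. The detected counts of 54 and 57 are not stated in the abstract; they are inferred here as the only integer counts out of 63 that round to 85.7% and 90.5%. Reading "false positives per CMB" as false positives divided by the number of true CMBs is also an assumption.

```python
def sensitivity(true_positives: int, total_lesions: int) -> float:
    """Fraction of ground-truth lesions that were detected."""
    return true_positives / total_lesions


def false_positives_per_lesion(false_positives: int, total_lesions: int) -> float:
    """False detections per ground-truth lesion (assumed meaning of 'per CMB')."""
    return false_positives / total_lesions


TEST_CMBS = 63  # microbleeds in the test set, from the abstract

# 54 and 57 are inferred counts, not figures reported by the authors.
for classifier, detected in (("random forest", 54), ("naive Bayes", 57)):
    print(f"{classifier}: {detected}/{TEST_CMBS} = {sensitivity(detected, TEST_CMBS):.1%}")
# random forest: 54/63 = 85.7%
# naive Bayes: 57/63 = 90.5%
```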

  • Article type: Published Erratum
    [This corrects the article DOI: 10.3389/fneur.2023.1129470.].

  • Article type: Journal Article
    Alzheimer's disease (AD) is a neurodegenerative disease that primarily occurs in elderly individuals with cognitive impairment. Although extracellular β-amyloid (Aβ) accumulation and tau protein hyperphosphorylation are considered to be leading causes of AD, the molecular mechanism of AD remains unknown. Therefore, in this study, we aimed to explore potential biomarkers of AD. Two next-generation sequencing (NGS) datasets, GSE173955 and GSE203206, were collected from the Gene Expression Omnibus (GEO) database. Analysis of differentially expressed genes (DEGs), gene ontology (GO) functional enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, and protein-protein networks was performed to identify genes potentially associated with AD. Analysis of the DEG-based protein-protein interaction (PPI) network using Cytoscape indicated that neuroinflammation and T-cell antigen receptor (TCR)-associated genes (LCK, ZAP70, and CD44) were the top three hub genes. Next, we validated these three hub genes in the AD database and used two machine learning models built on a different AD dataset (GSE15222) to observe their general relationship with AD. Analysis using the random forest classifier indicated that the accuracy observed using the top three genes as inputs (78%) differed only slightly from that observed using all genes as inputs (84%). Furthermore, another dataset, GSE97760, analyzed using our novel eigenvalue decomposition method, indicated that the top three hub genes may be involved in tauopathies associated with AD rather than in Aβ pathology. In addition, protein-protein docking simulation revealed that the top hub genes could form stable binding sites with acetylcholinesterase (ACHE). This suggests a potential interaction between the hub genes and ACHE, which plays an essential role in anti-AD drug design. Overall, the findings of this study, which systematically analyzed several AD datasets, illustrate that LCK, ZAP70, and CD44 may be used as AD biomarkers. We also established a robust prediction model for classifying patients with AD.

  • Article type: Journal Article
    Osteosarcoma accounts for 28% of primary bone malignancies in adults and up to 56% in children and adolescents (<20 years). However, early diagnosis and treatment are still inadequate, and improvements are still needed. Diagnoses are missed because traditional diagnostic methods are limited, and clinical symptoms are often already present before diagnosis. This study aimed to develop novel and efficient predictive models for the diagnosis of osteosarcoma and to identify potential targets for exploring osteosarcoma markers. First, osteosarcoma and normal tissue expression microarray datasets were downloaded from the Gene Expression Omnibus (GEO). Then, we screened the differentially expressed genes (DEGs) between the osteosarcoma and normal groups in the training group. Next, to explore the biologically relevant roles of the DEGs, Metascape and enrichment analyses were performed on them. The "randomForest" and "neuralnet" packages in R were used to select representative genes and construct diagnostic models for osteosarcoma. The artificial neural network model was then validated. Next, we performed an immune infiltration analysis using the training set data. Finally, we constructed a prognostic model using the representative genes for prognostic analysis. The copy number of osteosarcoma was also analyzed. A random forest classifier identified nine representative genes (ANK1, TGFBR3, TNFRSF21, HSPB8, ITGA7, RHD, AASS, GREM2, NFASC). Of these, HSPB8, RHD, AASS, and NFASC have not previously been reported to be associated with osteosarcoma. The osteosarcoma diagnostic model we constructed performs well, with areas under the curve (AUCs) of 1 and 0.987 in the training and validation groups, respectively. This study opens new horizons for the early diagnosis of osteosarcoma and provides representative markers for its future treatment. This is the first study to establish a genetic diagnostic model for osteosarcoma, advancing the development of osteosarcoma diagnosis and treatment.

  • Article type: Journal Article
    Sauce-aroma Baijiu (SAB) is one of the most famous Baijius in China and contains more than 500 aroma compounds. However, the key aroma compounds in SAB flavor remain unclear. Volatiles play an important role in SAB aroma and are highly correlated with SAB quality. In the present study, 63 volatile compounds were quantified in 66 SAB samples using gas chromatography with a flame ionization detector (GC-FID). The authors analyzed odor contributions and volatile compound correlations in two quality groups of SAB samples. Moreover, an odor activity value (OAV) ratio-based random forest classifier was used to explain the differences in volatile compound relationships between the two quality groups. Our results showed that higher-quality SABs had richer aromas and indicated that a set of compounds (fruity-like ethyl valerate, green- and malt-like isobutyraldehyde, malt-like 3-methylbutyraldehyde, and sweet-like furfural) had closer co-abundance correlations in higher-quality SABs. These results indicate that the aroma and contributions of volatile compounds in SABs should be analyzed using not only compound odor activity values but also the correlations between different aroma compounds.
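An odor activity value is conventionally a compound's concentration divided by its odor threshold, and an "OAV ratio" feature would then relate two compounds' OAVs. The sketch below assumes that pairwise-ratio reading; the abstract does not describe how the authors actually built their features. The compound names and all numbers are hypothetical placeholders, not measured values or literature thresholds.

```python
from itertools import combinations


def odor_activity_values(concentrations, thresholds):
    """OAV = concentration / odor threshold, both in the same unit."""
    return {name: concentrations[name] / thresholds[name] for name in concentrations}


def oav_ratio_features(oavs):
    """Pairwise OAV ratios, one feature per unordered compound pair.

    Assumes every OAV is non-zero; a real pipeline would need to handle
    compounds that were not detected.
    """
    return {f"{a}/{b}": oavs[a] / oavs[b] for a, b in combinations(sorted(oavs), 2)}


# Hypothetical placeholder data for one sample.
concentrations = {"compound_a": 12.0, "compound_b": 3.0, "compound_c": 0.5}
thresholds = {"compound_a": 4.0, "compound_b": 1.5, "compound_c": 0.25}

oavs = odor_activity_values(concentrations, thresholds)
features = oav_ratio_features(oavs)
print(oavs)
print(features)
```

Each sample's ratio dictionary could then serve as one feature row for a classifier, which is how ratios between compounds, rather than single OAVs, would end up explaining group differences.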

  • Article type: Journal Article
    Phenylketonuria (PKU) is a genetic disorder involving a defect in amino acid metabolism that seriously harms the development of newborns and children. Early diagnosis and treatment can effectively prevent disease progression. Here we developed a PKU screening model using a random forest classifier (RFC) to improve PKU screening performance, achieving excellent sensitivity, false positive rate (FPR), and positive predictive value (PPV) in the validation dataset and in two Chinese test populations. RFC showed clear advantages over several other machine learning classification models and the traditional logistic regression model. RFC is a promising candidate for neonatal PKU screening.
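The three screening metrics named in this abstract all derive from a confusion matrix. A minimal sketch with hypothetical counts follows; the abstract reports no counts, so the numbers only illustrate how the metrics relate to one another.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, false positive rate, and positive predictive value."""
    return {
        "sensitivity": tp / (tp + fn),  # affected newborns correctly flagged
        "fpr": fp / (fp + tn),          # unaffected newborns wrongly flagged
        "ppv": tp / (tp + fp),          # flagged newborns who are truly affected
    }


# Hypothetical counts for a rare condition in 10,000 screened newborns.
metrics = screening_metrics(tp=9, fp=11, tn=9979, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a rare condition, even a very small FPR can leave the PPV modest, which is why screening studies tend to report all three figures together.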

  • Article type: Journal Article
    Switchgrass lowland ecotypes have significantly higher biomass but lower cold tolerance than upland ecotypes. Understanding the molecular mechanisms underlying cold response, including those at the transcriptional level, can contribute to improving the tolerance of high-yield switchgrass under chilling and freezing conditions. Here, by analyzing an existing switchgrass transcriptome dataset, we computationally dissected the temporal cis-regulatory basis of the switchgrass transcriptional response to cold. We found that the number of cold-responsive genes and enriched Gene Ontology terms increased as the duration of cold treatment increased from 30 min to 24 hours, suggesting an amplified response or cascading effect in cold-responsive gene expression. To identify genomic sequences likely important for regulating cold response, machine learning models predictive of cold response were established using k-mer sequences enriched in the genic and flanking regions of cold-responsive genes but not of non-responsive genes. These k-mers, referred to as putative cis-regulatory elements (pCREs), are likely regulatory sequences of cold response in switchgrass. In total, there are 655 pCREs, of which 54 are important at all cold treatment time points. Consistent with this, eight of 35 known cold-responsive CREs were similar to top-ranked pCREs in the models, and only these eight were important for predicting temporal cold response. More importantly, most of the top-ranked pCREs were novel sequences in cold regulation. Our findings suggest that additional, previously unknown sequence elements are important for cold-responsive regulation and warrant further study.
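The k-mer enrichment step described here can be sketched by counting how many sequences in each gene set contain a given k-mer, then testing for over-representation. The abstract does not name the enrichment test, so the one-sided Fisher's exact test below is an assumed stand-in, and the sequences are toy strings rather than switchgrass data.

```python
from math import comb


def kmers(sequence: str, k: int) -> set:
    """All distinct k-mers occurring in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_counts(sequences, k: int) -> dict:
    """For each k-mer, the number of sequences that contain it at least once."""
    counts = {}
    for sequence in sequences:
        for kmer in kmers(sequence.upper(), k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for over-representation.

    a, b: responsive sequences with / without the k-mer
    c, d: non-responsive sequences with / without the k-mer
    """
    with_kmer, responsive, total = a + c, a + b, a + b + c + d
    upper = min(with_kmer, responsive)
    tail = sum(
        comb(with_kmer, x) * comb(total - with_kmer, responsive - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, responsive)


# Toy sequences, purely illustrative.
responsive = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
non_responsive = ["TTTTTTTT", "GGGGCCCC", "ATATATAT"]

k = 4
in_responsive = presence_counts(responsive, k)
in_non_responsive = presence_counts(non_responsive, k)

a = in_responsive.get("ACGT", 0)
c = in_non_responsive.get("ACGT", 0)
p_value = fisher_enrichment_p(a, len(responsive) - a, c, len(non_responsive) - c)
print(a, c, p_value)
```

K-mers that pass such a test would become the presence/absence features of the predictive model, and the features the trained model ranks highest would correspond to what the abstract calls top-ranked pCREs.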