Zero-inflated negative binomial mixed model

  • Article type: Journal Article
    BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology enables assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background differ across subjects, the subject effect is the main source of variation in scRNA-seq data with multiple subjects, and it severely confounds cell-type-specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data and produce an excessive number of zeros, which makes DE analysis even more challenging.
    RESULTS: We developed iDESC to detect cell-type-specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to account for both subject effect and dropouts. The prevalence of dropout events (the dropout rate) was shown to depend on gene expression level, and this dependence is modeled by pooling information across genes. Subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated iDESC and compared its performance with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power than the existing methods. Applying the methods with well-controlled type I error to three real scRNA-seq datasets from the same tissue and disease showed that iDESC achieved the best consistency between datasets and the best disease relevance.
    CONCLUSIONS: By separating subject effect from disease effect and accounting for dropouts when identifying DE genes, iDESC achieved more accurate and robust DE analysis results. This suggests that subject effect and dropouts are important to consider in the DE analysis of scRNA-seq data with multiple subjects.
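The model structure described in this abstract can be illustrated with a minimal sketch. It is not the iDESC implementation: the function names, parameter values, and the constant dropout probability `pi` are assumptions made for illustration (the abstract states that the dropout rate depends on expression level, which this sketch does not model). The sketch only shows a zero-inflated negative binomial probability and counts simulated with a subject-level random intercept in the log-mean.

```python
import math
import numpy as np

def zinb_logpmf(y, mu, theta, pi):
    """Log-probability of count y under a zero-inflated negative binomial.

    mu: NB mean, theta: NB dispersion (size), pi: probability of an excess zero.
    """
    # NB log-pmf in the mean/dispersion parameterisation.
    nb = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
          + theta * math.log(theta / (theta + mu))
          + y * math.log(mu / (theta + mu)))
    if y == 0:
        # A zero is either an excess zero (dropout) or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb

def simulate_counts(n_subjects=6, cells_per_subject=200, beta0=1.0,
                    beta_group=0.5, sigma_subject=0.4, theta=2.0,
                    pi=0.3, seed=0):
    """Simulate one gene's counts for two equal-sized groups of subjects.

    All default values are hypothetical; n_subjects must be even.
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_subjects // 2)            # 0 = control, 1 = case
    b = rng.normal(0.0, sigma_subject, size=n_subjects)   # subject random intercepts
    log_mu = beta0 + beta_group * group + b               # random effect enters the log-mean
    mu = np.repeat(np.exp(log_mu), cells_per_subject)     # one mean per cell
    lam = rng.gamma(shape=theta, scale=mu / theta)        # gamma-Poisson mixture gives NB
    y = rng.poisson(lam)
    y[rng.random(y.size) < pi] = 0                        # excess zeros at constant rate pi
    subject = np.repeat(np.arange(n_subjects), cells_per_subject)
    return y, subject, np.repeat(group, cells_per_subject)
```

Calling `simulate_counts()` returns per-cell counts together with subject and group labels, and `zinb_logpmf` can be summed over cells to evaluate a likelihood conditional on the subject effects. A full mixed-model fit would also integrate over the random intercepts, which is omitted here.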

  • Article type: Journal Article
    Wildfires have changed in recent decades. Catastrophic wildfires make it necessary to have accurate country-scale predictive models for organizing firefighting resources. In Mediterranean countries, the number of wildfires is quite high, but they are concentrated mainly in the summer months. Because of this seasonality, there are territories where the number of fires is zero in some months and overdispersed in others. Zero-inflated negative binomial mixed models suit this type of data because they can describe patterns that explain both the number of fires and their non-occurrence, and they also provide useful prediction tools. In addition to model-based predictions, a parametric bootstrap method is applied to estimate mean squared errors and construct prediction intervals. The statistical methodology and the software developed are applied to model and predict the number of wildfires in Spain between 2002 and 2015 by province and month.
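The parametric bootstrap idea mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' software: the names `rzinb` and `bootstrap_prediction` are hypothetical, the fitted values `mu_hat`, `theta_hat`, and `pi_hat` are assumed to come from some earlier fit, and the sample mean stands in for refitting a mixed model on each bootstrap replicate.

```python
import numpy as np

def rzinb(rng, mu, theta, pi, size):
    """Draw ZINB counts: NB(mean mu, dispersion theta) with extra zeros at rate pi."""
    lam = rng.gamma(shape=theta, scale=mu / theta, size=size)
    y = rng.poisson(lam)
    y[rng.random(size) < pi] = 0
    return y

def bootstrap_prediction(mu_hat, theta_hat, pi_hat, n_obs,
                         n_boot=2000, level=0.95, seed=0):
    """Parametric bootstrap estimate of prediction MSE and a prediction interval.

    Assumption: the predictor is re-estimated on each bootstrap sample by the
    sample mean, a stand-in for refitting the full mixed model.
    """
    rng = np.random.default_rng(seed)
    point = (1.0 - pi_hat) * mu_hat            # plug-in predicted count (ZINB mean)
    errors = np.empty(n_boot)
    for b in range(n_boot):
        y_star = rzinb(rng, mu_hat, theta_hat, pi_hat, n_obs)  # bootstrap data set
        pred_star = y_star.mean()                               # bootstrap "refit"
        y_new = rzinb(rng, mu_hat, theta_hat, pi_hat, 1)[0]     # bootstrap future count
        errors[b] = y_new - pred_star
    mse = float(np.mean(errors ** 2))
    alpha = 1.0 - level
    lo, hi = np.quantile(errors, [alpha / 2, 1.0 - alpha / 2])
    # Counts cannot be negative, so the lower limit is truncated at zero.
    return point, mse, (max(0.0, point + lo), point + hi)
```

For example, `bootstrap_prediction(10.0, 2.0, 0.3, n_obs=50)` returns the plug-in prediction, a bootstrap estimate of its mean squared prediction error, and an interval built from the quantiles of the bootstrap prediction errors.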

  • Article type: Journal Article
    How individuals respond to environmental change determines the strength and direction of biological processes like recruitment and growth that underpin population productivity. Ascertaining the relative importance of environmental factors can, however, be difficult given the numerous mechanisms through which they affect individuals. This is especially true in dynamic and complex estuarine environments. Here, we develop long-term otolith-based indices of recruitment and growth for estuary perch Percalates colonorum (Bemm River, Australia), to explore the importance of intrinsic (individual, demographic) and extrinsic (hydrologic, climatic, density-dependent) factors in driving estuarine fish productivity. Analyses involved a novel zero-inflated specification of catch curve regression and mixed effects modelling. The 39 years of recruitment and 46 years of growth data, spanning a period of environmental change including severe drought, displayed considerable inter-annual variation. Recruitment success was strongly related to high freshwater inflows during the spawning season, suggesting that these conditions act as spawning cues for adults and potentially provide favourable conditions for larvae. Individuals displayed age-dependent growth, with highest rates observed at younger ages in years characterized by warm temperatures, and to a lesser degree, greater magnitude base inflow conditions. We detected systematic among-year-class growth differences, but these were not attributable to year class strength, suggesting that environmental conditions experienced by individuals as juveniles can have long-lasting effects of greater importance to population productivity than density-dependent growth responses. The primacy of temperature in driving growth variation highlights that under-appreciated climatic variation can affect estuarine fish productivity through direct physiological and indirect food web mechanisms. We predict that climatic warming will promote individual growth in southerly populations of P. colonorum but concurrently limit recruitment due to forecast reductions in spawning season river discharge. Disparate trait responses are likely in other fishes as they respond to multiple and changing environmental drivers, making predictions of future population productivity challenging.
