Data-driven modeling

  • Article type: Journal Article
    Groundwater systems are vast natural water reservoirs used to support human water demands and ecosystem services. Various modeling approaches have been developed to help manage these complex, highly dynamic systems. This paper discusses the strengths and limitations of three modeling approaches: process-based, data-driven, and system dynamics modeling. For demonstration purposes, the three approaches are applied to the Konya Closed Basin, a large agricultural region with a semi-arid climate located in central Turkey. Process-based modeling is grounded in a theory-based representation of the governing processes but is somewhat limited by the computational effort and by the difficulty of defining the input parameters that characterize the heterogeneous aquifer system. Process-based models are shown to be powerful tools for resource management provided that climatic and water demand scenarios are accurately defined. Data-driven models are efficient tools for the management of groundwater resources but are highly dependent on the availability of large training data sets encompassing the spectrum of possible system responses. The high efficiency of surrogate modeling approaches makes them ideal tools for incorporation into applications such as real-time decision support systems and digital twin platforms. System dynamics modeling examines the groundwater exploitation problem within a socio-economic context that involves multiple stakeholders and their decision making. It combines groundwater flow models with socio-economics and endogenous decision rules to conduct scenario analysis and support policy development. The analyses and model demonstrations presented in this paper underscore the interconnectedness and complementarity of these three modeling approaches and the need for their more integrated use to enhance multi-sectoral management of groundwater systems.

  • Article type: Journal Article
    Across early childhood development, sleep behavior transitions from a biphasic pattern (a daytime nap and nighttime sleep) to a monophasic pattern (only nighttime sleep). The transition to consolidated nighttime sleep, which occurs in most children between 2 and 5 years of age, is a major developmental milestone and reflects interactions between the developing homeostatic sleep drive and circadian system. Using a physiologically-based mathematical model of the sleep-wake regulatory network constrained by observational and experimental data from preschool-aged participants, we analyze how developmentally-mediated changes in the homeostatic sleep drive may contribute to the transition from napping to non-napping sleep patterns. We establish baseline behavior by identifying parameter sets that model typical 2-year-old napping behavior and 5-year-old non-napping behavior. We then vary six model parameters associated with the dynamics of and sensitivity to the homeostatic sleep drive between the 2-year-old and 5-year-old parameter values to induce the transition from biphasic to monophasic sleep. We analyze the individual contributions of these parameters to sleep patterning by independently varying their age-dependent developmental trajectories. Parameters vary according to distinct evolution curves and produce bifurcation sequences representing various ages of transition onset, transition durations, and transitional sleep patterns. Finally, we consider the ability of napping and non-napping light schedules to reinforce napping or promote a transition to consolidated sleep, respectively. These modeling results provide insight into the role of the homeostatic sleep drive in promoting interindividual variability in developmentally-mediated transitions in sleep behavior and lay foundations for the identification of light- or behavior-based interventions that promote healthy sleep consolidation in early childhood.

  • Article type: Journal Article
    Regulation of cell proliferation is a crucial aspect of tissue development and homeostasis and plays a major role in morphogenesis, wound healing, and tumor invasion. A phenomenon of such regulation is contact inhibition, which describes the dramatic slowing of proliferation, cell migration and individual cell growth when multiple cells are in contact with each other. While many physiological, molecular and genetic factors are known, the mechanism of contact inhibition is still not fully understood. In particular, the relevance of cellular signaling due to interfacial contact for contact inhibition is still debated. Cellular automata (CA) have been employed in the past as numerically efficient mathematical models to study the dynamics of cell ensembles, but they are not suitable to explore the origins of contact inhibition as such agent-based models assume fixed cell sizes. We develop a minimal, data-driven model to simulate the dynamics of planar cell cultures by extending a probabilistic CA to incorporate size changes of individual cells during growth and cell division. We successfully apply this model to previous in vitro experiments on contact inhibition in epithelial tissue: after a systematic calibration of the model parameters to measurements of single-cell dynamics, our CA model quantitatively reproduces independent measurements of emergent, culture-wide features, like colony size, cell density and collective cell migration. In particular, the dynamics of the CA model also exhibit the transition from a low-density confluent regime to a stationary postconfluent regime with a rapid decrease in cell size and motion. This implies that the volume exclusion principle, a mechanical constraint which is the only inter-cellular interaction incorporated in the model, paired with a size-dependent proliferation rate is sufficient to generate the observed contact inhibition. We discuss how our approach enables the introduction of effective bio-mechanical interactions in a CA framework for future studies.

  • Article type: Journal Article
    This research introduces a methodology for data-driven regression modeling of components exhibiting nonlinear characteristics, utilizing the sparse identification of nonlinear dynamics (SINDy) method. The SINDy method is extended to formulate regression models for interconnecting components with nonlinear traits, yielding governing equations with physically interpretable solutions. The proposed methodology focuses on extracting a model that balances accuracy and sparsity among various regression models. In this process, a comprehensive model is generated using linear term weights and an error histogram. The applicability of the proposed approach is demonstrated through a case study involving a sponge gasket with nonlinear characteristics. By contrasting the predictive model with experimental responses, the reliability of the methodology is verified. The results highlight that the regression model, based on the proposed technique, can effectively establish an accurate dynamical system model, accounting for realistic conditions.
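    The core idea behind SINDy, sparse regression over a library of candidate terms, can be illustrated with a minimal sketch. It uses sequentially thresholded least squares, the usual baseline SINDy optimizer, and does not reproduce the extension described in the abstract (linear term weights and an error histogram). The toy system dx/dt = -x - 0.5x^3, the sample grid, the disturbance term, and the threshold are all assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def least_squares(theta, target, active):
    """Least-squares fit (normal equations) using only the columns in `active`."""
    gram = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    rhs = [sum(row[i] * t for row, t in zip(theta, target)) for i in active]
    return solve(gram, rhs)


def stlsq(theta, target, threshold, max_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(max_iter):
        if not active:
            break
        fitted = least_squares(theta, target, active)
        coef = [0.0] * n_terms
        for idx, value in zip(active, fitted):
            coef[idx] = value
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:
            break  # the set of active terms no longer changes
        active = kept
    return [c if abs(c) >= threshold else 0.0 for c in coef]


# Hypothetical toy data: states on a grid and "measured" derivatives of
# dx/dt = -x - 0.5 x^3 with a small deterministic disturbance added.
xs = [-2.0 + 0.1 * i for i in range(41)]
dxdt = [-x - 0.5 * x**3 + 0.01 * math.sin(37 * i) for i, x in enumerate(xs)]

# Candidate library: 1, x, x^2, x^3.
theta = [[1.0, x, x**2, x**3] for x in xs]
coef = stlsq(theta, dxdt, threshold=0.1)
print(dict(zip(["1", "x", "x^2", "x^3"], coef)))
```

    In this sketch the constant and quadratic terms fall below the threshold and are removed, leaving a two-term model close to the assumed dynamics. The threshold controls the accuracy versus sparsity trade-off that the abstract refers to.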

  • Article type: Journal Article
    The anaerobic membrane bioreactor (AnMBR) is a promising technology for not only water reclamation but also virus removal; however, the virus removal efficiency of AnMBR has not been fully investigated. Additionally, estimating the removal efficiency requires datasets of virus concentration in influent and effluent, but such monitoring is not easy to perform in practical operation because the virus quantification process is generally time-consuming and requires specialized equipment and trained personnel. Therefore, in this study, we aimed to identify the key, monitorable variables in AnMBR and establish data-driven models using the selected variables to predict virus removal efficiency. We monitored operational and environmental conditions of an AnMBR in Sendai, Japan and measured virus concentration once a week for six months. Spearman's rank correlation analysis revealed that the pH values of influent and mixed liquor suspended solids (MLSS) were strongly correlated with the log reduction value of pepper mild mottle virus, indicating that electrostatic interactions played a dominant role in AnMBR virus removal. Among the candidate models, the random forest model using selected variables including influent and MLSS pH outperformed the others. This study has demonstrated the potential of AnMBR as a viable option for municipal wastewater reclamation with high microbial safety.
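    As a rough illustration of the variable-screening step mentioned above, Spearman's rank correlation is the Pearson correlation computed on ranks, with tied values sharing an average rank. The sketch below is a plain-Python version of that definition; the pH and log reduction readings are hypothetical numbers invented for the example, not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly readings (illustrative only).
influent_ph = [6.8, 7.0, 7.1, 7.3, 7.4, 7.6]
log_reduction = [1.2, 1.5, 1.4, 1.9, 2.1, 2.4]

rho = spearman(influent_ph, log_reduction)
print(f"Spearman rho = {rho:.3f}")
```

    Because only ranks enter the calculation, the coefficient captures any monotonic relationship, which suits small weekly monitoring datasets where linearity cannot be assumed.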

  • Article type: Journal Article
    Building data-driven models is an effective strategy for information extraction from empirical data. Adapting model parameters specifically to data with a best-fitting approach encodes the relevant information into a mathematical model. Subsequently, an optimal control framework extracts the most efficient targets to steer the model into desired changes via external stimuli. The DataXflow software framework integrates three software pipelines: D2D for model fitting; a framework for solving optimal control problems that include external stimuli; and JimenaE, which provides graphical user interfaces to the other frameworks, lowering the programming skills required and automating recurring modeling tasks. Such tasks include generating equations from a graph and generating scripts, which also makes it possible to approach systems with many agents, such as complex gene regulatory networks. A desired state of the model is defined, and therapeutic interventions are modeled as external stimuli. The optimal control framework purposefully exploits the model-encoded information by providing those external stimuli that effect the desired changes most efficiently. The implementation of DataXflow is available at https://github.com/MarvelousHopefull/DataXflow. We showcase its application by detecting specific drug targets for a lung cancer therapy from measurement data, with the aim of lowering proliferation and increasing apoptosis. Through an iterative modeling process that refines the topology of the model, the regulatory network of the tumor is generated from the data. An application of the optimal control framework in our example reveals the inhibition of AURKA and the activation of CDH1 as the most efficient drug target combination. DataXflow paves the way to an agile interplay between data generation and its analysis, potentially accelerating cancer research through efficient drug target identification, even in complex networks.

  • Article type: Journal Article
    In the realm of road safety and the evolution toward automated driving, Advanced Driver Assistance and Automated Driving (ADAS/AD) systems play a pivotal role. As the complexity of these systems grows, comprehensive testing becomes imperative, with virtual test environments becoming crucial, especially for handling diverse and challenging scenarios. Radar sensors are integral to ADAS/AD units and are known for their robust performance even in adverse conditions. However, accurately modeling the radar's perception, particularly the radar cross-section (RCS), proves challenging. This paper adopts a data-driven approach, using Gaussian mixture models (GMMs) to model the radar's perception for various vehicles and aspect angles. A Bayesian variational approach automatically infers model complexity. The model is expanded into a comprehensive radar sensor model based on object lists, incorporating occlusion effects and RCS-based detectability decisions. The model's effectiveness is demonstrated through accurate reproduction of the RCS behavior and scatter point distribution. The full capabilities of the sensor model are demonstrated in different scenarios. The flexible and modular framework has proven apt for modeling specific aspects and allows for an easy model extension. Simultaneously, alongside model extension, more extensive validation is proposed to refine accuracy and broaden the model's applicability.

  • Article type: Journal Article
    The circular economy (CE) aims to decouple the growth of the economy from the consumption of finite resources through strategies such as eliminating waste, circulating materials in use, and regenerating natural systems. Due to the rapid development of data science (DS), promising progress has been made in the transition toward CE in the past decade. DS offers various methods to achieve accurate predictions, accelerate sustainable product design, prolong asset life, optimize the infrastructure needed to circulate materials, and provide evidence-based insights. Despite the exciting scientific advances in this field, a comprehensive review of this topic that summarizes past achievements, synthesizes the knowledge gained, and navigates future research directions is still lacking. In this paper, we summarize how DS has accelerated the transition to CE. We conducted a critical review of where and how DS has helped the CE transition, with a focus on four areas: (1) characterizing socioeconomic metabolism, (2) reducing unnecessary waste generation by enhancing material efficiency and optimizing product design, (3) extending product lifetime through repair, and (4) facilitating waste reuse and recycling. We also introduce the limitations and challenges in current applications and discuss opportunities, to provide a clear roadmap for future research in this field.

  • Article type: Journal Article
    Decoding the connectivity structure of a network of nonlinear oscillators from measurement data is a difficult yet essential task for understanding and controlling network functionality. Several data-driven network inference algorithms have been presented, but the commonly assumed premise of ample measurement data is often difficult to satisfy in practice. In this paper, we propose a data-efficient network inference technique that combines correlation statistics with a model-fitting procedure. The proposed approach can identify the network structure reliably when measurement data are limited. We compare the proposed method with existing techniques on a network of Stuart-Landau oscillators, on oscillators describing circadian gene expression, and on noisy experimental data obtained from a Rössler electronic oscillator network.

  • Article type: Journal Article
    BACKGROUND: Chronic kidney disease (CKD) requires accurate prediction of renal replacement therapy (RRT) initiation risk. This study developed deep learning algorithms (DLAs) to predict RRT risk in CKD patients by incorporating medical history and prescriptions in addition to biochemical investigations.
    METHODS: A multi-centre retrospective cohort study was conducted in three major hospitals in Hong Kong. CKD patients with an eGFR < 30 ml/min/1.73 m² were included. DLAs of various structures were created and trained using patient data. Using a test set, the DLAs' predictive performance was compared with the Kidney Failure Risk Equation (KFRE).
    RESULTS: DLAs outperformed KFRE in predicting RRT initiation risk (CNN + LSTM + ANN layers ROC-AUC = 0.90; CNN ROC-AUC = 0.91; 4-variable KFRE: ROC-AUC = 0.84; 8-variable KFRE: ROC-AUC = 0.84). DLAs accurately predicted uncoded renal transplants and patients requiring dialysis after 5 years, demonstrating their ability to capture non-linear relationships.
    CONCLUSIONS: DLAs provide accurate predictions of RRT risk in CKD patients, surpassing traditional methods like KFRE. Incorporating medical history and prescriptions improves prediction performance. While our findings suggest that DLAs hold promise for improving patient care and resource allocation in CKD management, further prospective observational studies and randomized controlled trials are necessary to fully understand their impact, particularly regarding DLA interpretability, bias minimization, and overfitting reduction. Overall, our research underscores the emerging role of DLAs as potentially valuable tools in advancing the management of CKD and predicting RRT initiation risk.
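    The ROC-AUC values used above to compare the DLAs with KFRE have a simple interpretation: the probability that a randomly chosen patient who started RRT receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it directly from that pairwise definition. The labels and the scores of `model_a` and `model_b` are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = started RRT) and risk scores from two models.
started_rrt = [1, 0, 1, 0, 0, 1, 0, 1]
model_a = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.6]  # ranks every positive first
model_b = [0.9, 0.6, 0.5, 0.4, 0.1, 0.7, 0.8, 0.3]  # several pairs misordered

auc_a = roc_auc(started_rrt, model_a)
auc_b = roc_auc(started_rrt, model_b)
print(auc_a, auc_b)
```

    The pairwise loop is quadratic in the number of patients, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large cohorts.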