Data-driven modeling

  • Article type: Journal Article
    Groundwater systems are vast natural water reservoirs used to support human water demands and ecosystem services. Various modeling approaches have been developed to help manage these complex highly-dynamic systems. This paper discusses the strengths and limitations of three modeling approaches, namely: process-based, data-driven and system dynamics modeling. For demonstration purposes, the three modeling approaches are applied to the Konya Closed Basin, a large agricultural region with semi-dry climate located in central Turkey. Process-based modeling is grounded in the theory-based representation of the governing processes but is somewhat limited by the computational effort and the difficulty of defining the required input parameters that characterize the heterogeneous aquifer system. Process-based models are shown to be powerful tools for resource management purposes provided climatic and water demand scenarios are accurately defined. Data-driven models are efficient tools for the management of groundwater resources but are highly dependent on the availability of large training data sets encompassing the spectrum of possible system responses. The high efficiency of surrogate modeling approaches makes them ideal tools for incorporation into applications such as real-time decision support systems and digital twin platforms. System dynamics modeling examines the groundwater exploitation problem within a socio-economic context that involves multiple stakeholders and their decision making. It combines groundwater flow models with socio-economics and endogenous decision rules to conduct scenario analysis and support policy development. The analyses and model demonstrations presented in this paper underscore the interconnectedness and complementarity of these three modeling approaches and the need for more integrated use of these modeling approaches for enhanced multi-sectoral management of groundwater systems.






  • Article type: Journal Article
    Across early childhood development, sleep behavior transitions from a biphasic pattern (a daytime nap and nighttime sleep) to a monophasic pattern (only nighttime sleep). The transition to consolidated nighttime sleep, which occurs in most children between 2- and 5-years-old, is a major developmental milestone and reflects interactions between the developing homeostatic sleep drive and circadian system. Using a physiologically-based mathematical model of the sleep-wake regulatory network constrained by observational and experimental data from preschool-aged participants, we analyze how developmentally-mediated changes in the homeostatic sleep drive may contribute to the transition from napping to non-napping sleep patterns. We establish baseline behavior by identifying parameter sets that model typical 2-year-old napping behavior and 5-year-old non-napping behavior. Then we vary six model parameters associated with the dynamics of and sensitivity to the homeostatic sleep drive between the 2-year-old and 5-year-old parameter values to induce the transition from biphasic to monophasic sleep. We analyze the individual contributions of these parameters to sleep patterning by independently varying their age-dependent developmental trajectories. Parameters vary according to distinct evolution curves and produce bifurcation sequences representing various ages of transition onset, transition durations, and transitional sleep patterns. Finally, we consider the ability of napping and non-napping light schedules to reinforce napping or promote a transition to consolidated sleep, respectively. These modeling results provide insight into the role of the homeostatic sleep drive in promoting interindividual variability in developmentally-mediated transitions in sleep behavior and lay foundations for the identification of light- or behavior-based interventions that promote healthy sleep consolidation in early childhood.






  • Article type: Journal Article
    Regulation of cell proliferation is a crucial aspect of tissue development and homeostasis and plays a major role in morphogenesis, wound healing, and tumor invasion. One such regulatory phenomenon is contact inhibition, which describes the dramatic slowing of proliferation, cell migration and individual cell growth when multiple cells are in contact with each other. While many physiological, molecular and genetic factors are known, the mechanism of contact inhibition is still not fully understood. In particular, the relevance of cellular signaling due to interfacial contact for contact inhibition is still debated. Cellular automata (CA) have been employed in the past as numerically efficient mathematical models to study the dynamics of cell ensembles, but they are not suitable for exploring the origins of contact inhibition, as such agent-based models assume fixed cell sizes. We develop a minimal, data-driven model to simulate the dynamics of planar cell cultures by extending a probabilistic CA to incorporate size changes of individual cells during growth and cell division. We successfully apply this model to previous in vitro experiments on contact inhibition in epithelial tissue: after a systematic calibration of the model parameters to measurements of single-cell dynamics, our CA model quantitatively reproduces independent measurements of emergent, culture-wide features, such as colony size, cell density and collective cell migration. In particular, the dynamics of the CA model also exhibit the transition from a low-density confluent regime to a stationary postconfluent regime with a rapid decrease in cell size and motion. This implies that the volume exclusion principle, a mechanical constraint which is the only inter-cellular interaction incorporated in the model, paired with a size-dependent proliferation rate is sufficient to generate the observed contact inhibition. We discuss how our approach enables the introduction of effective bio-mechanical interactions in a CA framework for future studies.






  • Article type: Journal Article
    This research introduces a methodology for data-driven regression modeling of components exhibiting nonlinear characteristics, utilizing the sparse identification of nonlinear dynamics (SINDy) method. The SINDy method is extended to formulate regression models for interconnecting components with nonlinear traits, yielding governing equations with physically interpretable solutions. The proposed methodology focuses on extracting a model that balances accuracy and sparsity among various regression models. In this process, a comprehensive model was generated using linear term weights and an error histogram. The applicability of the proposed approach is demonstrated through a case study involving a sponge gasket with nonlinear characteristics. By contrasting the predictive model with experimental responses, the reliability of the methodology is verified. The results highlight that the regression model, based on the proposed technique, can effectively establish an accurate dynamical system model, accounting for realistic conditions.
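The sparse-regression idea behind SINDy can be sketched in a few lines. The example below is a minimal illustration, not the extended method described in the abstract: it uses plain sequentially thresholded least squares, and the candidate library, the threshold, and the synthetic cubic-damping data are all assumptions chosen for demonstration (none of them come from the sponge gasket case study).

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    Finds a sparse coefficient vector xi such that theta @ xi ~= dxdt by
    alternating a least-squares fit with zeroing of small coefficients.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        # Refit using only the library terms that survived the threshold.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi


# Synthetic, noise-free illustration data (hypothetical, not from the paper):
# a scalar state obeying dx/dt = -0.5*x - 2.0*x**3.
x = np.linspace(-1.0, 1.0, 201)
dxdt = -0.5 * x - 2.0 * x**3

# Candidate library of terms; the regression decides which ones to keep.
library_names = ["1", "x", "x^2", "x^3"]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

xi = stlsq(theta, dxdt, threshold=0.1)
for name, coeff in zip(library_names, xi):
    print(f"{name:>4}: {coeff:+.3f}")
```

The threshold controls the trade-off between accuracy and sparsity that the abstract mentions: a larger value removes more library terms at the cost of fit quality.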






  • Article type: Journal Article
    The anaerobic membrane bioreactor (AnMBR) is a promising technology for not only water reclamation but also virus removal; however, the virus removal efficiency of AnMBR has not been fully investigated. Additionally, estimating removal efficiency requires datasets of virus concentration in influent and effluent, but such monitoring is difficult in practical operation because virus quantification is generally time-consuming and requires specialized equipment and trained personnel. Therefore, in this study, we aimed to identify the key, monitorable variables in AnMBR and to establish data-driven models that use the selected variables to predict virus removal efficiency. We monitored operational and environmental conditions of an AnMBR in Sendai, Japan and measured virus concentration once a week for six months. Spearman's rank correlation analysis revealed that the pH values of influent and mixed liquor suspended solids (MLSS) were strongly correlated with the log reduction value of pepper mild mottle virus, indicating that electrostatic interactions played a dominant role in AnMBR virus removal. Among the candidate models, the random forest model using selected variables including influent and MLSS pH outperformed the others. This study has demonstrated the potential of AnMBR as a viable option for municipal wastewater reclamation with high microbial safety.
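The two quantities named above, the log reduction value (LRV) and Spearman's rank correlation, can be sketched as follows. This is a minimal illustration using only the standard library; the LRV is taken in its usual form, log10 of the influent-to-effluent concentration ratio, and every number below is made up for demonstration and says nothing about the direction or strength of the relationship found in the study. The random forest step is not covered here.

```python
import math


def log_reduction_value(c_influent, c_effluent):
    """LRV = log10(C_in / C_out); an LRV of 3 means a 1000-fold reduction."""
    return math.log10(c_influent / c_effluent)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly samples (illustrative values only, not study data).
influent_ph = [6.8, 7.0, 7.1, 7.3, 7.4, 7.6]
c_in = [1e6] * 6                          # virus copies per litre, influent
c_out = [1e4, 2e3, 5e3, 1e3, 8e2, 1e2]    # virus copies per litre, effluent

lrv = [log_reduction_value(a, b) for a, b in zip(c_in, c_out)]
rho = spearman(influent_ph, lrv)
print(f"Spearman rho between influent pH and LRV: {rho:.3f}")
```

A rank-based coefficient is a natural choice for this kind of monitoring data because it only assumes a monotonic relationship, not a linear one.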






  • Article type: Journal Article
    Building data-driven models is an effective strategy for extracting information from empirical data. Fitting model parameters to the data encodes the relevant information in a mathematical model. Subsequently, an optimal control framework identifies the most efficient targets for steering the model toward desired changes via external stimuli. The DataXflow software framework integrates three software pipelines: D2D for model fitting; a framework for solving optimal control problems that include external stimuli; and JimenaE, which provides graphical user interfaces to the other frameworks, lowering the need for programming skills while automating recurring modeling tasks. Such tasks include equation generation from a graph and script generation, which also make it possible to handle systems with many agents, such as complex gene regulatory networks. A desired state of the model is defined, and therapeutic interventions are modeled as external stimuli. The optimal control framework purposefully exploits the model-encoded information by providing those external stimuli that effect the desired changes most efficiently. The implementation of DataXflow is available online. We showcase its application by identifying, from measurement data, specific drug targets for a lung cancer therapy that lowers proliferation and increases apoptosis. Through an iterative modeling process that refines the topology of the model, the regulatory network of the tumor is generated from the data. Applying the optimal control framework in our example reveals the inhibition of AURKA and the activation of CDH1 as the most efficient drug target combination. DataXflow paves the way to an agile interplay between data generation and analysis, potentially accelerating cancer research through efficient drug target identification, even in complex networks.






  • Article type: Journal Article
    In the realm of road safety and the evolution toward automated driving, Advanced Driver Assistance and Automated Driving (ADAS/AD) systems play a pivotal role. As the complexity of these systems grows, comprehensive testing becomes imperative, with virtual test environments becoming crucial, especially for handling diverse and challenging scenarios. Radar sensors are integral to ADAS/AD units and are known for their robust performance even in adverse conditions. However, accurately modeling the radar's perception, particularly the radar cross-section (RCS), proves challenging. This paper adopts a data-driven approach, using Gaussian mixture models (GMMs) to model the radar's perception for various vehicles and aspect angles. A Bayesian variational approach automatically infers model complexity. The model is expanded into a comprehensive radar sensor model based on object lists, incorporating occlusion effects and RCS-based detectability decisions. The model's effectiveness is demonstrated through accurate reproduction of the RCS behavior and scatter point distribution. The full capabilities of the sensor model are demonstrated in different scenarios. The flexible and modular framework has proven apt for modeling specific aspects and allows for easy model extension. Alongside model extension, more extensive validation is proposed to refine accuracy and broaden the model's applicability.






  • Article type: Journal Article
    The circular economy (CE) aims to decouple economic growth from the consumption of finite resources through strategies such as eliminating waste, circulating materials in use, and regenerating natural systems. Due to the rapid development of data science (DS), promising progress has been made in the transition toward CE in the past decade. DS offers various methods to achieve accurate predictions, accelerate sustainable product design, prolong asset life, optimize the infrastructure needed to circulate materials, and provide evidence-based insights. Despite the exciting scientific advances in this field, a comprehensive review that summarizes past achievements, synthesizes the knowledge gained, and charts future research directions is still lacking. In this paper, we summarize how DS has accelerated the transition to CE. We conducted a critical review of where and how DS has helped the CE transition, with a focus on four areas: (1) characterizing socioeconomic metabolism, (2) reducing unnecessary waste generation by enhancing material efficiency and optimizing product design, (3) extending product lifetime through repair, and (4) facilitating waste reuse and recycling. We also describe the limitations and challenges of current applications and discuss opportunities, to provide a clear roadmap for future research in this field.






  • Article type: Journal Article
    Decoding the connectivity structure of a network of nonlinear oscillators from measurement data is a difficult yet essential task for understanding and controlling network functionality. Several data-driven network inference algorithms have been presented, but the common premise of ample measurement data is often difficult to satisfy in practice. In this paper, we propose a data-efficient network inference technique that combines correlation statistics with a model-fitting procedure. The proposed approach can identify the network structure reliably when measurement data are limited. We compare the proposed method with existing techniques on a network of Stuart-Landau oscillators, on oscillators describing circadian gene expression, and on noisy experimental data obtained from a Rössler electronic oscillator network.






  • Article type: Journal Article
    BACKGROUND: Chronic kidney disease (CKD) requires accurate prediction of renal replacement therapy (RRT) initiation risk. This study developed deep learning algorithms (DLAs) to predict RRT risk in CKD patients by incorporating medical history and prescriptions in addition to biochemical investigations.
    METHODS: A multi-centre retrospective cohort study was conducted in three major hospitals in Hong Kong. CKD patients with an eGFR < 30 mL/min/1.73 m² were included. DLAs of various structures were created and trained using patient data. Using a test set, the DLAs' predictive performance was compared with that of the Kidney Failure Risk Equation (KFRE).
    RESULTS: DLAs outperformed KFRE in predicting RRT initiation risk (CNN + LSTM + ANN layers ROC-AUC = 0.90; CNN ROC-AUC = 0.91; 4-variable KFRE: ROC-AUC = 0.84; 8-variable KFRE: ROC-AUC = 0.84). DLAs accurately predicted uncoded renal transplants and patients requiring dialysis after 5 years, demonstrating their ability to capture non-linear relationships.
    CONCLUSIONS: DLAs provide accurate predictions of RRT risk in CKD patients, surpassing traditional methods like KFRE. Incorporating medical history and prescriptions improves prediction performance. While our findings suggest that DLAs hold promise for improving patient care and resource allocation in CKD management, further prospective observational studies and randomized controlled trials are necessary to fully understand their impact, particularly regarding DLA interpretability, bias minimization, and overfitting reduction. Overall, our research underscores the emerging role of DLAs as potentially valuable tools in advancing the management of CKD and predicting RRT initiation risk.
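The ROC-AUC values used above to compare the DLAs with KFRE can be read as the probability that a randomly chosen patient who started RRT receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes that quantity directly from this pairwise definition; the labels and scores are hypothetical and serve only to show how two models would be compared on the same test set.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test set: 1 = RRT initiated, 0 = not initiated.
labels = [1, 0, 1, 0, 0, 1]
model_a_risk = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]  # made-up predicted risks
model_b_risk = [0.8, 0.5, 0.4, 0.6, 0.1, 0.7]  # made-up predicted risks

auc_a = roc_auc(labels, model_a_risk)
auc_b = roc_auc(labels, model_b_risk)
print(f"model A ROC-AUC: {auc_a:.2f}, model B ROC-AUC: {auc_b:.2f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; library implementations typically use a rank-based formulation instead.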





