Containerization

  • Article type: Journal Article
    BrainForge is a cloud-enabled, web-based analysis platform for neuroimaging research. This website allows users to archive data from a study and effortlessly process data on a high-performance computing cluster. After analyses are completed, results can be quickly shared with colleagues. BrainForge solves multiple problems for researchers who want to analyze neuroimaging data, including issues related to software, reproducibility, computational resources, and data sharing. BrainForge can currently process structural, functional, diffusion, and arterial spin labeling MRI modalities, including preprocessing and group level analyses. Additional pipelines are currently being added, and the pipelines can accept the BIDS format. Analyses are conducted completely inside of Singularity containers and utilize popular software packages including Nipype, Statistical Parametric Mapping, the Group ICA of fMRI Toolbox, and FreeSurfer. BrainForge also features several interfaces for group analysis, including a fully automated adaptive ICA approach.
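The abstract above notes that analyses run entirely inside Singularity containers. As a minimal illustration of that pattern, the sketch below composes a `singularity exec` command line in Python; the image and script names (`fmriprep.sif`, `preprocess.py`) and the bind path are hypothetical stand-ins, not BrainForge's actual invocation.

```python
import shlex

def singularity_cmd(image, script, bind_dirs=()):
    """Compose a `singularity exec` invocation that runs an analysis
    script inside a container, binding host data directories."""
    cmd = ["singularity", "exec"]
    for d in bind_dirs:
        cmd += ["--bind", d]          # mount host data into the container
    cmd += [image, "python", script]
    return cmd

# Hypothetical image and script names for illustration only.
cmd = singularity_cmd("fmriprep.sif", "preprocess.py", bind_dirs=["/data"])
print(shlex.join(cmd))
# singularity exec --bind /data fmriprep.sif python preprocess.py
```

Building the command as a list (rather than a shell string) avoids quoting bugs when paths contain spaces.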

  • Article type: Journal Article
    High-performance computing (HPC) platforms for large-scale drug discovery simulation demand significant investment in specialty hardware, maintenance, resource management, and running costs. The rapid growth in computing hardware has made it possible to provide cost-effective, robust, secure, and scalable alternatives to on-premise (on-prem) HPC via Cloud, Fog, and Edge computing. It has enabled recent state-of-the-art machine learning (ML) and artificial intelligence (AI)-based tools for drug discovery, such as BERT, BARD, AlphaFold2, and GPT. This chapter attempts to give an overview of the types of software architecture for developing scientific software or applications with deployment-agnostic (on-prem to cloud and hybrid) use cases. Furthermore, the chapter aims to outline how innovation is disrupting the orthodox mindset of monolithic software running on on-prem HPC and to present the paradigm shift toward microservices-driven application programming interface (API)- and message passing interface (MPI)-based scientific computing across distributed, highly available infrastructure. This is coupled with agile DevOps, good coding practices, and low-code and no-code application development frameworks for cost-efficient, secure, automated, and robust scientific application life cycle management.
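The shift from a monolith to message-driven microservices that the chapter describes can be sketched with a minimal in-process message bus; real deployments would use MPI ranks or a network broker, and the `dock` topic and its handler here are hypothetical.

```python
import queue

class MessageBus:
    """Minimal in-process stand-in for message-driven decoupling between
    services; a production system would use MPI or a network broker."""
    def __init__(self):
        self.handlers = {}
        self.inbox = queue.Queue()

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        self.inbox.put((topic, payload))

    def drain(self):
        """Deliver every queued message to its subscribers."""
        results = []
        while not self.inbox.empty():
            topic, payload = self.inbox.get()
            for handler in self.handlers.get(topic, []):
                results.append(handler(payload))
        return results

bus = MessageBus()
# Hypothetical docking service subscribed to a "dock" topic.
bus.subscribe("dock", lambda ligand: f"scored:{ligand}")
bus.publish("dock", "ligand-42")
print(bus.drain())  # ['scored:ligand-42']
```

The point of the pattern is that publishers and subscribers never reference each other directly, so services can be deployed and scaled independently.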

  • Article type: Journal Article
    There is a rapid increase in the number of edge devices in IoT solutions, generating vast amounts of data that need to be processed and analyzed efficiently. Traditional cloud-based architectures can face latency, bandwidth, and privacy challenges when dealing with this data flood. There is currently no unified approach to the creation of edge computing solutions. This work addresses this problem by exploring containerization for data processing solutions at the network's edge. The current approach involves creating a specialized application compatible with the device used. Another approach involves using containerization for deployment and monitoring. The heterogeneity of edge environments would greatly benefit from a universal modular platform. Our proposed edge computing-based framework implements a streaming extract, transform, and load pipeline for data processing and analysis using ZeroMQ as the communication backbone and containerization for scalable deployment. Results demonstrate the effectiveness of the proposed framework, making it suitable for time-sensitive IoT applications.
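The streaming extract-transform-load pipeline described above can be sketched as a chain of Python generator stages; in the actual framework each stage would run in its own container with ZeroMQ sockets as the communication backbone, and the sensor payload shown here is a hypothetical example.

```python
import json

def extract(raw_messages):
    """Extract: parse raw sensor payloads (JSON strings in this sketch)."""
    for msg in raw_messages:
        yield json.loads(msg)

def transform(records):
    """Transform: normalise Fahrenheit readings to Celsius."""
    for rec in records:
        rec["temp_c"] = round((rec["temp_f"] - 32) * 5 / 9, 1)
        yield rec

def load(records):
    """Load: collect results (a real edge node would publish downstream)."""
    return list(records)

raw = ['{"sensor": "s1", "temp_f": 98.6}']
print(load(transform(extract(raw))))
# [{'sensor': 's1', 'temp_f': 98.6, 'temp_c': 37.0}]
```

Because each stage is a generator, records stream through one at a time instead of being buffered whole, which matters on memory-constrained edge devices.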

  • Article type: Journal Article
    With the advent of high-throughput biotechnological platforms and their ever-growing capacity, life science has turned into a digitized, computational and data-intensive discipline. As a consequence, standard analysis with a bioinformatics pipeline in the context of routine production has become a challenge such that the data can be processed in real-time and delivered to the end-users as fast as possible. The usage of workflow management systems along with packaging systems and containerization technologies offers an opportunity to tackle this challenge. While very powerful, they can be used and combined in many ways, which may differ from one developer to another. Therefore, promoting the homogeneity of the workflow implementation requires guidelines and protocols that detail how the source code of the bioinformatics pipeline should be written and organized to ensure its usability, maintainability, interoperability, sustainability, portability, reproducibility, scalability and efficiency. Capitalizing on Nextflow, Conda, Docker, Singularity and the nf-core initiative, we propose a set of best practices along the development life cycle of the bioinformatics pipeline and deployment for production operations which target different expert communities including i) bioinformaticians and statisticians, ii) software engineers, and iii) data managers and core facility engineers. We implemented Geniac (Automatic Configuration GENerator and Installer for nextflow pipelines), which consists of a toolbox with three components: i) technical documentation available at https://geniac.readthedocs.io detailing coding guidelines for the bioinformatics pipeline with Nextflow, ii) a command line interface with a linter to check that the code respects the guidelines, and iii) an add-on to generate configuration files, build the containers and deploy the pipeline. The Geniac toolbox aims at harmonizing development practices across developers and automating the generation of configuration files and containers by parsing the source code of the Nextflow pipeline.
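The linter component described above can be illustrated with a toy rule; the check below (every Nextflow `process` block must declare a `label` directive) is an assumed example for illustration, not Geniac's actual rule set.

```python
import re

def lint_nextflow(source):
    """Toy linter in the spirit of Geniac's: flag process blocks that do
    not declare a `label` directive. The rule is illustrative only."""
    findings = []
    # Simplified pattern: assumes process bodies contain no nested braces.
    for match in re.finditer(r"process\s+(\w+)\s*\{([^}]*)\}", source):
        name, body = match.groups()
        if "label" not in body:
            findings.append(f"process '{name}' is missing a label directive")
    return findings

pipeline = """
process fastqc { input: path reads; script: "fastqc $reads" }
process align { label 'mappers'; script: "bwa mem ..." }
"""
print(lint_nextflow(pipeline))
# ["process 'fastqc' is missing a label directive"]
```

A real linter would parse the Nextflow grammar properly; the value of the approach is that guideline violations are caught mechanically before deployment rather than in code review.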

  • Article type: Journal Article
    Neuroimaging technology has experienced explosive growth and transformed the study of neural mechanisms across health and disease. However, given the diversity of sophisticated tools for handling neuroimaging data, the field faces challenges in method integration, particularly across multiple modalities and species. Specifically, researchers often have to rely on siloed approaches which limit reproducibility, with idiosyncratic data organization and limited software interoperability.
    To address these challenges, we have developed the Quantitative Neuroimaging Environment & Toolbox (QuNex), a platform for consistent end-to-end processing and analytics. QuNex provides several novel functionalities for neuroimaging analyses, including a "turnkey" command for the reproducible deployment of custom workflows, from onboarding raw data to generating analytic features.
    The platform enables interoperable integration of multi-modal, community-developed neuroimaging software through an extension framework with a software development kit (SDK) for seamless integration of community tools. Critically, it supports high-throughput, parallel processing in high-performance compute environments, either locally or in the cloud. Notably, QuNex has successfully processed over 10,000 scans across neuroimaging consortia, including multiple clinical datasets. Moreover, QuNex enables integration of human and non-human workflows via a cohesive translational platform.
    Collectively, this effort stands to significantly impact neuroimaging method integration across acquisition approaches, pipelines, datasets, computational environments, and species. Building on this platform will enable more rapid, scalable, and reproducible impact of neuroimaging technology across health and disease.
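The "turnkey" idea, a single command that carries a custom workflow from raw-data onboarding to analytic features, can be sketched as an ordered step runner that halts on the first failure; the step names below are illustrative, not QuNex's actual command set.

```python
def run_turnkey(steps, context):
    """Sketch of a 'turnkey' runner: execute processing steps in order over
    a shared context and stop at the first failure, since downstream steps
    depend on upstream outputs. Step names are illustrative only."""
    completed = []
    for name, step in steps:
        ok = step(context)
        completed.append((name, ok))
        if not ok:
            break  # later steps would operate on missing outputs
    return completed

def import_raw(ctx):
    ctx["raw"] = True             # stand-in for onboarding raw data
    return True

def preprocess(ctx):
    return ctx.get("raw", False)  # fails if raw data was never onboarded

print(run_turnkey([("import_raw", import_raw), ("preprocess", preprocess)], {}))
# [('import_raw', True), ('preprocess', True)]
```

Recording which steps ran, and where the chain stopped, is what makes such a run reproducible and resumable.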

  • Article type: Journal Article
    Cloud computing is a commercial and economic paradigm that has gained traction since 2006 and is presently the most significant technology in the IT sector. From the notion of cloud computing to its energy efficiency, the cloud has been the subject of much discussion. The energy consumption of data centres alone will rise from 200 TWh in 2016 to 2967 TWh in 2030. Data centres require a lot of power to provide services, which increases CO2 emissions. In this survey paper, software-based technologies that can be used for building green data centres, including power management at the individual software level, are discussed. The paper discusses energy efficiency in containers and the problem-solving approaches used for reducing power consumption in data centres. Further, the paper also gives details about the impact of data centres on the environment, including e-waste and the various standards adopted by different countries for rating data centres. This article goes beyond just demonstrating new green cloud computing possibilities. Instead, it focuses the attention and resources of academia and society on a critical issue: long-term technological advancement. The article covers the new technologies that can be applied at the individual software level, including techniques applied at the virtualization level, the operating system level and the application level. It clearly defines different measures at each level to reduce energy consumption, which adds value to the current environmental problem of pollution reduction. This article also addresses the difficulties, concerns, and needs that cloud data centres and cloud organisations must grasp, as well as some of the factors and case studies that influence green cloud usage.

  • Article type: Journal Article
    Lately, the software development industry has been going through a slow but real transformation. Software is increasingly a part of everything, and software developers are trying to cope with this exploding demand through more automation. The pipelining technique of continuous integration (CI) and continuous delivery (CD) has developed considerably due to the overwhelming demand for the deployment and deliverability of new features and applications. As a result, DevOps approaches and Agile principles have been developed, in which developers collaborate closely with infrastructure engineers to guarantee that their applications are deployed quickly and reliably. Thanks to pipeline-oriented thinking, the efficiency of projects has greatly improved. Agile practices introduce new features to the system in each sprint delivery. Those deliveries may contain well-developed features, or they may contain bugs or failures that impact the delivery. The pipeline approach depicted in this paper overcomes the problems of delivery, improving the delivery timeline, the test load steps, and the benchmarking tasks. It decreases system interruption by integrating multiple test steps and adds stability and deliverability to the entire process. It provides standardization, which means having an established, time-tested process to use, and it can also decrease ambiguity and guesswork, guarantee quality, and boost productivity. The tool is developed in an interpreted language, namely Bash, which offers an easy way to integrate it into any platform. Based on the experimental results, we demonstrate the value that this solution currently creates. This solution provides an effective and efficient way to generate, manage, customize, and automate Agile-based CI and CD projects through automated pipelines. The suggested system acts as a starting point for standard CI/CD processes, caches Docker layers for subsequent usage, and implements highly available deliverables in a Kubernetes cluster using Helm. Changing the principles of this solution and expanding it to multiple platforms (Windows) will be addressed in a future discussion.
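The Docker layer caching mentioned above can be sketched as a stage runner that keys results on a hash of each stage's inputs, so an unchanged stage is not re-executed. The paper's tool does this in Bash via Docker's own layer cache; the Python version and the stage names below are an illustrative stand-in.

```python
import hashlib

class CachedPipeline:
    """Sketch of layer-style caching: a stage re-runs only when the hash
    of its inputs changes, mirroring how Docker reuses cached image
    layers between pipeline runs."""
    def __init__(self):
        self.cache = {}

    def run_stage(self, name, inputs, action):
        key = (name, hashlib.sha256(repr(inputs).encode()).hexdigest())
        if key not in self.cache:
            self.cache[key] = action(inputs)  # cache miss: run the stage
        return self.cache[key]

pipe = CachedPipeline()
builds = []

def build_image(src):
    builds.append(src)           # record that a real build happened
    return f"image:{src}"

pipe.run_stage("docker-build", "app-v1", build_image)
pipe.run_stage("docker-build", "app-v1", build_image)  # cache hit: no rebuild
print(len(builds))  # 1
```

Keying the cache on input content rather than timestamps is what makes the optimization safe: any change to the inputs changes the hash and forces a rebuild.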

  • Article type: Journal Article
    Information systems used by platform trials should handle changes that are not predefined. Unfortunately, the technical architecture of most existing clinical data management systems (CDMS) does not support incorporating changes into an ongoing trial. Adaptive clinical trials need an advanced architectural setup to enable the biomarker stratification and enrichment strategies necessary for adaptive clinical trial operation. This short paper presents the microservices-based architecture solution that is used to run and support the adaptive RECORDS-Trial.

  • Article type: Journal Article
    Kubernetes (K8s) is expected to be a key container orchestration tool for edge computing infrastructures owing to its various features for supporting container deployment and dynamic resource management. For example, its horizontal pod autoscaling feature provides service availability and scalability by increasing the number of replicas. kube-proxy provides traffic load-balancing between replicas by distributing client requests equally to all pods (replicas) of an application in a K8s cluster. However, this approach can result in long delays when requests are forwarded to remote workers, especially in edge computing environments where worker nodes are geographically dispersed. Moreover, if the receiving worker is overloaded, the request-processing delay can increase significantly. To overcome these limitations, this paper proposes an enhanced load balancer called resource adaptive proxy (RAP). RAP periodically monitors the resource status of each pod and the network status among worker nodes to aid in load-balancing decisions. Furthermore, it preferentially handles requests locally to the maximum extent possible. If the local worker node is overloaded, RAP forwards its requests to the best node in the cluster while considering resource availability. Our experimental results demonstrated that RAP could significantly improve throughput and reduce request latency compared with the default load-balancing mechanism of K8s.
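RAP's decision rule, as the abstract describes it, can be sketched as follows; the load threshold and the latency weighting are illustrative assumptions, not values taken from the paper.

```python
def pick_node(local, remotes, load_threshold=0.8):
    """Sketch of a resource-adaptive proxy decision: serve the request
    locally unless the local node is overloaded; otherwise pick the
    remote worker with the best mix of spare capacity and proximity.
    The threshold and scoring weights are illustrative assumptions."""
    if local["load"] < load_threshold:
        return local["name"]          # prefer local handling (low latency)
    def score(node):                  # lower is better
        return node["load"] + node["latency_ms"] / 100
    return min(remotes, key=score)["name"]

local = {"name": "edge-local", "load": 0.95}
remotes = [
    {"name": "edge-a", "load": 0.40, "latency_ms": 30},
    {"name": "edge-b", "load": 0.30, "latency_ms": 90},
]
print(pick_node(local, remotes))  # edge-a
```

In the example, edge-b has more spare capacity, but edge-a wins once network distance is factored in, which is exactly the trade-off that kube-proxy's uniform distribution ignores in geographically dispersed clusters.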

  • Article type: Journal Article
    The use of monitoring systems based on cloud computing has become common for smart buildings. However, the dilemma of centralization versus decentralization, in terms of gathering information and making the right decisions based on it, remains. Performance, dependent on the system design, does matter for emergency detection, where response time and loading behavior become very important. We studied several design options based on edge computing and containers for a smart building monitoring system that sends alerts to the responsible personnel when necessary. The study evaluated performance, including a qualitative analysis and load testing, for our experimental settings. From 700+ edge nodes, we obtained response times that were 30% lower for the public cloud versus the local solution. For up to 100 edge nodes, the values were better for the latter, and in between, they were rather similar. Based on an interpretation of the results, we developed recommendations for five real-world configurations, and we present the design choices adopted in our development for a complex of smart buildings.