Programming Languages

  • Article type: Journal Article
    Haxe is a general-purpose, object-oriented programming language that supports syntactic macros. The Haxe compiler is well known for its ability to translate Haxe source code into the source code of a variety of other programming languages, including Java, C++, JavaScript, and Python. Although Haxe is increasingly used for a variety of purposes, including games, it has not yet attracted much attention from bioinformaticians. This is surprising, as Haxe allows generating different versions of the same program (e.g., a graphical user interface version in JavaScript running in a web browser for beginners and a command-line version in C++ or Python for increased performance) while maintaining a single codebase, a feature that should be of interest for many bioinformatic applications. To demonstrate the usefulness of Haxe in bioinformatics, we present here the case study of the program SeqPHASE, originally written in Perl (with a CGI version running on a server) and published in 2010. As Perl+CGI is no longer desirable for security reasons, we decided to rewrite SeqPHASE in Haxe and to host it on GitHub Pages (https://eeg-ebe.github.io/SeqPHASE), thereby eliminating the need to configure and maintain a dedicated server. Using SeqPHASE as an example, we discuss the advantages and disadvantages of Haxe's source code conversion functionality for implementing bioinformatic software.

  • Article type: Journal Article
    One would expect in silico experiments in systems biology to be less susceptible to reproducibility issues than their wet-lab counterparts, because they are free from natural biological variation and their environment can be fully controlled. However, recent studies show that only half of the published mathematical models of biological systems can be reproduced without substantial effort. In this article we examine the potential causes of failed or cumbersome reproductions in a case study of a one-dimensional mathematical model of the atrioventricular node, which took us four months to reproduce. The model demonstrates that even otherwise rigorous studies can be hard to reproduce due to missing information, errors in equations and parameters, a lack of available data files, non-executable code, missing or incomplete experiment protocols, and missing rationales behind equations. Many of these issues seem similar to problems that have been solved in software engineering using techniques such as unit testing, regression tests, continuous integration, version control, archival services, and a thorough modular design with extensive documentation. Applying these techniques, we reimplement the examined model using the modeling language Modelica. The resulting workflow is independent of the model and can be translated to SBML, CellML, and other languages. It guarantees methods reproducibility by executing automated tests in a virtual machine on a server that is physically separated from the development environment. Additionally, it facilitates results reproducibility, because the model is more understandable and because the complete model code, experiment protocols, and simulation data are published and can be accessed in the exact version that was used in this article. We found the additional design and documentation effort well justified, even considering only the immediate benefits during development, such as easier and faster debugging, increased understandability of equations, and a reduced need to look up details in the literature.

  • Article type: Journal Article
    Programming is one of the most crucial abilities for students in science and technology courses. Few studies on programming ability have considered the effect of students' construal levels on their learning performance. Therefore, the effects of students' construal levels were explored in this study to fill this research gap and open a new avenue for improving programming ability. The research participants were 110 seventh- and eighth-grade students with basic programming abilities taking an Arduino course. Data were collected from online questionnaires and analyzed using two-way analysis of variance and structural equation modeling to investigate the relationships among construal levels, programming ability, and learning satisfaction. The results revealed that students' construal levels affect their learning satisfaction and programming ability. These findings indicate that teaching strategies could effectively improve the learning satisfaction and programming ability of junior high school students.

  • Article type: Journal Article
    Reproducibility has been shown to be limited in many scientific fields. Reproducibility is a fundamental tenet of scientific activity, but the related issues of reusability of scientific data are poorly documented. Here, we present a case study of our difficulties in reproducing a published bioinformatics method even though code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the whole method in a Python package to avoid dependency on a MATLAB license and ease the execution of the code on a high-performance computing cluster. Third, we assessed the reusability of our reimplementation and the quality of our documentation, testing how easy it would be to start from our implementation to reproduce the results. In a second section, we propose solutions from this case study and other observations to improve reproducibility and research efficiency at the individual and collective levels. While finalizing our code, we created case-specific documentation and tutorials for the associated Python package StratiPy. Readers are invited to experiment with our reproducibility case study by generating the two confusion matrices (see more in section "Robustness: from MATLAB to Python, language and organization"). Here, we propose two options: a step-by-step process to follow in a Jupyter/IPython notebook or a Docker container ready to be built and run.

  • Article type: Journal Article
    Identifying rare but significant healthcare events in massive unstructured datasets has become a common task in healthcare data analytics. However, imbalanced class distribution in many practical datasets greatly hampers the detection of rare events, as most classification methods implicitly assume an equal occurrence of classes and are designed to maximize the overall classification accuracy. In this study, we develop a framework for learning from healthcare data with imbalanced distribution by incorporating different rebalancing strategies. The evaluation results show that the developed framework can significantly improve the detection accuracy of medical incidents due to look-alike sound-alike (LASA) mix-ups. Specifically, logistic regression combined with the synthetic minority oversampling technique (SMOTE) produces the best detection results, with a significant 45.3% relative increase in recall (recall = 75.7%) compared with pure logistic regression (recall = 52.1%).
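
    The 45.3% figure is consistent with a relative increase computed from the two recall values reported in the abstract, rather than a difference in percentage points. A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative and not taken from the study.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the relative change of `improved` with respect to `baseline`."""
    return (improved - baseline) / baseline


# Recall values as reported in the abstract.
baseline_recall = 0.521  # pure logistic regression
smote_recall = 0.757     # logistic regression combined with SMOTE

gain = relative_increase(baseline_recall, smote_recall)
points = (smote_recall - baseline_recall) * 100  # absolute difference

print(f"relative increase: {gain:.1%}")            # 45.3%
print(f"absolute increase: {points:.1f} points")   # 23.6 points
```

    The same improvement corresponds to 23.6 percentage points in absolute terms, so it matters which of the two measures is being quoted.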

  • Article type: Journal Article
    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). A central task is selecting candidate patients who meet the CT recruitment criteria. Related work either depends on a database administrator (DBA) to convert the recruitment criteria to native SQL queries or involves data mapping between a standard ontology/information model and the schema of each individual data source. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different domain-specific languages (DSLs). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all taken from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is mapped directly to native database queries (e.g., SQL), a step automated by object-relational mapping (ORM).
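
    The translation chain described above (rule, intermediate query representation, native SQL) can be illustrated with a small Python sketch. This is not the paper's implementation: it swaps LINQ and an ORM for a plain dataclass as the intermediate form, and the table, columns, and sample rows (`patients`, `age`, `diagnosis`) are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Tuple

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Condition:
    """One clause of a recruitment rule, expressed over schema columns."""
    column: str  # hypothetical column name from the database schema
    op: str
    value: Any


def to_sql(table: str, conditions: List[Condition]) -> Tuple[str, list]:
    """Translate the intermediate conditions into a parameterized SQL query."""
    clauses, params = [], []
    for cond in conditions:
        if cond.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {cond.op}")
        if not cond.column.isidentifier():
            raise ValueError(f"invalid column name: {cond.column}")
        clauses.append(f"{cond.column} {cond.op} ?")
        params.append(cond.value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT patient_id FROM {table} WHERE {where}", params


# Hypothetical rule: adults with a type 2 diabetes diagnosis code.
rule = [Condition("age", ">=", 18), Condition("diagnosis", "=", "T2DM")]
sql, params = to_sql("patients", rule)

# Made-up rows in an in-memory database, used only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER, age INTEGER, diagnosis TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, 45, "T2DM"), (2, 16, "T2DM"), (3, 60, "HTN")],
)
eligible = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(eligible)
```

    Values are passed as query parameters rather than interpolated into the SQL string, and column names and operators are checked against simple allow-lists, which is one way to keep rule-derived queries safe.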

  • Article type: Journal Article
    In recent years, RNA-seq has become an important method in the process of measuring gene expression in various cells and organisms. This chapter will detail all the bioinformatic steps that should be undertaken to determine differentially expressed genes from a typical RNA-seq experiment. Each step will be clearly explained in "non-bioinformatic" terminology so that readers embarking on RNA-seq analysis will be able to understand the rationale and reasoning behind each step. Moreover, the exact command lines used to process the data will be presented along with a description of the various flags and commands.

  • Article type: Journal Article
    Geographic profiling is a technique used to find the origin of a series of crimes. The method was recently extended to other fields. One of the best-known data sets in epidemiology is the one collected by John Snow during a cholera outbreak in London. We wrote Python scripts that apply geographic profiling to Snow's historical data set in order to identify the origin of the infection. We modified the method by applying a weight to each point of the map where cholera cases were reported; the weight was proportional to the number of cases at that location. This modification made it possible to identify on the map an area of maximum probability for the infection source, which was a few meters wide and included the historically known source of cholera, the "classical" water pump at Broad Street. The method appears to be a useful complement for identifying the source of an epidemic when the available data on infection cases can be summarized on a map.
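
    The weighting idea can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the authors' scripts: it uses a simple inverse-distance decay instead of the full geographic profiling formula, and the coordinates and case counts are made up.

```python
import math
from typing import Dict, List, Tuple

Case = Tuple[float, float, int]  # (x, y, number of cases at that location)


def profile(cases: List[Case], grid_size: int, decay: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Score every grid cell as a possible source of the infection.

    Each reported location contributes a term that shrinks with distance
    (assumed kernel: 1 / (1 + d) ** decay) and is weighted by its case count.
    """
    scores = {}
    for gx in range(grid_size):
        for gy in range(grid_size):
            total = 0.0
            for x, y, n_cases in cases:
                distance = math.hypot(gx - x, gy - y)
                total += n_cases / (1.0 + distance) ** decay
            scores[(gx, gy)] = total
    return scores


# Made-up locations on a 10 x 10 grid; not Snow's data.
cases = [(2, 2, 5), (3, 2, 3), (2, 3, 4), (7, 7, 1)]
scores = profile(cases, grid_size=10)
best_cell = max(scores, key=scores.get)
print(best_cell)  # (2, 2)
```

    Because each location's contribution is multiplied by its case count, the highest-scoring cell is pulled toward the cluster with the most cases rather than toward the geometric center of all reported locations.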

  • Article type: Journal Article
    This study examined ways to improve the accuracy of translating clinical practice guidelines (CPGs) into a computer-interpretable guideline (CIG) for pressure-ulcer management using the Shareable Active Guideline Environment (SAGE) guideline model, and aimed to verify the accuracy of the obtained CIG. The study was conducted using the following procedures: selecting CPGs, extracting rules from the selected CPGs, developing a CIG using the SAGE guideline model, and verifying the obtained CIG with test cases using an execution engine. The CIG for pressure-ulcer management was developed based on 38 rules and three algorithms at the semiformal representation level using MS Excel and MS Visio. The CIG was encoded by two Activity Graphs consisting of 115 instances representing algorithms and rules as knowledge elements in the SAGE guideline model. Two errors were found and corrected. Results of the study demonstrated that a CIG representing knowledge on pressure-ulcer management can be effectively developed using commonly available programs and the SAGE guideline model, and that the obtained CIG can be verified with a locally developed execution engine. The CIG developed in the study could contribute to health information management once it is implemented successfully in a clinical decision support system.

  • Article type: Journal Article
    RegulonDB is a database storing the biological information behind the transcriptional regulatory network (TRN) of the bacterium Escherichia coli. It is one of the key bioinformatics resources for Systems Biology investigations of bacterial gene regulation. As in most biological databases, the content drifts with time, due both to the accumulation of new information and to refinements in the underlying biological concepts. Conclusions based on previous database versions may no longer hold. Here, we study how some topological properties of the TRN of E. coli change across 16 versions of RegulonDB, together with a simple index, digital control strength, which quantifies the match between gene expression profiles and the transcriptional regulatory network. While many network characteristics change dramatically across the different versions, the digital control strength remains rather robust and in line with previous results for this index. Our study shows that: (i) results derived from network topology should, when possible, be studied across a range of database versions before detailed biological conclusions are drawn, and (ii) resorting to simple indices when interpreting high-throughput data from a network perspective may help make the findings robust against variation in the underlying biological information. Database URL: www.regulondb.ccg.unam.mx.