Programming Languages

  • Article type: Systematic Review
    Code clones, referring to code fragments that are either similar or identical and are copied and pasted within software systems, have negative effects on both software quality and maintenance. The objective of this work is to systematically review and analyze recurrent neural network techniques used to detect code clones to shed light on the current techniques and offer valuable knowledge to the research community. Upon applying the review protocol, we have successfully identified 20 primary studies within this field from a total of 2099 studies. A deep investigation of these studies reveals that nine recurrent neural network techniques have been utilized for code clone detection, with a notable preference for LSTM techniques. These techniques have demonstrated their efficacy in detecting both syntactic and semantic clones, often utilizing abstract syntax trees for source code representation. Moreover, we observed that most studies applied evaluation metrics like F-score, precision, and recall. Additionally, these studies frequently utilized datasets extracted from open-source systems coded in Java and C programming languages. Notably, the Graph-LSTM technique exhibited superior performance. PyTorch and TensorFlow emerged as popular tools for implementing RNN models. To advance code clone detection research, further exploration of techniques like parallel LSTM, sentence-level LSTM, and Tree-Structured GRU is imperative. In addition, more research is needed to investigate the capabilities of the recurrent neural network techniques for identifying semantic clones across different programming languages and binary codes. The development of standardized benchmarks for languages like Python, Scratch, and C#, along with cross-language comparisons, is essential. Therefore, the utilization of recurrent neural network techniques for clone identification is a promising area that demands further research.
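    The evaluation metrics named above (precision, recall, and F-score) can be sketched for clone-pair detection as follows. This is a minimal illustration rather than the procedure of any reviewed study: the function name `precision_recall_f1`, the fragment identifiers, and the choice to treat a clone pair as unordered are all assumptions made for the example.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 over sets of clone pairs.

    Each pair is treated as unordered, so ("a", "b") matches ("b", "a").
    """
    predicted = {frozenset(pair) for pair in predicted}
    actual = {frozenset(pair) for pair in actual}

    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0

    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical fragment identifiers, invented for illustration only.
reported = [("f1", "f2"), ("f3", "f4"), ("f5", "f6"), ("f7", "f8")]
reference = [("f2", "f1"), ("f3", "f4"), ("f9", "f10")]

precision, recall, f1 = precision_recall_f1(reported, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

    Here two of the four reported pairs appear in the reference set of three, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.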






  • Article type: Journal Article
    Teaching introductory programming courses is not an easy task. Instructors of introductory programming courses face many challenges related to the nature of programming, the students' characteristics, and the traditional teaching methods that they use. Blended learning seems to be a promising approach to address these challenges. Many studies have concluded that blended learning can be more effective than traditional teaching and can improve students' learning experience. However, the current state of knowledge and practice in applying blended learning to introductory programming courses is limited. In an attempt to begin remedying this gap, this review synthesizes the different blended learning approaches that have been applied in introductory programming courses. It classifies them into five models, then discusses the impact of each of these models on the learning experience of novice programmers. It concludes by providing some recommendations for instructors who want to blend their courses as well as some implications for future research.







  • Article type: Journal Article
    With progress on both the theoretical and the computational fronts, the use of spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user-friendly, well-documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R.
    In this work, we focus on the R Language for Statistical Computing, which has become hugely popular statistical software. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data, we provide an introduction to spline modelling and an overview of the most popular spline functions.
    We present a series of simple scenarios of univariate data, where different basis functions are used to identify the correct functional form of an independent variable. Even in simple data, using routines from different packages can lead to different results.
    This work illustrates challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than the basis used. In fact, an experienced user will know how to obtain a reasonable outcome, regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.







  • Article type: Journal Article
    Communicating radiological reports to peers has pedagogical value. Students may be uneasy with the process due to a lack of communication and peer review skills or to their failure to see value in the process. We describe a communication exercise with peer review in an undergraduate veterinary radiology course. The computer code used to manage the course and deliver images online is reported, and we provide links to the executable files. We tested whether undergraduate peer review of radiological reports has validity and describe student impressions of the learning process. Peer review scores for student-generated radiological reports were compared to scores obtained in the summative multiple choice (MCQ) examination for the course. Student satisfaction was measured using a bespoke questionnaire. There was a weak positive correlation (Pearson correlation coefficient = 0.32, p < 0.01) between the peer review scores students received and the scores they obtained in the MCQ examination. The difference in peer review scores received by students grouped according to their level of course performance (high vs. low) was statistically significant (p < 0.05). No correlation was found between peer review scores awarded by the students and the scores they obtained in the MCQ examination (Pearson correlation coefficient = 0.17, p = 0.14). In conclusion, we have created a realistic radiology imaging exercise with readily available software. The peer review scores are valid in that, to a limited degree, they reflect students' future performance in an examination. Students valued the process of learning to communicate radiological findings but did not fully appreciate the value of peer review.
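    For readers unfamiliar with the statistic reported above, the Pearson correlation coefficient can be computed as in the following sketch. The function name `pearson_r` and the score lists are invented for illustration; they are not the study's data and will not reproduce its reported coefficients.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of cross-products and squared deviations from the means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)

    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariation / math.sqrt(variation_x * variation_y)


# Made-up scores for five hypothetical students.
peer_review_scores = [6.0, 7.5, 5.0, 8.0, 6.5]
mcq_scores = [55.0, 70.0, 60.0, 65.0, 58.0]

r = pearson_r(peer_review_scores, mcq_scores)
print(f"r = {r:.2f}")
```

    The coefficient ranges from -1 to 1; values near zero, such as the 0.17 reported for scores awarded, indicate little linear association.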






  • Article type: Journal Article
    BACKGROUND: In order to further advance research and development on the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) standard, the existing research must be well understood. This paper presents a methodological review of the ODM literature. Specifically, it develops a classification schema to categorize the ODM literature according to how the standard has been applied within the clinical research data lifecycle. This paper suggests areas for future research and development that address ODM's limitations and capitalize on its strengths to support new trends in clinical research informatics.
    METHODS: A systematic scan of the following databases was performed: (1) ABI/Inform, (2) ACM Digital, (3) AIS eLibrary, (4) Europe Central PubMed, (5) Google Scholar, (6) IEEE Xplore, (7) PubMed, and (8) ScienceDirect. A Web of Science citation analysis was also performed. The search term used on all databases was "CDISC ODM." The two primary inclusion criteria were: (1) the research must examine the use of ODM as an information system solution component, or (2) the research must critically evaluate ODM against a stated solution usage scenario. Out of 2686 articles identified, 266 were included in a title level review, resulting in 183 articles. An abstract review followed, resulting in 121 remaining articles; and after a full text scan 69 articles met the inclusion criteria.
    RESULTS: As the demand for interoperability has increased, ODM has shown remarkable flexibility and has been extended to cover a broad range of data and metadata requirements that reach well beyond ODM's original use cases. This flexibility has yielded research literature that covers a diverse array of topic areas. A classification schema reflecting the use of ODM within the clinical research data lifecycle was created to provide a categorized and consolidated view of the ODM literature. The elements of the framework include: (1) EDC (Electronic Data Capture) and EHR (Electronic Health Record) infrastructure; (2) planning; (3) data collection; (4) data tabulations and analysis; and (5) study archival. The analysis reviews the strengths and limitations of ODM as a solution component within each section of the classification schema. This paper also identifies opportunities for future ODM research and development, including improved mechanisms for semantic alignment with external terminologies, better representation of the CDISC standards used end-to-end across the clinical research data lifecycle, improved support for real-time data exchange, the use of EHRs for research, and the inclusion of a complete study design.
    CONCLUSIONS: ODM is being used in ways not originally anticipated, and covers a diverse array of use cases across the clinical research data lifecycle. ODM has been used as much as a study metadata standard as it has for data exchange. A significant portion of the literature addresses integrating EHR and clinical research data. The simplicity and readability of ODM have likely contributed to its success and broad implementation as a data and metadata standard. Keeping the core ODM model focused on the most fundamental use cases, while using extensions to handle edge cases, has kept the standard easy for developers to learn and use.







  • Article type: Journal Article
    BACKGROUND: The interoperability of the Electrocardiogram (ECG) between heterogeneous systems has been facilitated by not one, but a number of predefined open storage formats. To improve the techniques currently used, it is important to define the similarities and the differences between these ECG storage formats.
    METHODS: This paper presents a review of 9 formats used to store the ECG. Three of the predominant formats, namely, SCP-ECG, DICOM-ECG, and HL7 aECG are reviewed in detail along with the undertaking of a SWOT analysis. The remaining formats have been examined to a lesser extent as they are not as predominant in the literature.
    RESULTS: This study suggests that a plethora of open ECG formats, all aiming to promote interoperability, has the opposite effect of adding more complexity. This paper discusses whether a format supporting a variety of diagnostic modalities is more advantageous than a format that only supports the ECG. It is concluded that a general purpose format such as DICOM solves more interoperability issues; however, no general purpose format currently exists that fulfils the requirements of all users. As a result, the healthcare industry has been bombarded with custom storage formats, i.e., a format for storing the resting ECG, a format for storing the ambulatory ECG, a format for storing the ECG in clinical trials, a format for storing ECG data on mobile devices, etc. This study then examines which implementation method is more suited to encode ECG data, i.e., binary or XML. Binary encoding has been used in the past to store the ECG; however, unlike binary, XML files are human readable and searchable, and provide a better form of semantics. Based on analysis within this work, it is speculated that XML may overtake binary as the preferred implementation method for encoding ECG data, since it has already made a huge impact in the healthcare industry.
    CONCLUSIONS: It can be concluded that there is a wide range of vastly different techniques used to store the ECG. Although the specifications of these formats are openly available, none has been internationally adopted for use with all ECG machines. Therefore, there remains a lack of global interoperability of ECG information.
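    The binary-versus-XML trade-off discussed in this abstract can be illustrated with a small sketch that encodes the same samples both ways. The sample values, the 16-bit little-endian layout, and the XML element and attribute names (`ecg`, `lead`, `sampleRateHz`) are assumptions invented for the example; they are not taken from SCP-ECG, DICOM-ECG, HL7 aECG, or any other real format.

```python
import struct
import xml.etree.ElementTree as ET

# Made-up amplitude samples in arbitrary units.
samples = [12, -7, 105, 230, -15]

# Binary encoding: little-endian signed 16-bit integers (an assumed layout).
binary_blob = struct.pack(f"<{len(samples)}h", *samples)

# XML encoding: hypothetical element and attribute names for illustration.
root = ET.Element("ecg")
lead = ET.SubElement(root, "lead", name="II", sampleRateHz="500")
lead.text = " ".join(str(value) for value in samples)
xml_blob = ET.tostring(root, encoding="utf-8")

# Both encodings round-trip to the same sample values.
decoded_binary = list(struct.unpack(f"<{len(samples)}h", binary_blob))
decoded_xml = [int(v) for v in ET.fromstring(xml_blob).find("lead").text.split()]

print(len(binary_blob), "bytes of binary:", binary_blob.hex())
print(len(xml_blob), "bytes of XML:", xml_blob.decode("utf-8"))
```

    The binary form is compact but opaque without its layout specification, whereas the XML form is larger but carries readable, searchable labels alongside the data, which is the contrast the abstract draws.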






  • Article type: Journal Article
    One important aim within systems biology is to integrate disparate pieces of information, leading to the discovery of higher-level knowledge about important functionality within living organisms. This makes standards for the representation of data, and technology for the exchange and integration of data, key points for development within the area. In this article, we focus on the recent developments within the field. We compare the recent updates to the three standard representations for exchange of data: SBML, PSI MI, and BioPAX. In addition, we give an overview of available tools for these three standards and a discussion of how these developments support possibilities for data exchange and integration.






  • Article type: Journal Article
    InterMed is a collaboration among research groups from Stanford, Harvard, and Columbia Universities. The primary goal of InterMed has been to develop a sharable language that could serve as a standard for modeling computer-interpretable guidelines (CIGs). This language, called GuideLine Interchange Format (GLIF), has been developed in a collaborative manner and in an open process that has welcomed input from the larger community. The goals and experiences of the InterMed project and lessons that the authors have learned may contribute to the work of other researchers who are developing medical knowledge-based tools. The lessons described include (1) a work process for multi-institutional research and development that considers different viewpoints, (2) an evolutionary lifecycle process for developing medical knowledge representation formats, (3) the role of cognitive methodology to evaluate and assist in the evolutionary development process, (4) development of an architecture and (5) design principles for sharable medical knowledge representation formats, and (6) a process for standardization of a CIG modeling language.






  • Article type: Comparative Study
    Representation of clinical practice guidelines in a computer-interpretable format is a critical issue for guideline development, implementation, and evaluation. We studied 11 types of guideline representation models that can be used to encode guidelines in computer-interpretable formats. We have consistently found in all reviewed models that primitives for representation of actions and decisions are necessary components of a guideline representation model. Patient states and execution states are important concepts that closely relate to each other. Scheduling constraints on representation primitives can be modeled as sequences, concurrences, alternatives, and loops in a guideline's application process. Nesting of guidelines provides multiple views of a guideline at different granularities. Integration of guidelines with electronic medical records can be facilitated by the introduction of a formal model for patient data. Data collection, decision, patient state, and intervention constitute four basic types of primitives in a guideline's logic flow. Decisions clarify our understanding of a patient's clinical state, while interventions lead to the change from one patient state to another.
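    The four primitive types named above can be sketched as a small data model. This is a toy illustration under stated assumptions, not any of the 11 reviewed models: the class names, the `run_sequence` helper, the representation of a patient state as a plain string, and the example threshold are all hypothetical, and only the "sequence" scheduling constraint is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

PatientData = Dict[str, float]


@dataclass
class DataCollection:
    item: str  # name of the patient data item to obtain


@dataclass
class Decision:
    condition: Callable[[PatientData], bool]
    state_if_true: str   # patient state concluded when the condition holds
    state_if_false: str  # patient state concluded otherwise


@dataclass
class Intervention:
    action: str
    resulting_state: str  # patient state after the intervention


Step = Union[DataCollection, Decision, Intervention]


def run_sequence(steps: List[Step], record: PatientData,
                 state: str = "unknown") -> Tuple[str, List[str]]:
    """Apply steps in order, returning the final patient state and a log."""
    collected: PatientData = {}
    log: List[str] = []
    for step in steps:
        if isinstance(step, DataCollection):
            collected[step.item] = record[step.item]
            log.append(f"collect {step.item}")
        elif isinstance(step, Decision):
            # A decision clarifies the patient state from collected data.
            state = (step.state_if_true if step.condition(collected)
                     else step.state_if_false)
            log.append(f"decide -> {state}")
        else:
            # An intervention moves the patient to a new state.
            state = step.resulting_state
            log.append(f"do {step.action} -> {state}")
    return state, log


# Toy guideline with an invented data item and threshold.
guideline: List[Step] = [
    DataCollection("systolic_bp"),
    Decision(lambda data: data["systolic_bp"] > 140, "elevated", "normal"),
    Intervention("schedule_follow_up", "follow_up_planned"),
]
final_state, log = run_sequence(guideline, {"systolic_bp": 150.0})
print(final_state, log)
```

    Concurrences, alternatives, loops, and nested guidelines would need a richer control structure than the flat list used here.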






  • Article type: Journal Article
    This paper describes the use for research purposes of an existing data management system, ADAMO (A Database Management system for Oncology). The aim of this paper is to discuss the experiences obtained with this 'home-made' system and to describe some of the extensions that were recently made. Reasons are presented why the system is still extensively used by clinicians although a number of commercial database management systems are now available on personal computers. These database systems are more flexible than the system described here. It is concluded that it is precisely this flexibility of current systems that prevents optimal use by busy clinicians. Clinicians need a research system that contains just the functions that they need. These functions have to be available via simple commands, so that no additional programming, even at the high level of a query language, is necessary.





