    The rise of open science and the absence of a dedicated global data repository for molecular dynamics (MD) simulations have led to the accumulation of MD files in generalist data repositories. These files constitute the dark matter of MD: data that are technically accessible but neither indexed, curated, nor easily searchable. Using an original search strategy, we found and indexed about 250,000 files and 2,000 datasets from Zenodo, Figshare and Open Science Framework. Focusing on files produced by the GROMACS MD software, we illustrate the potential of mining publicly available MD data. We identified systems with specific molecular compositions, characterized essential simulation parameters such as temperature and simulation length, and identified model resolution (all-atom or coarse-grained). Based on this analysis, we inferred metadata and used it to build a prototype search engine for exploring MD data. To continue in this direction, we call on the community to keep sharing MD data and to report and standardize metadata so that this valuable resource can be reused.
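Temperature and simulation length can be read from GROMACS run-parameter (.mdp) files. The sketch below shows one way to do this. It is not the authors' pipeline. The example .mdp text and the helper names `parse_mdp` and `summarize_mdp` are hypothetical. The .mdp file records the *requested* run length (nsteps × dt), so the length actually simulated may differ if a run was extended or stopped early.

```python
# Hypothetical excerpt of a GROMACS .mdp file (not taken from the indexed data)
EXAMPLE_MDP = """
integrator = md
dt         = 0.002      ; time step in ps
nsteps     = 50000000   ; number of integration steps
tc-grps    = Protein Non-Protein
ref-t      = 300 300    ; reference temperature (K) per coupling group
"""

def parse_mdp(text: str) -> dict:
    """Parse 'key = value' lines of a GROMACS .mdp file, ignoring ';' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # GROMACS treats '-' and '_' in parameter names as equivalent
        params[key.replace("-", "_")] = value
    return params

def summarize_mdp(params: dict) -> dict:
    """Derive requested simulation length (ns) and thermostat temperatures (K)."""
    dt_ps = float(params.get("dt", "0.001"))  # GROMACS default time step: 0.001 ps
    nsteps = int(params.get("nsteps", "0"))   # GROMACS default: 0; -1 means no limit
    length_ns = nsteps * dt_ps / 1000.0 if nsteps >= 0 else None
    temps = [float(t) for t in params.get("ref_t", "").split()]
    return {"length_ns": length_ns, "temperatures_K": temps}

print(summarize_mdp(parse_mdp(EXAMPLE_MDP)))
```

Normalizing dashes to underscores lets `ref-t` and `ref_t` map to the same key. This matters when files written for different GROMACS versions are mined together.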






    Gel image analyses are often difficult to reproduce because the most commonly used software, the ImageJ Gels plugin, does not automatically record any of the analysis steps. This protocol provides detailed steps for image analysis using IOCBIO Gel software, with western blotting as an example. However, the protocol is applicable to all images obtained by electrophoresis, such as Southern blotting, northern blotting, and isoelectric focusing. IOCBIO Gel supports the analysis of multiple samples and links the original image to all operations performed on it. These records can be stored in a central database or on a local PC, which makes them easy to access and allows corrections at each stage of the analysis. In addition, IOCBIO Gel is lightweight and has only minimal computer requirements.
    Key features:
    • Free and open-source software for analyzing gel images.
    • Reproducible analysis: all operations are linked to the original image.
    • Can be used with images obtained by electrophoresis, such as western blotting, Southern blotting, isoelectric focusing, and more.
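The abstract does not describe how IOCBIO Gel stores its analysis records. The sketch below only illustrates the general idea of linking an original image to an ordered, correctable list of operations. It uses an in-memory structure. The class `GelAnalysis` and any operation names are hypothetical and are not the IOCBIO Gel data model.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class GelAnalysis:
    """Hypothetical provenance record: original image hash + ordered operations."""
    image_sha256: str
    steps: list = field(default_factory=list)

    @classmethod
    def from_image_bytes(cls, data: bytes) -> "GelAnalysis":
        # Hashing the untouched image ties every later step to exactly this file
        return cls(image_sha256=hashlib.sha256(data).hexdigest())

    def record(self, operation: str, **params) -> None:
        """Append one analysis operation with its parameters."""
        self.steps.append({"operation": operation, "params": params})

    def revise(self, index: int, **params) -> None:
        """Correct an earlier step; steps after it depend on it and are dropped."""
        self.steps[index]["params"].update(params)
        del self.steps[index + 1:]
```

In this design, correcting an early step discards the steps derived from it. Those steps then have to be redone, so stale results are never kept silently.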






    Within the scope of the Helmholtz Metadata Collaboration (HMC), the ADVANCE project (Advanced metadata standards for biodiversity survey and monitoring data: supporting of research and conservation) aimed to support rich metadata generation with interoperable metadata standards and semantic artefacts that facilitate data access, integration and reuse across terrestrial, freshwater and marine realms. HMC's mission is to facilitate the discovery, access, machine-readability, and reuse of research data across and beyond the Helmholtz Association.
    We revised, adapted and expanded existing metadata schemas, vocabularies and thesauri to build a FAIR metadata schema for biodiversity monitoring data. Based on this schema, we built a metadata entry form through which users can provide their metadata instances. The schema is FAIR because it is both machine-interpretable and compliant with domain-relevant community standards. This report provides a general overview of the project results and instructions on how to access, reuse and complete the metadata form.






    Publication of data from past field studies on invertebrate populations is highly important, because such data add considerable value as baselines for studying spatiotemporal population and community dynamics in these groups. Therefore, a dataset of occurrence data on epigaeic invertebrates collected in 1996 was standardised into the Darwin Core format and cross-checked, in order to make it publicly available following the FAIR data principles. Once published, it can contribute to the biodiversity assessment of terrestrial invertebrates, thereby improving the availability and accessibility of much-needed historical datasets on macro-invertebrates. Here, we present sampling event data on invertebrates from four grasslands taken out of agricultural production over a span of several decades. Together, these grasslands form a chronosequence of the effects of agricultural extensification. The data were collected with a standardised sampling design using pyramid traps, pitfall traps and soil samples.
    The raw data presented in this data paper have not been published before. They consist of more than 20,000 records of nearly 70,000 specimens from 121 taxonomic groups. The data were collected using a standardised field study set-up, and specimens were identified by taxonomic specialists. Most groups were identified to family level, with eight groups identified to species level. The occurrence data are complemented by information on plant composition, meteorological data and soil physical characteristics. The dataset has been registered in the Global Biodiversity Information Facility (GBIF): http://doi.org/10.15468/7n499e.
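Standardising into Darwin Core means mapping each raw trap record onto Darwin Core terms. The sketch below shows this mapping for a single record. The target terms (`eventID`, `occurrenceID`, `basisOfRecord`, `samplingProtocol`, `eventDate`, `scientificName`, `taxonRank`, `individualCount`, `occurrenceStatus`) are Darwin Core terms. The raw field names, the example record, the ID scheme and the `basisOfRecord` value are all hypothetical. None of them are taken from the published dataset.

```python
def to_darwin_core(raw: dict) -> dict:
    """Map one hypothetical raw trap record to Darwin Core occurrence terms."""
    event_id = f"{raw['site']}_{raw['trap']}_{raw['date']}"  # hypothetical ID scheme
    n = int(raw["n"])
    return {
        "eventID": event_id,
        "occurrenceID": f"{event_id}_{raw['taxon']}",  # hypothetical ID scheme
        "basisOfRecord": "PreservedSpecimen",           # assumption, not from the dataset
        "samplingProtocol": raw["method"],
        "eventDate": raw["date"],                        # ISO 8601 date
        "scientificName": raw["taxon"],
        "taxonRank": raw["rank"],
        "individualCount": n,
        "occurrenceStatus": "present" if n > 0 else "absent",
    }

# Hypothetical record for illustration only
record = {"site": "G1", "trap": "PF03", "date": "1996-05-14",
          "method": "pitfall trap", "taxon": "Carabidae", "rank": "family", "n": "7"}
print(to_darwin_core(record))
```

Keeping `eventID` separate from `occurrenceID` mirrors the sampling-event structure described above. One trap sample (event) can hold many taxon occurrences.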






    Duchenne and Becker muscular dystrophy (DMD and BMD) lack curative treatments. Registries can facilitate therapy development by serving as a platform to study epidemiology, assess clinical trial feasibility, identify eligible candidates, collect real-world data, perform post-market surveillance, and collaborate in national and international data-driven initiatives.
    To address these facets, it is crucial to gather high-quality, interchangeable, and reusable data from a representative population. We introduce the Dutch Dystrophinopathy Database (DDD), a national registry for patients with DMD or BMD and for females with pathogenic DMD variants, and outline its design, governance, and use.
    The design of DDD is based on a system-independent information model that ensures interoperable and reusable data adhering to international standards. To maximize enrollment, patients can give consent online. Participation is possible at different levels, with contact details and clinical diagnosis as the minimal requirement. Participants can opt in to yearly online questionnaires on disease milestones and medication. They can also opt in to having clinical data from visits to one of the national reference centers stored. Governance involves a general board, an advisory board and database management.
    As of November 1, 2023, 742 participants were enrolled. Self-reported data were provided by 291 Duchenne, 122 Becker and 38 female participants. Of the participants visiting reference centers, 96% consented to having their clinical data stored. Eligible patients were informed about clinical studies through DDD, and multiple data requests have been approved to use coded clinical data for quality control, epidemiology and natural history studies.
    The Dutch Dystrophinopathy Database captures long-term patient-reported data and high-quality, standardized clinician-reported healthcare data. It supports trial readiness, post-marketing surveillance, and effective data use through a multicenter design that is scalable to other neuromuscular disorders.






    Open science (OS) awareness and skills are increasingly becoming an essential part of everyday scientific work because, for example, many journals require authors to share data. However, following an OS workflow can seem challenging at first, so instructions by journals and other guidelines are important. But how comprehensive are they in the field of ecology and evolutionary biology (Ecol Evol)? To find out, we reviewed 20 published OS guideline articles aimed at ecologists or evolutionary biologists, together with the data policies of 17 Ecol Evol journals. Our aims were to chart the current landscape of OS guidelines in the field, find potential gaps, identify field-specific barriers to OS and discuss solutions to overcome these challenges. We found that many of the guideline articles covered similar topics, despite being written for a narrow field or a specific target audience. Likewise, many of the guideline articles mentioned similar obstacles that could hinder or postpone a transition to open data sharing. Thus, there could be a need for a more widely known, general OS guideline for Ecol Evol. Following the same guideline could also make OS practices in the field more uniform. However, some topics that are typical issues in Ecol Evol, such as long-term experiments and physical samples, were mentioned surprisingly seldom. Of the journals, 15 out of 17 expected or at least encouraged data sharing, either for all articles or under specific conditions (e.g. for registered reports). Of those 15, 10 required data sharing already at the submission stage. The coverage of journal data policies varied greatly between journals, from practically non-existent to very extensive. Journals can contribute greatly by leading the way and making open data useful. We therefore recommend that publishers and journals invest in clear and comprehensive data policies and instructions for authors.






    Recent developments in machine learning (ML) and deep learning (DL) have immense potential for applications in proteomics, such as generating spectral libraries, improving peptide identification, and optimizing targeted acquisition modes. Although new ML/DL models for various applications and peptide properties are frequently published, the community has been slow to adopt them, mostly because of technical challenges. We believe that, for the community to make better use of state-of-the-art models, more attention should be paid to making models easy to use and accessible. To facilitate this, we developed Koina, an open-source, containerized, decentralized and online-accessible high-performance prediction service that enables ML/DL models to be used in any pipeline. Using the widely used FragPipe computational platform as an example, we show how Koina can be easily integrated with existing proteomics software tools and how these integrations improve data analysis.
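Koina is described as an online-accessible prediction service that can be called from any pipeline. The sketch below builds a JSON inference request in the KServe v2 / Triton HTTP format. It assumes that a Koina server exposes that protocol. Several details are assumptions to check against the Koina documentation:

- the base URL;
- the model name `Prosit_2019_intensity`;
- the input tensor names (`peptide_sequences`, `precursor_charges`, `collision_energies`).

The request is constructed here but not sent.

```python
import json
import urllib.request

# Assumed base URL of a public Koina server; verify before use.
KOINA_BASE_URL = "https://koina.wilhelmlab.org"

def build_infer_request(peptides, charges, collision_energies, request_id="0"):
    """Build a KServe v2-style JSON body with three [n, 1] input tensors."""
    n = len(peptides)
    if len(charges) != n or len(collision_energies) != n:
        raise ValueError("all inputs must have the same length")
    return {
        "id": request_id,
        "inputs": [
            # Tensor names are assumptions about the target model's signature
            {"name": "peptide_sequences", "shape": [n, 1], "datatype": "BYTES",
             "data": list(peptides)},
            {"name": "precursor_charges", "shape": [n, 1], "datatype": "INT32",
             "data": [int(c) for c in charges]},
            {"name": "collision_energies", "shape": [n, 1], "datatype": "FP32",
             "data": [float(e) for e in collision_energies]},
        ],
    }

def make_http_request(model, payload):
    """Wrap the payload in a POST to /v2/models/{model}/infer (not sent here)."""
    url = f"{KOINA_BASE_URL}/v2/models/{model}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

payload = build_infer_request(["PEPTIDEK", "ACDEFGHIK"], [2, 3], [25.0, 30.0])
req = make_http_request("Prosit_2019_intensity", payload)
# urllib.request.urlopen(req) would send it; in the v2 protocol, results come back under "outputs".
```

Because the request is plain HTTP+JSON, any language or tool in a pipeline could issue it without installing the model locally. This is the accessibility argument made above.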






    BACKGROUND: The design and development of artificial intelligence (AI) and machine learning (ML) technology continue to advance rapidly. This is despite major limitations in the current capacity of AI and ML, as a practice and discipline, to address the full range of sociohumanitarian issues and complexities. These limitations create an imperative to strengthen AI and ML literacy in underserved communities and to build a more diverse AI and ML design and development workforce engaged in health research.
    OBJECTIVE: AI and ML have the potential to account for and assess a variety of factors that contribute to health and disease and to improve prevention, diagnosis, and therapy. Here, we describe recent activities within the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) Ethics and Equity Workgroup (EEWG). These activities led to deliverables that will help put ethics and fairness at the forefront of AI and ML applications to build equity in biomedical research, education, and health care.
    METHODS: The AIM-AHEAD EEWG was created in 2021. It had 3 cochairs and 51 members in year 1, and 2 cochairs and approximately 40 members in year 2. Members in both years included AIM-AHEAD principal investigators, coinvestigators, leadership fellows, and research fellows. The EEWG used a modified Delphi approach, with polling, ranking, and other exercises, to facilitate discussions about the tangible steps, key terms, and definitions needed to ensure that ethics and fairness are at the forefront of AI and ML applications to build equity in biomedical research, education, and health care.
    RESULTS: The EEWG developed a set of ethics and equity principles, a glossary, and an interview guide. The ethics and equity principles comprise 5 core principles, each with subparts, which articulate best practices for working with stakeholders from historically and presently underrepresented communities. The glossary contains 12 terms and definitions, with particular emphasis on optimal development, refinement, and implementation of AI and ML in health equity research. To accompany the glossary, the EEWG developed a concept relationship diagram that describes the logical flow of and relationship between the definitional concepts. Lastly, the interview guide provides questions that can be used or adapted to garner stakeholder and community perspectives on the principles and glossary.
    CONCLUSIONS: Ongoing engagement around our principles and glossary is needed to identify and anticipate potential limitations in their use in AI and ML research settings, especially for institutions with limited resources. This requires time, careful consideration, and honest discussions about what makes an engagement incentive meaningful enough to support and sustain these institutions' full engagement. By slowing down to meet historically and presently underresourced institutions and communities where they are, and where they are able to engage and compete, we have a greater chance of achieving the diversity, ethics, and equity needed in AI and ML implementation in health research.






    This paper presents the data (images, observations, metadata) from three camera trap deployments in the Amsterdam Water Supply Dunes, a Natura 2000 nature reserve in the coastal dunes of the Netherlands. The pilots aimed to determine how different types of camera deployment (e.g. regular vs. wide lens, various heights, inside vs. outside exclosures) might influence species detections. They also aimed to determine how to deploy autonomous wildlife monitoring networks. Two pilots were conducted in herbivore exclosures and mainly detected European rabbits (Oryctolagus cuniculus) and red foxes (Vulpes vulpes). The third pilot was conducted outside exclosures, where European fallow deer (Dama dama) were the most prevalent species. Across all three pilots, a total of 47,597 images were annotated using the Agouti platform, and all annotations were verified and quality-checked by a human expert. In total, 11 wildlife cameras recorded 2,779 observations of 20 species (including humans) during 2021-2023. The following are shared for each pilot: the raw image files (excluding images of humans), image metadata, deployment metadata and observations. They are shared using the Camtrap DP open standard and the extended data publishing capabilities of GBIF, to increase the findability, accessibility, interoperability, and reusability of these data. The data are freely available and can be used to develop artificial intelligence (AI) algorithms that automatically detect and identify species in wildlife camera images.
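Because the observations are shared as Camtrap DP tables, per-species summaries such as those above can be recomputed directly from the observations CSV. The sketch below does this. The column names (`scientificName`, `count`, `observationType`) reflect our reading of the Camtrap DP observations table and should be checked against the version used. The rows are toy data for illustration only and are not taken from the published dataset.

```python
import csv
import io
from collections import Counter

# Toy rows for illustration only; not taken from the published dataset.
EXAMPLE_OBSERVATIONS = """
observationID,deploymentID,observationType,scientificName,count
obs1,depA,animal,Oryctolagus cuniculus,2
obs2,depA,animal,Vulpes vulpes,1
obs3,depB,animal,Dama dama,4
obs4,depB,blank,,
obs5,depA,animal,Oryctolagus cuniculus,1
"""

def summarize_observations(csv_text):
    """Return (observations per species, individuals per species)."""
    n_obs, n_ind = Counter(), Counter()
    # strip() removes the leading blank line so the header row is read correctly
    for row in csv.DictReader(io.StringIO(csv_text.strip())):
        name = (row.get("scientificName") or "").strip()
        if not name:  # e.g. blank images carry no species
            continue
        n_obs[name] += 1
        n_ind[name] += int(row["count"]) if row.get("count") else 0
    return n_obs, n_ind

observations, individuals = summarize_observations(EXAMPLE_OBSERVATIONS)
print(observations.most_common(), individuals.most_common())
```

The sketch counts observations and individuals separately. A single observation can cover several animals, so species ranked by one measure may not rank the same way by the other.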






    FAIR Digital Object (FDO) is an emerging concept highlighted by the European Open Science Cloud (EOSC) as a potential candidate for building an ecosystem of machine-actionable research outputs. In this work, we systematically evaluate FDO and its implementations as a global distributed object system. We use five conceptual frameworks, covering interoperability, middleware, FAIR principles, EOSC requirements and the FDO guidelines themselves. We compare the FDO approach with established Linked Data practices and the existing Web architecture. We also provide a brief history of the Semantic Web and discuss why these technologies may have been difficult to adopt for FDO purposes. We conclude with recommendations for both the Linked Data and FDO communities to further their adaptation and alignment.





