graph database

  • Article type: Journal Article
    BACKGROUND: With an overarching goal of increasing diversity and inclusion in biomedical sciences, the National Research Mentoring Network (NRMN) developed a web-based national mentoring platform (MyNRMN) that seeks to connect mentors and mentees to support the persistence of underrepresented minorities in the biomedical sciences. As of May 15, 2024, the MyNRMN platform, which provides mentoring, networking, and professional development tools, has facilitated more than 12,100 unique mentoring connections between faculty, students, and researchers in the biomedical domain.
    OBJECTIVE: This study aimed to examine the large-scale mentoring connections facilitated by our web-based platform between students (mentees) and faculty (mentors) across institutional and geographic boundaries. Using an innovative graph database, we analyzed diverse mentoring connections between mentors and mentees across demographic characteristics in the biomedical sciences.
    METHODS: Through the MyNRMN platform, we observed profile data and analyzed mentoring connections made between students and faculty across institutional boundaries by race, ethnicity, gender, institution type, and educational attainment between July 1, 2016, and May 31, 2021.
    RESULTS: In total, there were 15,024 connections with 2222 mentees and 1652 mentors across 1625 institutions contributing data. Female mentees participated in the highest number of connections (3996/6108, 65%), whereas female mentors participated in 58% (5206/8916) of the connections. Black mentees made up 38% (2297/6108) of the connections, whereas White mentors participated in 56% (5036/8916) of the connections. Mentees were predominately from institutions classified as Research 1 (R1; doctoral universities-very high research activity) and historically Black colleges and universities (556/2222, 25% and 307/2222, 14%, respectively), whereas 31% (504/1652) of mentors were from R1 institutions.
    CONCLUSIONS: To date, the utility of mentoring connections across institutions throughout the United States and how mentors and mentees are connected is unknown. This study examined these connections and the diversity of these connections using an extensive web-based mentoring network.

  • Article type: Journal Article
    Inconsistent disease coding standards in medicine create hurdles in data exchange and analysis. This paper proposes a machine learning system to address this challenge. The system automatically matches unstructured medical text (doctor notes, complaints) to ICD-10 codes. It leverages a unique architecture featuring a training layer for model development and a knowledge base that captures relationships between symptoms and diseases. Experiments using data from a large medical research center demonstrated the system's effectiveness in disease classification prediction. Logistic regression emerged as the optimal model due to its superior processing speed, achieving an accuracy of 81.07% with acceptable error rates during high-load testing. This approach offers a promising solution to improve healthcare informatics by overcoming coding standard incompatibility and automating code prediction from unstructured medical text.

  • Article type: Journal Article
    In the Islamic domain, Hadiths hold significant importance, standing as crucial texts following the Holy Quran. Each Hadith contains three main parts: the ISNAD (chain of narrators), TARAF (starting part, often from Prophet Muhammad), and MATN (Hadith content). The ISNAD is the chain of narrators involved in transmitting that particular MATN. Hadith scholars determine the trustworthiness of the transmitted MATN by the quality of the ISNAD. The ISNAD data is available in its original Arabic, with narrator names transliterated into English. This paper presents the Multi-IsnadSet (MIS), which has great potential to be employed by social scientists and theologians. A multi-directed graph structure is used to represent the complex interactions among the narrators of Hadith. The MIS dataset represents a directed graph consisting of 2092 nodes, representing individual narrators, and 77,797 edges, representing the Sanad-Hadith connections. The MIS dataset represents multiple ISNAD of the Hadith based on the Sahih Muslim Hadith book. The dataset was carefully extracted from multiple online Hadith sources using data scraping and web crawling tools, providing extensive Hadith details. Each dataset entry provides a complete view of a specific Hadith, including the original book, Hadith number, textual content (MATN), list of narrators, narrator count, sequence of narrators, and ISNAD count. In this paper, four different tools were designed and constructed for modeling and analyzing the narrative network: the Python library NetworkX, the graph database Neo4j, and two network analysis tools, Gephi and Cytoscape. The Neo4j graph database is used to represent the multi-dimensional graph-related data for ease of extraction and for establishing new relationships among nodes. Researchers can use MIS to explore Hadith credibility, including the classification of Hadiths (Sahih = perfection in the Sanad; Dhaif = imperfection in the Sanad) and of narrators (trustworthy or not). Traditionally, scholars have focused on identifying the longest and shortest Sanad between two narrators, but in MIS the emphasis shifts to determining the optimum or authentic Sanad, considering narrator qualities. The graph representation of the authentic and manually curated dataset will open the way for the development of computational models that could identify the significance of a chain and a narrator. The dataset provides researchers with Hadith narrators and Hadith ISNAD that could be used in a wide variety of future research studies related to Hadith authentication and rule extraction. Moreover, the dataset encourages cross-disciplinary research, bridging the gap between Islamic studies, artificial intelligence (AI), social network analysis (SNA), and graph neural networks (GNN).
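    As a rough illustration of the multi-directed graph described in the abstract above, the sketch below stores one edge per narrator-to-narrator link per Hadith, so the same pair of narrators can be joined by several parallel edges. It uses only the Python standard library rather than NetworkX or Neo4j, and the function names, narrator labels, and Hadith identifiers are hypothetical placeholders, not values from the MIS dataset.

```python
from collections import defaultdict

def build_multidigraph(chains):
    """Build a multi-directed graph from ISNAD chains.

    `chains` maps a Hadith identifier to an ordered list of narrators.
    Each consecutive narrator pair (in listed order) becomes one edge
    labelled with the Hadith identifier, so parallel edges between the
    same pair of narrators are preserved.
    """
    edges = defaultdict(list)  # (source, target) -> [hadith ids]
    for hadith_id, narrators in chains.items():
        for source, target in zip(narrators, narrators[1:]):
            edges[(source, target)].append(hadith_id)
    return edges

def out_degree(edges):
    """Count outgoing edges per narrator, including parallel edges."""
    degree = defaultdict(int)
    for (source, _target), hadith_ids in edges.items():
        degree[source] += len(hadith_ids)
    return dict(degree)

# Hypothetical placeholder chains (not taken from the MIS dataset).
example_chains = {
    "hadith_1": ["narrator_a", "narrator_b", "narrator_c"],
    "hadith_2": ["narrator_a", "narrator_b", "narrator_d"],
}
example_edges = build_multidigraph(example_chains)
```

    Keeping the Hadith identifier on every edge is what distinguishes this from a simple directed graph: the number of parallel edges between two narrators records how many chains pass through that link.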

  • Article type: Journal Article
    The current management of patients with multimorbidity is suboptimal, with either a single-disease approach to care or treatment guideline adaptations that result in poor adherence due to their complexity. Although this has resulted in calls for more holistic and personalized approaches to prescribing, progress toward these goals has remained slow. With the rapid advancement of machine learning (ML) methods, promising approaches now also exist to accelerate the advance of precision medicine in multimorbidity. These include analyzing disease comorbidity networks, using knowledge graphs that integrate knowledge from different medical domains, and applying network analysis and graph ML. Multimorbidity disease networks have been used to improve disease diagnosis, treatment recommendations, and patient prognosis. Knowledge graphs that combine different medical entities connected by multiple relationship types integrate data from different sources, allowing for complex interactions and creating a continuous flow of information. Network analysis and graph ML can then extract the topology and structure of networks and reveal hidden properties, including disease phenotypes, network hubs, and pathways; predict drugs for repurposing; and determine safe and more holistic treatments. In this article, we describe the basic concepts of creating bipartite and unipartite disease and patient networks and review the use of knowledge graphs, graph algorithms, graph embedding methods, and graph ML within the context of multimorbidity. Specifically, we provide an overview of the application of graph theory for studying multimorbidity, the methods employed to extract knowledge from graphs, and examples of the application of disease networks for determining the structure and pathways of multimorbidity, identifying disease phenotypes, predicting health outcomes, and selecting safe and effective treatments. In today's data-hungry, ML-focused world, such network-based techniques are likely to be at the forefront of developing robust clinical decision support tools for safer and more holistic approaches to treating older patients with multimorbidity.
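    The bipartite and unipartite networks mentioned in the abstract above can be sketched as follows: a patient-disease (bipartite) mapping is projected onto a disease-disease (unipartite) comorbidity network. This is a minimal standard-library sketch under assumed conventions; the function name, the co-occurrence weighting, and the toy records are hypothetical and are not taken from the article.

```python
from collections import defaultdict
from itertools import combinations

def project_to_disease_network(patient_diseases):
    """Project a bipartite patient-disease network onto diseases.

    Two diseases are linked when at least one patient has both; the edge
    weight is the number of patients in whom the pair co-occurs.
    """
    weights = defaultdict(int)
    for diseases in patient_diseases.values():
        # Sort so each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(diseases)), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical toy records (not real patient data).
toy_records = {
    "patient_1": ["diabetes", "hypertension"],
    "patient_2": ["diabetes", "hypertension", "ckd"],
    "patient_3": ["ckd"],
}
comorbidity = project_to_disease_network(toy_records)
```

    Raw co-occurrence counts are only one possible edge weight; a real analysis would more likely normalize for disease prevalence, which this sketch does not attempt.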

  • Article type: Editorial
    No abstract available.

  • Article type: Journal Article
    The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that combines structured data and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise registration information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For the unstructured text, we improve an existing model to extract triples, and the F1-score of our model reached 72.77%. The numbers of nodes and edges in our constructed enterprise knowledge graph reach 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely update can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed in HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction and demonstrates its application in corporate due diligence.

  • Article type: Journal Article
    While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some key species for meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets for different traits are available for a significant number of these species. As gene structures and functions are to some extent conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex due to the multiplicity of data sources and the non-harmonized description of the data. Here, we provide two pipelines, referred to as structural and functional pipelines, to create a framework for a NoSQL graph database (Neo4j) to integrate and query heterogeneous data from multiple species. We call this framework the Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written using the Neo4j Cypher language and can, for instance, lead to the identification of genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we have demonstrated that our knowledge graph base provides an intuitive and powerful platform to support research and development programmes.
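    The idea in the abstract above of using orthology as a bridge between species can be sketched in plain Python: genes linked to a trait in one species are mapped to genes in another species that share an orthology group. This is not the Ortho_KB schema or a Cypher query; the data layout, function name, and species and gene labels are hypothetical placeholders chosen for illustration.

```python
def transfer_trait_candidates(orthogroups, trait_genes, target_species):
    """Return genes of `target_species` that share an orthology group
    with a gene already linked to a trait in another species."""
    candidates = set()
    for members in orthogroups.values():
        # `members` is a list of (species, gene) pairs in one group.
        if any(gene in trait_genes for _species, gene in members):
            candidates.update(
                gene for species, gene in members
                if species == target_species and gene not in trait_genes
            )
    return sorted(candidates)

# Hypothetical placeholder data (not taken from OrthoLegKB).
toy_orthogroups = {
    "og_1": [("species_x", "gene_x1"), ("species_y", "gene_y1")],
    "og_2": [("species_x", "gene_x2"), ("species_y", "gene_y2")],
}
toy_trait_genes = {"gene_x1"}  # e.g. a gene under a QTL in species_x
```

    For example, `transfer_trait_candidates(toy_orthogroups, toy_trait_genes, "species_y")` returns only the species_y gene that shares a group with the trait-linked gene. In a graph database the same lookup would be a path traversal from the trait node through the orthology nodes.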

  • Article type: Journal Article
    Navigating through a real-world map can be represented in a bi-directed graph with a group of nodes representing the intersections and edges representing the roads between them. In cycling, we can plan training as a group of nodes and edges the athlete must cover. Optimizing routes using artificial intelligence is a well-studied phenomenon. Much work has been done on finding the quickest and shortest paths between two points. In cycling, the solution is not necessarily the shortest and quickest path. However, the optimum path is the one where a cyclist covers the suitable distance, ascent, and descent based on his/her training parameters. This paper presents a Neo4j graph-based dataset of cycling routes in Slovenia. It consists of 152,659 nodes representing individual road intersections and 410,922 edges representing the roads between them. The dataset allows the researchers to develop and optimize cycling training generation algorithms, where distance, ascent, descent, and road type are considered.
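    The contrast drawn in the abstract above, between the shortest path and the path that best matches a training target, can be illustrated on a toy road graph. The sketch below enumerates simple paths and picks the one whose total distance and ascent are closest to the targets. The scoring rule, field names (`distance_km`, `ascent_m`), and the toy graph are assumptions made for illustration; exhaustive enumeration is only practical for very small graphs, not for a dataset with hundreds of thousands of edges.

```python
def simple_paths(graph, start, goal, path=None):
    """Yield every simple path (no repeated node) from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbour in graph.get(start, {}):
        if neighbour not in path:
            yield from simple_paths(graph, neighbour, goal, path)

def path_totals(graph, path):
    """Return (total distance, total ascent) along a path."""
    distance = sum(graph[a][b]["distance_km"] for a, b in zip(path, path[1:]))
    ascent = sum(graph[a][b]["ascent_m"] for a, b in zip(path, path[1:]))
    return distance, ascent

def best_training_path(graph, start, goal, target_km, target_ascent_m):
    """Pick the path whose totals are closest to the training targets."""
    def mismatch(path):
        distance, ascent = path_totals(graph, path)
        # Hypothetical scoring: relative error on each target, summed.
        return (abs(distance - target_km) / target_km
                + abs(ascent - target_ascent_m) / target_ascent_m)
    return min(simple_paths(graph, start, goal), key=mismatch)

# Hypothetical toy road graph (not taken from the Slovenian dataset).
toy_roads = {
    "A": {"B": {"distance_km": 5, "ascent_m": 50},
          "C": {"distance_km": 12, "ascent_m": 300}},
    "B": {"D": {"distance_km": 5, "ascent_m": 20}},
    "C": {"D": {"distance_km": 10, "ascent_m": 100}},
    "D": {},
}
```

    With a target of 20 km and 350 m of ascent, `best_training_path(toy_roads, "A", "D", 20, 350)` prefers the longer route through C over the shorter route through B, which is the behaviour the abstract describes.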

  • Article type: Journal Article
    Semantic interoperability establishes intercommunications and enables data sharing across disparate systems. In this study, we propose an ostensive information architecture for healthcare information systems to decrease ambiguity caused by using signs in different contexts for different purposes. The ostensive information architecture adopts a consensus-based approach initiated from the perspective of information systems re-design and can be applied to other domains where information exchange is required between heterogeneous systems. Driven by the issues in FHIR (Fast Healthcare Interoperability Resources) implementation, an ostensive approach that supplements the current lexical approach in semantic exchange is proposed. A Semantic Engine with an FHIR knowledge graph as the core is constructed using Neo4j to provide semantic interpretation and examples. The MIMIC III (Medical Information Mart for Intensive Care) datasets and diabetes datasets have been employed to demonstrate the effectiveness of the proposed information architecture. We further discuss the benefits of the separation of semantic interpretation and data storage from the perspective of information system design, and the semantic reasoning towards patient-centric care underpinned by the Semantic Engine.

  • Article type: Journal Article
    The German Medical Informatics Initiative (MII) aims to increase the interoperability and reuse of clinical routine data for research purposes. One important result of the MII work is a Germany-wide common core data set (CDS), which is to be provided by over 31 data integration centers (DIZ) following a strict specification. One standard format for data sharing is HL7/FHIR. Locally, classical data warehouses are often in use for data storage and retrieval. We are interested in investigating the advantages of a graph database in this setting. After transferring the MII CDS into a graph, storing it in a graph database, and subsequently enriching it with accompanying meta-information, we see great potential for more sophisticated data exploration and analysis. Here we describe the extract-transform-load process that we set up as a proof of concept to achieve the transformation and to make the common set of core data accessible as a graph.
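    The transform step of such an extract-transform-load process can be sketched roughly as follows: each FHIR-style resource becomes a node, and each top-level reference to another resource becomes an edge. This is a simplified illustration, not the MII pipeline; the function name and edge-label convention are hypothetical, and nested references, which real FHIR resources commonly contain, are deliberately ignored here.

```python
def fhir_to_graph(resources):
    """Turn FHIR-style resources into node and edge collections.

    Every resource becomes a node keyed by "ResourceType/id". Every
    top-level field holding a {"reference": "Type/id"} object becomes an
    edge from the resource to the referenced node.
    """
    nodes, edges = {}, []
    for resource in resources:
        key = f'{resource["resourceType"]}/{resource["id"]}'
        nodes[key] = {"label": resource["resourceType"]}
        for field, value in resource.items():
            if isinstance(value, dict) and "reference" in value:
                # Hypothetical convention: upper-cased field name as edge type.
                edges.append((key, field.upper(), value["reference"]))
    return nodes, edges

# Hypothetical minimal resources (not taken from the MII core data set).
toy_bundle = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Condition", "id": "c1",
     "subject": {"reference": "Patient/p1"}},
]
toy_nodes, toy_edges = fhir_to_graph(toy_bundle)
```

    The resulting node and edge lists could then be loaded into a graph database in a separate load step, which is outside the scope of this sketch.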