knowledge base

  • Article type: Journal Article
    With the rapid development of deep learning techniques, their applications have become increasingly widespread in various domains. However, traditional deep learning methods are often referred to as "black box" models with low interpretability of their results, posing challenges for their application in certain critical domains. In this study, we propose a comprehensive method for the interpretability analysis of sentiment models. The proposed method encompasses two main aspects: attention-based analysis and external knowledge integration. First, we train the model on sentiment classification and generation tasks to capture attention scores from multiple perspectives. This multi-angle approach reduces bias and provides a more comprehensive understanding of the underlying sentiment. Second, we incorporate an external knowledge base to improve evidence extraction. By leveraging character scores, we retrieve complete sentiment evidence phrases, addressing the challenge of incomplete evidence extraction in Chinese texts. Experimental results on a sentiment interpretability evaluation dataset demonstrate the effectiveness of our method. We observe a notable increase in accuracy by 1.3%, Macro-F1 by 13%, and MAP by 23%. Overall, our approach offers a robust solution for enhancing the interpretability of sentiment models by combining attention-based analysis and the integration of external knowledge.
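
The evidence-extraction step described above (character scores plus an external knowledge base) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `extract_evidence`, the use of a plain set of phrases as the knowledge base, the window size `max_len`, and the rule of preferring the longest matching phrase that covers a high-scoring character.

```python
from typing import List, Set


def extract_evidence(text: str, char_scores: List[float], lexicon: Set[str],
                     top_k: int = 1, max_len: int = 4) -> List[str]:
    """Expand the top-scoring characters into complete lexicon phrases.

    Hypothetical rule: for each of the top_k characters, pick the longest
    lexicon phrase (ties broken by summed score) that covers the character.
    """
    if len(text) != len(char_scores):
        raise ValueError("one score per character is required")
    # Rank character positions by score, highest first.
    ranked = sorted(range(len(text)), key=lambda i: char_scores[i], reverse=True)
    evidence: List[str] = []
    for pos in ranked[:top_k]:
        candidates = []
        # Try every window of at most max_len characters that covers pos.
        for start in range(max(0, pos - max_len + 1), pos + 1):
            for end in range(pos + 1, min(len(text), start + max_len) + 1):
                phrase = text[start:end]
                if phrase in lexicon:
                    candidates.append((len(phrase), sum(char_scores[start:end]), phrase))
        # Fall back to the single character when the lexicon has no match.
        phrase = max(candidates)[2] if candidates else text[pos]
        if phrase not in evidence:
            evidence.append(phrase)
    return evidence


# Toy input: the scores are made up and stand in for model attention.
text = "这部电影非常好看"
scores = [0.02, 0.02, 0.05, 0.05, 0.10, 0.12, 0.40, 0.24]
lexicon = {"好", "好看", "非常好看"}
print(extract_evidence(text, scores, lexicon))
```

Without the lexicon, the sketch returns only the single highest-scoring character, which is the kind of incomplete evidence the abstract says the knowledge base is meant to address.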






  • Article type: Journal Article
    Standardized nomenclature for genes, gene products, and isoforms is crucial to prevent ambiguity and enable clear communication of scientific data, facilitating efficient biocuration and data sharing. Standardized genotype nomenclature, which describes alleles present in a specific strain that differ from those in the wild-type reference strain, is equally essential to maximize research impact and ensure that results linking genotypes to phenotypes are Findable, Accessible, Interoperable, and Reusable (FAIR). In this publication, we extend the fission yeast clade gene nomenclature guidelines to support the curation efforts at PomBase, the Schizosaccharomyces pombe Model Organism Database. This update introduces nomenclature guidelines for noncoding RNA genes, following those set forth by the Human Genome Organisation Gene Nomenclature Committee. Additionally, we provide a significant update to the allele and genotype nomenclature guidelines originally published in 1987, to standardize the diverse range of genetic modifications enabled by the fission yeast genetic toolbox. These updated guidelines reflect a community consensus among numerous fission yeast researchers. Adoption of these rules will improve consistency in gene and genotype nomenclature, and facilitate machine-readability and automated entity recognition of fission yeast genes and alleles in publications or datasets. In conclusion, our updated guidelines provide a valuable resource for the fission yeast research community, promoting consistency, clarity, and FAIRness in genetic data sharing and interpretation.






  • Article type: Journal Article
    The choice of treatment and prognosis evaluation depend on the accurate early diagnosis of brain tumors. Many brain tumors go undiagnosed or are overlooked by clinicians as a result of the challenges associated with manually evaluating magnetic resonance imaging (MRI) images in clinical practice. In this study, we built a computer-aided diagnosis (CAD) system for glioma detection, grading, segmentation, and knowledge discovery based on artificial intelligence algorithms. Neuroimages are specifically represented using a type of visual feature known as the histogram of gradients (HOG). Then, through a two-level classification framework, the HOG features are employed to distinguish between healthy controls and patients, or between different glioma grades. This CAD system also offers tumor visualization using a semi-automatic segmentation tool for better patient management and treatment monitoring. Finally, a knowledge base is created to offer additional advice for the diagnosis of brain tumors. Based on our proposed two-level classification framework, we train models for glioma detection and grading, achieving area under curve (AUC) of 0.921 and 0.806, respectively. Different from other systems, we integrate these diagnostic tools with a web-based interface, which provides the flexibility for system deployment.
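
The two-level classification idea above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' system: `gradient_histogram` computes a single magnitude-weighted histogram of gradient orientations over the whole image (a full HOG descriptor would add cells and block normalization), and the `detector` and `grader` callables are hypothetical stand-ins for trained models.

```python
import math
from typing import Callable, List, Sequence


def gradient_histogram(image: Sequence[Sequence[float]], bins: int = 8) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplified HOG-like feature: one cell covering the whole image,
    central differences, no block normalization.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def two_level_predict(features: List[float],
                      detector: Callable[[List[float]], bool],
                      grader: Callable[[List[float]], str]) -> str:
    """Level 1 separates controls from patients; level 2 grades patients only."""
    if not detector(features):
        return "healthy"
    return grader(features)


# Hypothetical placeholder rules; real use would plug in trained classifiers.
detector = lambda f: max(f) > 0.5
grader = lambda f: "high-grade" if f[0] > 0.9 else "low-grade"

flat_image = [[0.0] * 4 for _ in range(4)]
edge_image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(two_level_predict(gradient_histogram(flat_image), detector, grader))
print(two_level_predict(gradient_histogram(edge_image), detector, grader))
```

Passing the classifiers as callables keeps the cascade independent of any particular model family, so the second level runs only on inputs the first level flags.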






  • Article type: Journal Article
    In clinics, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To this end, we present an automatic, multi-modal approach for report generation from a chest x-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information of the x-ray images, features two distinct modules: (i) Learned knowledge base: to absorb the knowledge embedded in the radiology reports, we build a knowledge base that can automatically distill and restore medical knowledge from textual embedding without manual labor; (ii) Multi-modal alignment: to promote the semantic alignment among reports, disease labels, and images, we explicitly utilize textual embedding to guide the learning of the visual feature space. We evaluate the performance of the proposed model using metrics from both natural language generation and clinical efficacy on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of generated reports. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods on almost all the metrics. Code is available at






  • Article type: Journal Article
    Safety issues have always been of great concern to the metro construction industry. Numerous studies have shown that safety issues are closely related to the design phase. Many safety problems can be solved or improved by developing the design. This study proposes a structured identification method for safety risks based on the metro design specifications, journal literature, and expert experience. A safety knowledge base (KB) for the design was established to realize safety knowledge sharing and reusing. The KB has been developed into Building Information Modeling (BIM) software as an inspection plug-in to achieve automated analysis and retrieval of safety risks. The designers are provided with a visualization of risk components to locate and improve the pre-control measures of the design. Subsequently, the process of design for safety (DFS) database creation was demonstrated with a metro station project, and the feasibility of applying the KB to safety checking in BIM was verified. In response to the inspection results, safety risks in the construction phases can be eliminated or avoided by standardizing and improving the design.






  • Article type: Journal Article
    Colorectal cancer (CRC) is a heterogeneous disease with different responses to targeted therapies due to various factors, and the treatment effect differs significantly between individuals. Personalized medical treatment (PMT) is a method that takes individual patient characteristics into consideration, making it the most effective way to deal with this issue. Patient similarity and clustering analysis is an important aspect of PMT. This paper describes how to build a knowledge base using formal concept analysis (FCA), which clusters patients based on their similarity and preserves the relations between clusters in a hierarchical structure.
    Prognostic factors (attributes) of 2442 CRC patients, including patient age, cancer cell differentiation, lymphatic invasion and metastasis stages were used to build a formal context in FCA. A concept was defined as a set of patients with their shared attributes. The formal context was formed based on the similarity scores between each concept identified from the dataset, which can be used as a knowledge base.
    A hierarchical knowledge base was constructed along with the clinical records of the diagnosed CRC patients. For each new patient, a similarity score to each existing concept in the knowledge base can be retrieved with different similarity calculations. The ranked similarity scores that are associated with the concepts can offer references for treatment plans.
    Patients who share the same concept may experience similar effects from the same clinical procedures or treatments. In conjunction with a clinician's ability to perform flexible analyses and apply appropriate judgement, the knowledge base allows faster and more effective decisions to be made for patient treatment and care.
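
The FCA workflow above (build a formal context, derive concepts, score a new patient against each concept) can be sketched on a toy context. The patient identifiers, attribute names, and the choice of Jaccard similarity are assumptions made for illustration; the abstract does not say which similarity calculations were used. The enumeration below is a brute-force closure over attribute subsets, which is exponential in the number of attributes and only suitable for small examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent: patients, intent: attributes)


def formal_concepts(context: Dict[str, Set[str]]) -> Set[Concept]:
    """Enumerate all formal concepts of a small object -> attributes context."""
    all_attrs = frozenset().union(*context.values())
    concepts: Set[Concept] = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            wanted = frozenset(subset)
            # Extent: every patient that has all attributes in the subset.
            extent = frozenset(o for o, attrs in context.items() if wanted <= attrs)
            # Intent: attributes shared by every patient in the extent.
            if extent:
                intent = frozenset(set.intersection(*(set(context[o]) for o in extent)))
            else:
                intent = all_attrs
            concepts.add((extent, intent))
    return concepts


def rank_concepts(concepts: Set[Concept], patient_attrs: Set[str]) -> List[Tuple[float, FrozenSet[str], FrozenSet[str]]]:
    """Rank concepts by Jaccard similarity between intent and a new patient's attributes."""
    def jaccard(a: FrozenSet[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    scored = [(jaccard(intent, patient_attrs), extent, intent) for extent, intent in concepts]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy formal context with hypothetical patients and prognostic attributes.
context = {
    "p1": {"age>=65", "poorly_differentiated"},
    "p2": {"age>=65", "lymphatic_invasion"},
    "p3": {"age>=65", "poorly_differentiated", "lymphatic_invasion"},
}
concepts = formal_concepts(context)
ranked = rank_concepts(concepts, {"age>=65", "lymphatic_invasion"})
print(ranked[0])
```

The top-ranked concept groups the existing patients whose shared attributes best match the new patient, which is the kind of reference the abstract describes for treatment planning.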






  • Article type: Journal Article
    To address the problems of existing distributed denial of service (DDoS) attack detection, namely a single detection target, incomplete detection datasets, and the privacy risks caused by sharing datasets, we propose a trusted multi-domain DDoS detection method based on federated learning. First, we divide DDoS attacks into different sub-attacks, design a federated learning dataset for DDoS detection in each domain, and use these datasets to realize more comprehensive DDoS detection while protecting the data privacy of each domain. Second, to improve the robustness of federated learning and mitigate poisoning attacks, we propose a blockchain-based reputation evaluation method that comprehensively estimates the interaction reputation, data reputation, and resource reputation of each participant, so as to select trusted federated learning participants and identify malicious ones. In addition, we propose a scheme that combines multi-domain detection with a distributed knowledge base, and we design a feature graph of malicious behavior based on a knowledge graph to retain multi-domain feature knowledge. The experimental results show that the multi-domain DDoS detection method reaches an accuracy of more than 95% for most categories while protecting the datasets, and that the proposed reputation evaluation method identifies malicious participants more effectively under data poisoning attacks when the threshold is set to 0.6.
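
The reputation-based participant selection described above can be sketched as a weighted score with a cut-off. Only the three reputation components and the 0.6 threshold come from the abstract. The weighted-sum form, the weights `(0.4, 0.4, 0.2)`, the `>=` comparison, and all names are assumptions, and the blockchain layer that records reputations is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Participant:
    name: str
    interaction: float  # interaction reputation, assumed to lie in [0, 1]
    data: float         # data reputation, assumed to lie in [0, 1]
    resource: float     # resource reputation, assumed to lie in [0, 1]


def reputation(p: Participant, weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Combine the three components with hypothetical weights that sum to 1."""
    w_interaction, w_data, w_resource = weights
    return w_interaction * p.interaction + w_data * p.data + w_resource * p.resource


def select_trusted(participants: List[Participant], threshold: float = 0.6) -> Tuple[List[str], List[str]]:
    """Split participants into trusted and suspected lists by the threshold."""
    trusted = [p.name for p in participants if reputation(p) >= threshold]
    suspected = [p.name for p in participants if reputation(p) < threshold]
    return trusted, suspected


# Made-up participants: domain_b has a low data reputation, as a poisoner might.
domain_a = Participant("domain_a", interaction=0.9, data=0.8, resource=0.7)
domain_b = Participant("domain_b", interaction=0.5, data=0.2, resource=0.9)
print(select_trusted([domain_a, domain_b]))
```

In a federated round, only the trusted list would contribute model updates to aggregation under this sketch.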






  • Article type: Journal Article
    In the field of neuroscience, the core of a cohort study project consists of the collection, analysis, and sharing of multi-modal data. Recent years have witnessed a host of efficient and high-quality toolkits published and employed to improve the quality of multi-modal data in cohort studies. In turn, gleaning answers to relevant questions from such a conglomeration of studies is a time-consuming task for cohort researchers. As part of our efforts to tackle this problem, we propose a hierarchical neuroscience knowledge base that consists of projects/organizations, multi-modal databases, and toolkits, so as to facilitate researchers' search for answers. We first classified studies conducted for the topic "Frontiers in Neuroinformatics" according to the multi-modal data life cycle, and from these studies we extracted information objects (projects/organizations, multi-modal databases, and toolkits). Then, we mapped these information objects into our proposed knowledge base framework. A Python-based query tool has also been developed in tandem for quicker access to the knowledge base. Finally, based on the constructed knowledge base, we discussed some key research issues and underlying trends in different stages of the multi-modal data life cycle.
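
A query over a three-level knowledge base of this kind might look like the sketch below. It is not the authors' tool: the dictionary layout, the `query` function, the stage names, and every entry name (all prefixed `Example`) are hypothetical placeholders chosen only to show filtering by level, life-cycle stage, and keyword.

```python
from typing import Dict, List, Optional

# Hypothetical knowledge base: level -> entries tagged with life-cycle stages.
KB: Dict[str, List[dict]] = {
    "projects/organizations": [
        {"name": "ExampleCohortProject", "stages": ["collection", "sharing"]},
    ],
    "multi-modal databases": [
        {"name": "ExampleImagingDB", "stages": ["sharing"]},
    ],
    "toolkits": [
        {"name": "ExamplePreprocessKit", "stages": ["analysis"]},
        {"name": "ExampleUploadKit", "stages": ["collection", "sharing"]},
    ],
}


def query(kb: Dict[str, List[dict]], level: Optional[str] = None,
          stage: Optional[str] = None, keyword: Optional[str] = None) -> List[str]:
    """Return 'level: name' strings for entries matching every given filter."""
    results: List[str] = []
    for entry_level, entries in kb.items():
        if level is not None and entry_level != level:
            continue
        for entry in entries:
            if stage is not None and stage not in entry["stages"]:
                continue
            if keyword is not None and keyword.lower() not in entry["name"].lower():
                continue
            results.append(f"{entry_level}: {entry['name']}")
    return results


print(query(KB, stage="analysis"))
print(query(KB, level="toolkits", stage="sharing"))
```

Omitted filters match everything, so the same function serves both broad browsing by stage and narrow lookups within one level.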






  • Article type: Journal Article
    Neurodegenerative diseases (NDDs) are a series of chronic diseases, which are associated with progressive loss of neuronal structure or function. The complex etiologies of the NDDs remain unclear, thus the prevention and early diagnosis of NDDs are critical to reducing the mortality and morbidity of these diseases.
    To provide a systematic understanding of the heterogeneity of the risk factors associated with different NDDs (pan-neurodegenerative diseases or pan-NDDs), the knowledgebase is established to facilitate the personalized and knowledge-guided diagnosis, prevention and prediction of NDDs.
    Before data collection, medical, life science, and informatics experts, as well as potential users of the database, were consulted on the scope of the data and the classification of risk factors. The PubMed database was used as the source for data and knowledge extraction. Risk factors of NDDs were manually collected from literature published between 1975 and 2020.
    The comprehensive risk factors database for NDDs (NDDRF) was established, including 998 single or combined risk factors, 2293 records, and 1071 articles relevant to the 14 most common NDDs. The single risk factors are classified into 3 categories, i.e. epidemiological factors (469), genetic factors (324), and biochemical factors (153). Among all the factors, 179 factors are positive and protective, while 880 factors have a negative influence on NDDs. The knowledgebase is available at
    NDDRF provides the structured information and knowledge resource on risk factors of NDDs. It could benefit the future systematic and personalized investigation of pan-NDDs genesis and progression. Meanwhile it may be used for the future explainable artificial intelligence modeling for smart diagnosis and prevention of NDDs.






  • Article type: Journal Article
    Updated and expert-quality knowledge bases are fundamental to biomedical research. A knowledge base established with human participation and subject to multiple inspections is needed to support clinical decision making, especially in the growing field of precision oncology. The number of original publications in this field has risen dramatically with the advances in technology and the evolution of in-depth research. Consequently, the issue of how to gather and mine these articles accurately and efficiently now requires close consideration. In this study, we present OncoPubMiner, a free and powerful system that combines text mining, data structure customisation, publication search with online reading and project-centred and team-based data collection to form a one-stop 'keyword in-knowledge out' oncology publication mining platform. The platform was constructed by integrating all open-access abstracts from PubMed and full-text articles from PubMed Central, and it is updated daily. OncoPubMiner makes obtaining precision oncology knowledge from scientific articles straightforward and will assist researchers in efficiently developing structured knowledge base systems and bring us closer to achieving precision oncology goals.





