genotype to phenotype

  • Article type: Journal Article
    Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain and is considered a promising method for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNNs) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its model precision to that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data.
    We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. A Python implementation of TrG2P and a web-based tool are available.
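    The transfer step described above (reuse convolutional parameters learned on a non-yield trait, then retrain only the downstream layers on yield) can be illustrated with a deliberately small, pure-Python sketch. This is not the TrG2P implementation: the kernel values in `pretrained_kernels`, the synthetic genotypes, the synthetic yields, and all function names are hypothetical placeholders, and a single linear head stands in for the fully connected layers.

```python
import random


def conv_features(genotype, kernels):
    """Frozen feature extractor: 1D valid convolution, ReLU, global mean pooling."""
    feats = []
    for kernel in kernels:
        width = len(kernel)
        activations = [
            max(0.0, sum(w * g for w, g in zip(kernel, genotype[i:i + width])))
            for i in range(len(genotype) - width + 1)
        ]
        feats.append(sum(activations) / len(activations))
    return feats


def mse(features, targets, weights, bias):
    """Mean squared error of a linear head on precomputed features."""
    errors = [
        sum(w * x for w, x in zip(weights, row)) + bias - y
        for row, y in zip(features, targets)
    ]
    return sum(e * e for e in errors) / len(errors)


def fit_head(features, targets, lr=0.05, epochs=500):
    """Retrain only the head (weights and bias) by batch gradient descent."""
    n, n_feat = len(features), len(features[0])
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for row, y in zip(features, targets):
            err = sum(w * x for w, x in zip(weights, row)) + bias - y
            for j, x in enumerate(row):
                grad_w[j] += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical stand-ins for kernels learned on a non-yield trait (assumption).
pretrained_kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

# Synthetic genotypes coded 0/1/2 and synthetic yields (illustration only).
rng = random.Random(0)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(30)] for _ in range(40)]
features = [conv_features(g, pretrained_kernels) for g in genotypes]
yields = [3.0 * f[0] - 2.0 * f[1] + 1.0 for f in features]

initial_mse = mse(features, yields, [0.0, 0.0], 0.0)
head_w, head_b = fit_head(features, yields)  # kernels stay frozen throughout
final_mse = mse(features, yields, head_w, head_b)
print(f"MSE before head training: {initial_mse:.4f}, after: {final_mse:.4f}")
```

    Because the kernels are never updated, only the head parameters have to be estimated from the yield data, which is the property that makes transfer attractive when yield records are scarce.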






  • Article type: Journal Article
    A central goal of biology is to understand how genetic variation produces phenotypic variation, which has been described as a genotype to phenotype (G to P) map. The plant form is continuously shaped by intrinsic developmental and extrinsic environmental inputs, and therefore plant phenomes are highly multivariate and require comprehensive approaches to fully quantify. Yet a common assumption in plant phenotyping efforts is that a few pre-selected measurements can adequately describe the relevant phenome space. Our poor understanding of the genetic basis of root system architecture is at least partially a result of this incongruence. Root systems are complex 3D structures that are most often studied as 2D representations measured with relatively simple univariate traits. In prior work, we showed that persistent homology, a topological data analysis method that does not pre-suppose the salient features of the data, could expand the phenotypic trait space and identify new G to P relations from a commonly used 2D root phenotyping platform. Here we extend the work to entire 3D root system architectures of maize seedlings from a mapping population that was designed to understand the genetic basis of maize-nitrogen relations. Using a panel of 84 univariate traits, persistent homology methods developed for 3D branching, and multivariate vectors of the collective trait space, we found that each method captures distinct information about root system variation as evidenced by the majority of non-overlapping QTL, and hence that root phenotypic trait space is not easily exhausted. The work offers a data-driven method for assessing 3D root structure and highlights the importance of non-canonical phenotypes for more accurate representations of the G to P map.






  • Article type: Preprint
    Accurate models are crucial for estimating phenotypes from high-throughput genomic data. Because genetic and phenotypic data are sensitive, secure models are essential to protect private information. Constructing a model that is both accurate and secure is therefore important for the secure inference of phenotypes.
    We propose a secure inference protocol on homomorphically encrypted genotype data with encrypted linear models. First, the genotype data are scaled by feature importance using XGBoost or AdaBoost, and linear models are then trained in plaintext to predict the phenotypes. Second, the model parameters and test data are encrypted with the CKKS scheme for secure inference. Third, the phenotypes are predicted under CKKS homomorphic encryption. Finally, the client decrypts the encrypted predictions and computes 1-NRMSE or AUC for model evaluation.
    Five phenotypes from 3000 samples with 20390 variants were used to validate the performance of the secure inference protocol. The protocol achieves 0.9548, 0.9639, and 0.9673 (1-NRMSE) for three continuous phenotypes and 0.9943 and 0.99290 (AUC) for two categorical phenotypes on test data. Moreover, the protocol is robust across 100 rounds of random sampling. Furthermore, the protocol achieves an average accuracy of 0.9725 on an encrypted test set of 198 samples, and the overall inference takes only 4.32 s. These results placed the protocol first in the iDASH-2022 track 2 challenge.
    We propose an accurate and secure protocol for predicting phenotypes from genotypes; it takes seconds to obtain hundreds of predictions for all phenotypes.
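    As a plaintext illustration of the evaluation step, the sketch below computes a linear prediction together with the two reported metrics. It performs no encryption: the point is only that a linear model needs additions and multiplications, which are the operations CKKS evaluates on encrypted vectors. The abstract does not state how NRMSE is normalized, so normalization by the range of the observed values is an assumption here, and all function names are hypothetical.

```python
import math


def predict_linear(weights, bias, genotype):
    """Linear model: a dot product plus a bias (additions and multiplications only)."""
    return sum(w * g for w, g in zip(weights, genotype)) + bias


def one_minus_nrmse(y_true, y_pred):
    """1 - NRMSE, with RMSE normalized by the observed range (assumed convention)."""
    rmse = math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
    return 1.0 - rmse / (max(y_true) - min(y_true))


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only.
print(predict_linear([0.5, -1.0], 0.25, [2, 1]))            # 0.25
print(one_minus_nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))     # 1.0
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))              # 0.75
```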






  • Article type: Journal Article
    Differences in codon frequency between genomes, genes, or positions along a gene modulate transcription and translation efficiency, leading to phenotypic and functional differences. Here, we present a multiscale analysis of the effects of synonymous codon recoding during heterologous gene expression in human cells, quantifying the phenotypic consequences of codon usage bias at different molecular and cellular levels, with an emphasis on translation elongation. Six synonymous versions of an antibiotic resistance gene were generated, fused to a fluorescent reporter, and independently expressed in HEK293 cells. The multiscale phenotype was analyzed by means of quantitative transcriptome and proteome assessment, as proxies for gene expression; cellular fluorescence, as a proxy for single-cell level expression; and real-time cell proliferation in the absence or presence of antibiotic, as a proxy for cell fitness. We show that differences in codon usage bias strongly impact the molecular and cellular phenotype: (i) they result in large differences in mRNA levels and protein levels, leading to differences of over 15 times in translation efficiency; (ii) they introduce unpredicted splicing events; (iii) they lead to reproducible phenotypic heterogeneity; and (iv) they lead to a trade-off between the benefit of antibiotic resistance and the burden of heterologous expression. In human cells in culture, codon usage bias modulates gene expression by modifying mRNA availability and suitability for translation, leading to differences in protein levels and eventually eliciting functional phenotypic changes.






  • Article type: Journal Article
    Familial cardiomyopathy is a precursor of heart failure and sudden cardiac death. Over the past several decades, researchers have discovered numerous gene mutations primarily in sarcomeric and cytoskeletal proteins causing two different disease phenotypes: hypertrophic (HCM) and dilated (DCM) cardiomyopathies. However, molecular mechanisms linking genotype to phenotype remain unclear. Here, we employ a systems approach by integrating experimental findings from preclinical studies (e.g., murine data) into a cohesive signaling network to scrutinize genotype to phenotype mechanisms. We developed an HCM/DCM signaling network model utilizing a logic-based differential equations approach and evaluated model performance in predicting experimental data from four contexts (HCM, DCM, pressure overload, and volume overload). The model has an overall prediction accuracy of 83.8%, with higher accuracy in the HCM context (90%) than DCM (75%). Global sensitivity analysis identifies key signaling reactions, with calcium-mediated myofilament force development and calcium-calmodulin kinase signaling ranking the highest. A structural revision analysis indicates potential missing interactions that primarily control calcium regulatory proteins, increasing model prediction accuracy. Combination pharmacotherapy analysis suggests that downregulation of signaling components such as calcium, titin and its associated proteins, growth factor receptors, ERK1/2, and PI3K-AKT could inhibit myocyte growth in HCM. In experiments with patient-specific iPSC-derived cardiomyocytes (MLP-W4R;MYH7-R723C iPSC-CMs), combined inhibition of ERK1/2 and PI3K-AKT rescued the HCM phenotype, as predicted by the model. In DCM, PI3K-AKT-NFAT downregulation combined with upregulation of Ras/ERK1/2 or titin or Gq protein could ameliorate cardiomyocyte morphology. 
    The model results suggest that HCM mutations that increase active force through elevated calcium sensitivity could increase ERK activity and decrease eccentricity through parallel growth factor, Gq-mediated, and titin pathways. Moreover, the model simulated the influence of existing medications on cardiac growth in HCM and DCM contexts. This HCM/DCM signaling model demonstrates utility in investigating genotype to phenotype mechanisms in familial cardiomyopathy.
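    For readers unfamiliar with logic-based differential equations, the sketch below shows one common formulation: each node's activity lies between 0 and 1 and relaxes toward a normalized Hill function of its input. The abstract does not give the equations or parameters used, so the Hill form, the defaults `n=1.4` and `ec50=0.5`, the time constant, and the two-node toy chain (`upstream`, `downstream`) are all assumptions for illustration, not the published HCM/DCM network.

```python
def hill_activation(x, n=1.4, ec50=0.5):
    """Normalized Hill function with f(0) = 0, f(ec50) = 0.5 and f(1) = 1."""
    beta = (ec50 ** n - 1.0) / (2.0 * ec50 ** n - 1.0)
    k_n = beta - 1.0  # this is K**n in the usual notation
    return beta * x ** n / (k_n + x ** n)


def simulate(stimulus, tau=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of a toy chain: stimulus -> upstream -> downstream."""
    upstream, downstream = 0.0, 0.0
    for _ in range(steps):
        d_up = (hill_activation(stimulus) - upstream) / tau
        d_down = (hill_activation(upstream) - downstream) / tau
        upstream += dt * d_up
        downstream += dt * d_down
    return upstream, downstream


print(simulate(1.0))  # both nodes approach full activation
print(simulate(0.0))  # no stimulus: both nodes stay at zero
```

    In larger models of this kind, inhibition is typically written as one minus the activation, and multiple inputs are combined with AND (product) or OR gates; a sensitivity analysis then perturbs each reaction and records the change in every node.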






  • Article type: Editorial






  • Article type: Journal Article
    Functional-structural plant models (FSPMs) have been evolving for over two decades, and their future development depends, to some extent, on the value of their potential applications in crop science. Stabilizing crop production by identifying valuable traits for novel cultivars adapted to adverse environments is currently a topical issue in crop science. This study therefore examines how FSPMs can address new challenges in crop science for sustainable crop production. FSPMs have been developed to simulate organogenesis, morphogenesis, and physiological activities under various environments, and they are amenable to downscaling to the tissue, cellular, and molecular levels or upscaling to the whole-plant and ecological levels. In a modeling framework with independent and interactive modules, advanced algorithms provide morphophysiological details at various scales. FSPMs are shown to be able to: (i) provide crop ideotypes efficiently for optimizing resource distribution and use for greater productivity and less disease risk, (ii) guide molecular design breeding by linking the molecular basis to plant phenotypes, as well as enrich crop models with an additional architectural dimension to assist breeding, and (iii) interact with plant phenotyping for molecular breeding by embracing three-dimensional (3D) architectural traits. This study illustrates that FSPMs have great prospects for speeding up precision breeding for specific environments owing to their capacity for guiding and integrating ideotypes, phenotyping, and molecular design, and for linking the molecular basis to target phenotypes. Consequently, the promising applications of FSPMs in crop science will, in turn, accelerate their evolution, and vice versa.






  • Article type: Journal Article
    Antimicrobial resistance (AMR) is among the gravest threats to human health and food security worldwide. The use of antimicrobials in livestock production can lead to emergence of AMR, which can have direct effects on humans through spread of zoonotic disease. Pigs pose a particular risk as they are a source of zoonotic diseases and receive more antimicrobials than most other livestock. Here we use a large-scale genomic approach to characterise AMR in Streptococcus suis, a commensal found in most pigs, but which can also cause serious disease in both pigs and humans.
    We obtained replicated measures of Minimum Inhibitory Concentration (MIC) for 16 antibiotics, across a panel of 678 isolates, from the major pig-producing regions of the world. For several drugs, there was no natural separation into 'resistant' and 'susceptible', highlighting the need to treat MIC as a quantitative trait. We found differences in MICs between countries, consistent with their patterns of antimicrobial usage. AMR levels were high even for drugs not used to treat S. suis, with many multidrug-resistant isolates. Similar levels of resistance were found in pigs and humans from regions associated with zoonotic transmission. We next used whole genome sequences for each isolate to identify 43 candidate resistance determinants, 22 of which were novel in S. suis. The presence of these determinants explained most of the variation in MIC. But there were also interesting complications, including epistatic interactions, where known resistance alleles had no effect in some genetic backgrounds. Beta-lactam resistance involved many core genome variants of small effect, appearing in a characteristic order.
    We present a large dataset allowing the analysis of the multiple contributing factors to AMR in S. suis. The high levels of AMR in S. suis that we observe are reflected by antibiotic usage patterns but our results confirm the potential for genomic data to aid in the fight against AMR.






  • Article type: Journal Article
    The phenotype of an individual can be affected not only by the individual's own genotypes, known as direct genetic effects (DGE), but also by genotypes of interacting partners, indirect genetic effects (IGE). IGE have been detected using polygenic models in multiple species, including laboratory mice and humans. However, the underlying mechanisms remain largely unknown. Genome-wide association studies of IGE (igeGWAS) can point to IGE genes, but have not yet been applied to non-familial IGE arising from "peers" and affecting biomedical phenotypes. In addition, the extent to which igeGWAS will identify loci not identified by dgeGWAS remains an open question. Finally, findings from igeGWAS have not been confirmed by experimental manipulation.
    We leverage a dataset of 170 behavioral, physiological, and morphological phenotypes measured in 1812 genetically heterogeneous laboratory mice to study IGE arising between same-sex, adult, unrelated mice housed in the same cage. We develop and apply methods for igeGWAS in this context and identify 24 significant IGE loci for 17 phenotypes (FDR < 10%). We observe no overlap between IGE loci and DGE loci for the same phenotype, which is consistent with the moderate genetic correlations between DGE and IGE for the same phenotype estimated using polygenic models. Finally, we fine-map seven significant IGE loci to individual genes and find supportive evidence in an experiment with a knockout model that Epha4 gives rise to IGE on stress-coping strategy and wound healing.
    Our results demonstrate the potential for igeGWAS to identify IGE genes and shed light on the mechanisms of peer influence.
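    The significance threshold above is stated as a false discovery rate (FDR < 10%). The abstract does not say which FDR procedure was used; the sketch below shows the Benjamini-Hochberg step-up procedure as one widely used option, applied to made-up p-values. The function name and the example values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.10):
    """Return a list of booleans marking which tests are significant at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Made-up p-values for five hypothetical loci.
example_p = [0.001, 0.04, 0.03, 0.5, 0.2]
print(benjamini_hochberg(example_p))  # [True, True, True, False, False]
```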







  • Article type: Journal Article
    The colorful phenotypes of birds have long provided rich source material for evolutionary biologists. Avian plumage, beaks, skin, and eggs, which exhibit a stunning range of cryptic and conspicuous forms, inspired early work on adaptive coloration. More recently, avian color has fueled discoveries on the physiological, developmental, and, increasingly, genetic mechanisms responsible for phenotypic variation. The relative ease with which avian color traits can be quantified has made birds an attractive system for uncovering links between phenotype and genotype. Accordingly, the field of avian coloration genetics is burgeoning. In this review, we highlight recent advances and emerging questions associated with the genetic underpinnings of bird color. We start by describing breakthroughs related to 2 pigment classes: carotenoids that produce red, yellow, and orange in most birds and psittacofulvins that produce similar colors in parrots. We then discuss structural colors, which are produced by the interaction of light with nanoscale materials and greatly extend the plumage palette. Structural color genetics remain understudied, but this paradigm is changing. We next explore how colors that arise from interactions among pigmentary and structural mechanisms may be controlled by genes that are co-expressed or co-regulated. We also identify opportunities to investigate genes mediating within-feather micropatterning and the coloration of bare parts and eggs. We conclude by spotlighting 2 research areas (mechanistic links between color vision and color production, and speciation) that have been invigorated by genetic insights, a trend likely to continue as new genomic approaches are applied to non-model species.





