Keywords: GWAS; SNP arrays; bioinformatic methods; genomic structures; software; structural variants

MeSH: Genome; Genome-Wide Association Study / methods; Genomics / methods; Genotype; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide

Source: DOI:10.1093/bib/bbac043

Abstract:
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
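As a rough illustration of two of the quantities named in the abstract, the sketch below computes a polygenic risk score and a pairwise linkage disequilibrium estimate from SNP dosages. It is not taken from any of the reviewed tools. It assumes genotypes are coded as effect-allele dosages (0, 1 or 2), that the score is a plain weighted sum of dosages, and that LD is approximated by the squared Pearson correlation of dosages rather than by phased haplotype frequencies. The function names `polygenic_score` and `genotypic_r2` are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: per-SNP allele counts (0, 1 or 2), an assumed coding.
    weights: per-SNP effect sizes, e.g. from GWAS summary statistics.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def genotypic_r2(snp_a, snp_b):
    """Squared Pearson correlation of two SNPs' dosages across individuals.

    This is an unphased approximation of LD; it returns NaN when either
    SNP is monomorphic in the sample (zero variance).
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("both SNPs need the same non-zero number of samples")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return (cov * cov) / (var_a * var_b)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    print(polygenic_score([0, 1, 2], [0.5, -0.2, 0.1]))
    print(genotypic_r2([0, 1, 2, 1], [0, 1, 2, 2]))
```

Dedicated tools typically add steps this sketch omits, such as quality control, clumping or shrinkage of weights, and handling of missing genotypes.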