Genotyping error

  • Article type: Journal Article
    The black-footed cat (Felis nigripes) is endemic to the arid regions of southern Africa. One of the world's smallest wild felids, the species occurs at low densities and is secretive and elusive, which makes ecological studies difficult. Genetic data could provide key information such as estimates of population size, sex ratios, and genetic diversity. In this study, we tested whether microsatellite loci can be successfully amplified from scat samples that could be collected noninvasively in the field. Using 21 blood and scat samples collected from the same individuals, we statistically tested whether nine microsatellites previously designed for use in domestic cats can be used to identify individual black-footed cats. Genotypes recovered from blood and scat samples were compared to assess loss of heterozygosity, allele dropout, and false alleles resulting from DNA degradation or PCR inhibitors present in scat samples. The microsatellite markers were also used to identify individuals from scats collected in the field that were not linked to any blood samples. All nine microsatellites used in this study were amplified successfully and were polymorphic. The microsatellite loci had sufficient discriminatory power to distinguish individuals and identify clones. In conclusion, these molecular markers can be used to monitor populations of wild black-footed cats noninvasively. The resulting genetic data can contribute important information to guide future conservation initiatives.
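    The blood-versus-scat comparison described above can be sketched as follows. This is a minimal illustration, not the authors' statistical procedure: it assumes the blood genotype is the true genotype, codes each microsatellite genotype as a pair of allele sizes, and uses common working definitions (allelic dropout = a heterozygote scored as a homozygote for one of its alleles; false allele = any scat allele absent from the blood genotype). The function and locus names are hypothetical.

```python
from typing import Dict, Tuple

# A diploid microsatellite genotype as a pair of allele sizes (bp).
Genotype = Tuple[int, int]


def dropout_and_false_alleles(blood: Dict[str, Genotype],
                              scat: Dict[str, Genotype]) -> Dict[str, float]:
    """Compare one individual's scat genotypes with its blood (reference) genotypes."""
    compared = het_compared = dropouts = false_alleles = 0
    for locus, ref in blood.items():
        obs = scat.get(locus)
        if obs is None:  # locus failed to amplify from the scat sample
            continue
        compared += 1
        ref_alleles, obs_alleles = set(ref), set(obs)
        if len(ref_alleles) == 2:  # dropout is only observable at heterozygous loci
            het_compared += 1
            if len(obs_alleles) == 1 and obs_alleles <= ref_alleles:
                dropouts += 1
        if obs_alleles - ref_alleles:  # scat carries an allele the blood lacks
            false_alleles += 1
    return {
        "dropout_rate": dropouts / het_compared if het_compared else 0.0,
        "false_allele_rate": false_alleles / compared if compared else 0.0,
    }


blood = {"locA": (120, 124), "locB": (200, 200), "locC": (150, 156)}
scat = {"locA": (120, 120), "locB": (200, 204), "locC": (150, 156)}
print(dropout_and_false_alleles(blood, scat))
# {'dropout_rate': 0.5, 'false_allele_rate': 0.3333333333333333}
```

    Rates are computed per individual here; pooling counts across individuals before dividing would give population-level rates.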

  • Article type: Journal Article
    Linkage maps are essential for genetic mapping of phenotypic traits, map-based gene cloning, and marker-assisted selection in breeding applications. Construction of a high-quality saturated map requires high-quality genotypic data on a large number of molecular markers. Genotyping errors cannot be completely avoided, no matter what platform is used. When genotyping error reaches a threshold level, it seriously affects the accuracy of the constructed map and the reliability of subsequent genetic studies. In this study, two recombinant inbred line (RIL) populations, derived from the crosses Yangxiaomai × Zhongyou 9507 and Jingshuang 16 × Bainong 64, were genotyped twice to investigate the effect of genotyping errors on linkage map construction. Data points that were inconsistent between the two replicates were regarded as genotyping errors and classified into three types. Genotyping errors were treated as missing values, yielding a non-erroneous data set. First, linkage maps were constructed using the two replicates as well as the non-erroneous data set. Second, the error correction methods implemented in the software packages QTL IciMapping (EC) and Genotype-Corrector (GC) were applied to the two replicates; linkage maps were then constructed from the corrected genotypes and compared with those from the non-erroneous data set. A simulation study considering different levels of genotyping error was performed to investigate the impact of errors and the accuracy of the error correction methods. Results indicated that map length and marker order differed among the two replicates and the non-erroneous data sets in both RIL populations. In both actual and simulated populations, map length expanded as the error rate increased, and the correlation coefficient between linkage and physical maps decreased. Map quality can be improved by repeated genotyping and by error correction algorithms. When the whole mapping population cannot be genotyped repeatedly, re-genotyping 30% of it is recommended. The EC method had a much lower false positive rate than the GC method across different error rates. This study systematically describes the impact of genotyping errors on linkage analysis, providing potential guidelines for improving the accuracy of linkage maps in the presence of genotyping errors.
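    The replicate-comparison step (discordant calls become missing values) can be sketched as below. This is a hypothetical helper, not code from QTL IciMapping or Genotype-Corrector. It assumes both replicates are individuals × markers matrices with None for failed calls and treats any disagreement, including called-versus-missing, as an error; the abstract does not say exactly how its three error types were defined.

```python
from typing import List, Optional, Tuple

Call = Optional[str]       # e.g. "A" (parent 1), "B" (parent 2), or None (missing)
Matrix = List[List[Call]]  # rows = individuals, columns = markers


def merge_replicates(rep1: Matrix, rep2: Matrix) -> Tuple[Matrix, int, float]:
    """Keep calls on which both replicates agree; set discordant calls to missing.

    Returns the non-erroneous matrix, the number of discordant calls,
    and the observed discordance rate.
    """
    merged: Matrix = []
    discordant = total = 0
    for row1, row2 in zip(rep1, rep2):
        new_row: List[Call] = []
        for g1, g2 in zip(row1, row2):
            total += 1
            if g1 == g2:
                new_row.append(g1)
            else:
                discordant += 1
                new_row.append(None)  # inconsistent data point -> missing value
        merged.append(new_row)
    return merged, discordant, (discordant / total if total else 0.0)


rep1 = [["A", "B", "A"], ["B", "B", None]]
rep2 = [["A", "A", "A"], ["B", "B", "A"]]
merged, n_err, rate = merge_replicates(rep1, rep2)
print(merged, n_err, rate)
```

    If only a subset of the population is re-genotyped, the same comparison run on that subset could serve to estimate the error rate for the whole data set.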

  • Article type: Journal Article
    Genotyping-by-sequencing (GBS) provides affordable methods for genotyping hundreds of individuals using millions of markers. However, this challenges bioinformatic procedures, which must overcome possible artifacts such as the bias generated by polymerase chain reaction duplicates and sequencing errors. Genotyping errors lead to data that deviate from what is expected under regular meiosis. This, in turn, makes it difficult to group and order markers, resulting in inflated and incorrect linkage maps. Therefore, genotyping errors can be easily detected by evaluating linkage map quality.
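    Why errors inflate linkage maps can be illustrated with a simple model. The sketch below assumes a backcross-type pair of markers (two genotype classes), independent errors that flip each call with probability e, and the Haldane map function; it illustrates the mechanism and is not part of Reads2Map, and outcrossing populations involve more complex marker types. An apparent recombinant arises when exactly one of the two calls is wrong (probability 2e(1 - e)) and there is no true crossover, or vice versa, so r_obs = r(1 - p) + (1 - r)p with p = 2e(1 - e).

```python
import math


def observed_r(r: float, e: float) -> float:
    """Apparent recombination fraction between two backcross-type markers
    when each genotype call is independently flipped with probability e."""
    p_odd = 2 * e * (1 - e)  # probability that exactly one of the two calls is wrong
    return r * (1 - p_odd) + (1 - r) * p_odd


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans (valid for 0 <= r < 0.5)."""
    return -50.0 * math.log(1 - 2 * r)


def inverse_haldane(d_cm: float) -> float:
    """Recombination fraction for a Haldane distance given in centimorgans."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def map_length_cm(true_length_cm: float, n_markers: int, e: float) -> float:
    """Estimated length of a map of evenly spaced markers under error rate e."""
    r = inverse_haldane(true_length_cm / (n_markers - 1))  # true r per interval
    return (n_markers - 1) * haldane_cm(observed_r(r, e))


for e in (0.0, 0.01, 0.05):
    print(f"error rate {e:.2f}: {map_length_cm(100, 101, e):.1f} cM")
```

    For a true 100 cM chromosome with 101 markers, an error rate of 1% already gives an estimated length of about 302 cM. Under this model 1 - 2r_obs = (1 - 2r)(1 - 2e)^2, so every interval gains a fixed -100 ln(1 - 2e) cM regardless of its true length; denser marker sets therefore inflate the map more.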
    We developed and used the Reads2Map workflow to build linkage maps from simulated and empirical GBS data of diploid outcrossing populations. The workflows run GATK, Stacks, TASSEL, and Freebayes for single-nucleotide polymorphism calling; updog, polyRAD, and SuperMASSA for genotype calling; and OneMap and GUSMap to build linkage maps. Using simulated data, we observed which genotype-calling software fails to identify common errors in GBS sequencing data and proposed specific filters to better handle them. We tested whether it is possible to overcome errors in a linkage map by using either genotype probabilities from each software or global error rates to estimate genetic distances with an updated version of OneMap. We also evaluated the impact of segregation distortion, contaminant samples, and haplotype-based multiallelic markers on the final linkage maps. Through these evaluations, we observed that some approaches produce different results depending on the dataset (dataset dependent), while others consistently produce favorable results across datasets (dataset independent).
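    The two ways of feeding uncertainty into distance estimation mentioned above (per-genotype probabilities from the caller, or one global error rate) can be sketched as a per-individual, per-marker weight vector over possible true genotypes, of the kind an HMM-based map estimator could use as emission weights. This is a generic sketch, not OneMap's implementation; the function name is hypothetical and the global-error model assumes errors are spread evenly over the genotypes that were not called.

```python
from typing import Dict, List, Optional


def emission_weights(called: str, genotypes: List[str], error_rate: float,
                     software_probs: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Weight of each possible true genotype for one individual at one marker.

    If genotype probabilities from the calling software are supplied, they are
    normalised and used directly; otherwise a single global error rate is spread
    evenly over the genotypes that were not called.
    """
    if software_probs is not None:
        total = sum(software_probs.get(g, 0.0) for g in genotypes)
        if total <= 0:
            raise ValueError("software probabilities sum to zero")
        return {g: software_probs.get(g, 0.0) / total for g in genotypes}
    k = len(genotypes)
    return {g: (1.0 - error_rate) if g == called else error_rate / (k - 1)
            for g in genotypes}


genotypes = ["aa", "ab", "bb"]
print(emission_weights("ab", genotypes, error_rate=0.05))
print(emission_weights("ab", genotypes, 0.05, {"aa": 0.1, "ab": 0.7, "bb": 0.2}))
```

    The global-rate version treats every call as equally reliable, whereas caller probabilities can down-weight individual low-depth calls, which is one reason the two options may behave differently from dataset to dataset.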
    Based on our results, we set the approaches that proved dataset independent for GBS datasets as the defaults in the Reads2Map workflows. This reduces the number of tests required to identify optimal pipelines and parameters for other empirical datasets. Using Reads2Map, users can select the pipeline and parameters that best fit their data context. The Reads2MapApp Shiny app provides a graphical representation of the results to facilitate their interpretation.

  • Article type: Journal Article
    Linkage mapping is an approach to ordering markers based on recombination events. Mapping algorithms cannot easily handle genotyping errors, which are common in high-throughput genotyping data. Several strategies have been developed to address this issue, mostly aimed at identifying and eliminating these errors. One such strategy is SMOOTH, an iterative algorithm for detecting genotyping errors. Unlike other approaches, SMOOTH can also impute the most probable alternative genotypes, but its application is limited to diploid species and to markers heterozygous in only one of the parents. In this study, we adapted SMOOTH to any marker type and to autopolyploids by using identity-by-descent probabilities, naming the updated algorithm Smooth Descent (SD). We applied SD to real and simulated data, showing that in the presence of genotyping errors this method produces better genetic maps in terms of marker order and map length. SD is particularly useful for error rates between 5% and 20% and when error rates are not homogeneous among markers or individuals. Starting from an error rate of 10%, SD reduced it to ∼5% in diploids, ∼7% in tetraploids and ∼8.5% in hexaploids. Meanwhile, the correlation between true and estimated genetic maps increased by 0.03 in tetraploids and by 0.2 in hexaploids, while decreasing slightly in diploids (by ∼0.0011). We also show that combining genotype curation with map re-estimation allowed us to obtain better genetic maps while correcting wrong genotypes. We have implemented this algorithm in the R package Smooth Descent.
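    The general idea of flagging improbable genotypes from their neighbours, and iterating, can be illustrated with a deliberately simplified rule. This is not SMOOTH or Smooth Descent: those weight neighbouring markers by map distance and, in Smooth Descent, use identity-by-descent probabilities for polyploids. The sketch assumes markers are already ordered along one chromosome, calls come from a diploid RIL or backcross coded as single letters, and a call that disagrees with two agreeing flanking markers (an implied tight double crossover) is treated as an error and imputed from its neighbour. Function names are hypothetical.

```python
from typing import List, Optional

Call = Optional[str]  # "A", "B", or None for a missing call


def flag_singletons(calls: List[Call]) -> List[int]:
    """Indices whose call disagrees with both (agreeing) flanking markers."""
    flagged = []
    for i in range(1, len(calls) - 1):
        left, mid, right = calls[i - 1], calls[i], calls[i + 1]
        if None in (left, mid, right):
            continue  # cannot judge a call next to missing data
        if left == right and mid != left:
            flagged.append(i)
    return flagged


def smooth_calls(calls: List[Call], max_iter: int = 10) -> List[Call]:
    """Repeatedly replace flagged calls with the left neighbour's call until stable."""
    current = list(calls)
    for _ in range(max_iter):
        updated = list(current)
        for i in flag_singletons(current):
            updated[i] = current[i - 1]  # impute from the flanking genotype
        if updated == current:
            break
        current = updated
    return current


print(smooth_calls(["A", "A", "B", "A", "A", "B", "B", "B"]))
```

    The last three "B" calls are left alone because a run of agreeing markers is consistent with a genuine crossover; only the isolated "B" is imputed.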

  • Article type: Journal Article
    Comprehensive decisions on the management of commercially produced bees depend largely on knowledge of their genetic diversity. In this study, we present novel microsatellite markers to support the breeding, management, and conservation of the blue orchard bee, Osmia lignaria Say (Hymenoptera: Megachilidae). Native to North America, O. lignaria has been trapped from wildlands, propagated on-crop, and used to pollinate certain fruit, nut, and berry crops. Using the O. lignaria genome assembly, we identified 59,632 candidate microsatellite loci in silico, of which 22 were tested using molecular techniques. Of the 22 loci, 12 were in Hardy-Weinberg equilibrium (HWE), showed no linkage disequilibrium (LD), and achieved low genotyping error in two wild Intermountain North American populations in Idaho and Utah, USA. We found no difference in genetic diversity between the two populations, but there was evidence of low but significant population differentiation. To determine whether these markers amplify in other Osmia, we also assessed 23 species across the clades apicata, bicornis, emarginata, and ribifloris. Nine loci amplified in three species/subspecies of apicata, 22 loci amplified in 11 species/subspecies of bicornis, 11 loci amplified in seven species/subspecies of emarginata, and 22 loci amplified in two species/subspecies of ribifloris. Further testing is needed to determine the capacity of these microsatellite loci to characterize genetic diversity and structure under the assumptions of HWE and no LD for species beyond O. lignaria. These markers will inform the conservation and commercial use of trapped and managed O. lignaria and other Osmia species in both agricultural and nonagricultural systems.
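    Simple locus-level diagnostics related to HWE can be computed directly from genotype data. The sketch below computes observed heterozygosity (Ho), expected heterozygosity under HWE (He, without small-sample correction) and the fixation index F = 1 - Ho/He for one multiallelic microsatellite locus; a clearly positive F suggests a heterozygote deficit, for example from null alleles or allelic dropout. Formal HWE and LD tests for microsatellites are usually exact or permutation tests run in dedicated software; the abstract does not name the test the authors used, and this is not it. The function name and allele sizes are illustrative.

```python
from collections import Counter
from typing import List, Tuple


def heterozygosity(genotypes: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (Ho, He, F) for one locus from diploid genotypes (pairs of allele sizes)."""
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    he = 1 - sum((c / total) ** 2 for c in counts.values())  # 1 - sum(p_i^2)
    f = 1 - ho / he if he > 0 else 0.0  # > 0 means fewer heterozygotes than expected
    return ho, he, f


sample = [(100, 102), (100, 100), (102, 102), (100, 102)]
print(heterozygosity(sample))
```

    In this toy sample Ho equals He, so F is 0; a locus where every individual is homozygous despite two common alleles gives F = 1.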

  • Article type: Journal Article
    Estimating the relationships between individuals is one of the fundamental challenges in many fields. In particular, relationship estimation can provide valuable information in missing persons cases. The recently developed investigative genetic genealogy approach uses high-density single nucleotide polymorphisms (SNPs) to determine close and more distant relationships, with hundreds of thousands to tens of millions of SNPs generated by either microarray genotyping or whole-genome sequencing. Current studies usually assume that the SNP profiles were generated with minimal errors. However, in missing persons cases the DNA samples can be highly degraded, and the SNP profiles generated from these samples usually contain many errors. In this study, a machine learning approach was developed for estimating relationships from SNP profiles with high error rates. In this approach, a hierarchical classification strategy was employed: relationships were first classified by degree, and then the relationship types within each degree were classified separately. Feature selection was implemented for each classification to improve performance. Both simulated and real data sets with various genotyping error rates were used to evaluate this approach, and its accuracy was higher than that of the individual measures; that is, the approach was more accurate and robust than the individual measures for SNP profiles with genotyping errors. In addition, the highest accuracy was obtained when the training and test sets had the same genotyping error rate, so estimating the genotyping error rate of SNP profiles is critical to obtaining accurate relationship estimates.
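    Because the best accuracy was obtained when training and test sets shared the same genotyping error rate, one way to build a training set with a matching rate is to inject errors into clean simulated profiles. The sketch below assumes SNP genotypes coded as 0/1/2 copies of the alternate allele and a uniform error model in which an erroneous call is replaced by one of the other two genotypes at random. Real errors in degraded DNA are usually not uniform (allelic dropout, for instance, turns heterozygotes into homozygotes). Function names are hypothetical and this is not the authors' pipeline.

```python
import random
from typing import List


def add_genotyping_errors(profile: List[int], error_rate: float,
                          seed: int = 0) -> List[int]:
    """Return a copy of a 0/1/2 SNP profile with uniform random genotyping errors."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    noisy = []
    for g in profile:
        if rng.random() < error_rate:
            noisy.append(rng.choice([x for x in (0, 1, 2) if x != g]))
        else:
            noisy.append(g)
    return noisy


def discordance(a: List[int], b: List[int]) -> float:
    """Fraction of sites at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


clean = [0, 1, 2] * 10_000
noisy = add_genotyping_errors(clean, error_rate=0.05, seed=42)
print(round(discordance(clean, noisy), 3))  # close to 0.05
```

    Applied to test profiles, the same discordance measure (against a high-quality reference, where one exists) gives a rough estimate of their error rate, which can then be matched when simulating the training set.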

  • Article type: Journal Article
    Clinical use of genotype data requires high positive predictive value (PPV) and thorough understanding of the genotyping platform characteristics. BeadChip arrays, such as the Global Screening Array (GSA), potentially offer a high-throughput, low-cost clinical screen for known variants. We hypothesize that quality assessment and comparison to whole-genome sequence and benchmark data establish the analytical validity of GSA genotyping.
    To test this hypothesis, we selected 263 samples from Coriell, generated GSA genotypes in triplicate, generated whole genome sequence (rWGS) genotypes, assessed the quality of each set of genotypes, and compared each set of genotypes to each other and to the 1000 Genomes Phase 3 (1KG) genotypes, a performance benchmark. For 59 genes (MAP59), we also performed theoretical and empirical evaluation of variants deemed medically actionable predispositions.
    Quality analyses detected sample contamination and increased assay failure along the chip margins. Comparison to benchmark data demonstrated that > 82% of the GSA assays had a PPV of 1. GSA assays targeting transitions, genomic regions of high complexity, and common variants performed better than those targeting transversions, regions of low complexity, and rare variants. Comparison of GSA data to rWGS and 1KG data showed > 99% performance across all measured parameters. Consistent with predictions from prior studies, the GSA detection of variation within the MAP59 genes was 3/261.
    We establish the analytical validity of GSA assays using quality analytics and comparison to benchmark and rWGS data. GSA assays meet the standards of a clinical screen although assays interrogating rare variants, transversions, and variants within low-complexity regions require careful evaluation.
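    Assay-level PPV against a benchmark, and the share of assays with a PPV of 1, can be sketched as follows. The abstract does not define a positive call precisely; this sketch assumes a positive is any call other than homozygous reference and a true positive is a positive call that matches the benchmark genotype exactly. Genotypes are two-letter strings and all names are illustrative.

```python
from typing import List, Sequence


def assay_ppv(test_calls: Sequence[str], truth_calls: Sequence[str],
              hom_ref: str) -> float:
    """PPV = TP / (TP + FP) over the non-reference calls made by one assay."""
    tp = fp = 0
    for called, truth in zip(test_calls, truth_calls):
        if called == hom_ref:  # homozygous reference: not a positive call
            continue
        if called == truth:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if tp + fp else float("nan")  # undefined without positives


def fraction_perfect(ppvs: List[float]) -> float:
    """Share of assays whose PPV is exactly 1 (assays with undefined PPV are skipped)."""
    defined = [p for p in ppvs if p == p]  # NaN != NaN, so this drops NaNs
    return sum(p == 1.0 for p in defined) / len(defined) if defined else float("nan")


assay1 = assay_ppv(["AA", "AG", "GG", "AG"], ["AA", "AG", "AG", "AA"], hom_ref="AA")
assay2 = assay_ppv(["CC", "CT", "TT"], ["CC", "CT", "TT"], hom_ref="CC")
print(assay1, assay2, fraction_perfect([assay1, assay2]))
```

    Restricting PPV to non-reference calls matters for screening: a rare-variant assay that never calls the variant has no positives, so its PPV is undefined rather than perfect.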

  • Article type: Journal Article
    We have entered an era of direct-to-consumer (DTC) genomics. Patients have reported many success stories in which DTC genomics revealed causal mutations for genetic diseases before any symptoms appeared, allowing them to take precautions. However, consumers may also take unnecessary medical actions based on false alarms of "pathogenic alleles". The severity of this problem is not well known. Using publicly available data, we compared DTC microarray genotyping data with deep-sequencing data from 5 individuals and manually checked each inconsistently reported single nucleotide variant (SNV). We estimated that, on average, a person using a 23andMe genotyping microarray would have ~5 "pathogenic" alleles reported because of wrongly reported genotypes. We also found that the number of wrongly classified "pathogenic" alleles per person is at least as large as the number caused by wrongly reported genotypes. We show that the false alarm problem could be large enough for the resulting medical costs to become a burden on public health.

  • Article type: Journal Article
    This study used a stochastic simulation approach to determine the effect of different rates of marker genotyping error on the accuracy of genomic prediction under distinct marker and quantitative trait loci (QTL) densities and different heritabilities. For each simulation scenario, a reference population with phenotypic and genotypic records and a validation population with only genotypic records were considered. Marker effects were estimated in the reference population, and the genotypic records of the validation population were then used to predict its genomic breeding values. Prediction accuracy was calculated as the correlation between estimated and true breeding values. Prediction bias was examined by computing the regression of true genomic breeding values on estimated genomic breeding values. The accuracy of genomic evaluation was highest in the scenario with no marker genotyping error, ranging from 0.731 to 0.934, and lowest in the scenario with a marker genotyping error rate of 20%, ranging from 0.517 to 0.762. Unbiased regression coefficients of true on estimated genomic breeding values were obtained in the reference and validation populations when the marker genotyping error rate was zero. The results showed that marker genotyping error can reduce the accuracy of genomic evaluations and can bias the estimates of genomic breeding values. Therefore, to obtain accurate results, marker genotyping errors should be minimized, ideally to zero, in genomic evaluation programs.
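    The two evaluation statistics described above can be computed as in the sketch below: accuracy as the Pearson correlation between estimated and true breeding values, and bias as the slope of the regression of true on estimated values (a slope of 1 indicates no bias; a slope below 1 indicates over-dispersed, inflated estimates). The function name and breeding values are made-up toy examples, not data from the study.

```python
from math import sqrt
from typing import Sequence, Tuple


def accuracy_and_bias(true_bv: Sequence[float],
                      est_bv: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson correlation, slope of the regression of true on estimated)."""
    n = len(true_bv)
    mean_t = sum(true_bv) / n
    mean_e = sum(est_bv) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_bv, est_bv))
    ss_t = sum((t - mean_t) ** 2 for t in true_bv)
    ss_e = sum((e - mean_e) ** 2 for e in est_bv)
    accuracy = cov / sqrt(ss_t * ss_e)
    slope = cov / ss_e  # regression of true (y) on estimated (x)
    return accuracy, slope


true_bv = [1.0, 2.0, 3.0, 4.0]
print(accuracy_and_bias(true_bv, [1.0, 2.0, 3.0, 4.0]))  # (1.0, 1.0)
print(accuracy_and_bias(true_bv, [2.0, 4.0, 6.0, 8.0]))  # (1.0, 0.5)
```

    The second call shows why both statistics are reported: the estimates rank individuals perfectly (accuracy 1) yet are twice as spread out as the true values, which only the regression slope reveals.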

  • Article type: Case Reports
    BACKGROUND: Alpha-1-antitrypsin (A1AT) deficiency is a hereditary condition caused by mutations in the SERPINA1 gene and associated with lung emphysema and liver disease. Laboratory testing for suspected A1AT deficiency involves quantifying serum A1AT concentration and identifying specific alleles by genotyping and phenotyping. The aim of this report was to present the case of a null allele carrier with resulting genotype/phenotype/concentration discrepancies and potential misclassification of the Z variant in a 42-year-old white man presenting with symptoms of chronic obstructive pulmonary disease (COPD).
    METHODS: Serum A1AT concentration was measured using an immunoturbidimetric assay. A1AT phenotype was determined using isoelectric focusing followed by immunofixation (IEF-IF). Genotyping specifically for the S and Z alleles was performed by melting curve analysis using real-time PCR and checked by an alternative PCR-RFLP method. Genotype/phenotype ambiguity and discrepancy were resolved using gene sequencing.
    RESULTS: Laboratory testing revealed a highly reduced A1AT concentration (less than 0.30 g/L), a mild-to-moderate deficiency genotype (Pi*Z allele: M/Z and Pi*S allele: M/M) and a severe deficiency Z homozygous phenotype (Pi ZZ). After repeated sampling, the same discordant results were confirmed by these tests. Further sequencing revealed two clinically relevant defective variants: rs199422210 (a rare null allele) and rs28929474 (the Z allele).
    CONCLUSIONS: Because the genotyping kit probes cannot detect the null/Z allele combination (which mimics the Pi ZZ phenotype), our patient was misclassified as a mild-to-moderately deficient Pi*MZ heterozygote. In all unclear cases, whole-gene sequencing is highly recommended to determine the definitive cause of A1AT deficiency.
