Biogeographic ancestry

  • Article type: Journal Article
    Data obtained with massively parallel sequencing (MPS) can be valuable in population genetics studies. In particular, such data have the potential to distinguish samples from different populations, especially adjacent populations of common origin. Machine learning (ML) techniques seem especially well suited to analyzing the large datasets obtained using MPS. The Slavic populations constitute about a third of the population of Europe and inhabit a large area of the continent, while being relatively closely related in population genetics terms. In this proof-of-concept study, various ML techniques were used to classify DNA samples from Slavic and non-Slavic individuals. The primary objective was to evaluate empirically whether the genetic provenance of genetically similar individuals of Slavic descent can be discerned, with the overarching goal of categorizing DNA samples from representatives of different Slavic populations. Raw sequencing data were pre-processed to obtain a binary vector of 1200 characters. Three classifiers were used: Random Forest, Support Vector Machine (SVM), and XGBoost. The most promising results were obtained using SVM with a linear kernel, with 99.9% accuracy and F1-scores of 0.9846 to 1.000 for all classes.
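
    The accuracy and per-class F1-scores quoted above are computed from true and predicted class labels. The following is a minimal sketch in plain Python, not the study's pipeline; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels for three populations (not data from the study).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

print(accuracy(y_true, y_pred))
print(per_class_f1(y_true, y_pred))
```

    Reporting F1 per class, as the abstract does, shows whether a high overall accuracy hides one poorly separated population.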






  • Article type: Journal Article
    DNA methylation-based age prediction is a new method in the toolbox of forensic genetics. Typically, the method is applied in the course of a police investigation, e.g. to predict the age of an unknown person who has left a biological trace at a crime scene. The method can also be used to answer other forensic questions, for example to estimate the age of unknown human bodies in the course of the identification process. In the present study, we tested for a potential impact of biogeographic ancestry (BGA) on age predictions using five age-dependent methylated CpG sites within the genetic regions of ELOVL2, MIR29B2CHG, FHL2, KLF14 and TRIM59. We collected 102 blood samples each from donors living in Iraq, Middle East (ME) and Germany, Central Europe (EU). Both sample sets were matched in sex and age, with ages ranging from 18 to 68 years and exactly one male and one female sample per year of age. All samples were analyzed by bisulfite pyrosequencing applying a multiplex pre-amplification strategy based on a single input of 35 ng converted DNA in the PCR. For the CpGs in MIR29B2CHG, FHL2 and KLF14, we observed significantly different methylation levels between the two populations. While we were able to train two highly accurate prediction models for the respective populations, with mean absolute deviations between predicted and actual ages (MAD) of 3.34 years for the ME model and 2.72 years for the EU model, we found an absolute prediction difference between the two population-specific models of more than 4 years. A combined model for both populations compensated for the methylation difference between the two populations, providing MADs of prediction of only 3.81 years for ME and 3.31 years for EU samples. In total, the results of the present study strongly support the benefit of BGA information for more reliable methylation-based age predictions.
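
    The MAD reported above is the mean of the absolute differences between predicted and chronological ages. A minimal sketch follows; the function name and the ages are hypothetical illustrations, not values from the study.

```python
def mean_absolute_deviation(predicted, actual):
    """Mean of |predicted - actual| over paired age values, in years."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical chronological and predicted ages (years).
actual_ages = [20, 35, 50, 65]
predicted_ages = [23.0, 33.0, 54.0, 62.0]

mad = mean_absolute_deviation(predicted_ages, actual_ages)
print(f"MAD = {mad:.2f} years")
```

    Because the deviations are taken as absolute values, MAD measures the size of the error but not its direction. A model that systematically over-predicts in one population can therefore show the same MAD as an unbiased one.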






  • Article type: Review
    Forensic DNA Phenotyping (FDP) comprises the prediction of a person's externally visible characteristics regarding appearance, biogeographic ancestry and age from DNA of crime scene samples, to provide investigative leads to help find unknown perpetrators that cannot be identified with forensic STR-profiling. In recent years, FDP has advanced considerably in all of its three components, which we summarize in this review article. Appearance prediction from DNA has broadened beyond eye, hair and skin color to additionally comprise other traits such as eyebrow color, freckles, hair structure, hair loss in men, and tall stature. Biogeographic ancestry inference from DNA has progressed from continental ancestry to sub-continental ancestry detection and the resolving of co-ancestry patterns in genetically admixed individuals. Age estimation from DNA has widened beyond blood to more somatic tissues such as saliva and bones as well as new markers and tools for semen. Technological progress has allowed forensically suitable DNA technology with largely increased multiplex capacity for the simultaneous analysis of hundreds of DNA predictors with targeted massively parallel sequencing (MPS). Forensically validated MPS-based FDP tools for predicting from crime scene DNA i) several appearance traits, ii) multi-regional ancestry, iii) several appearance traits together with multi-regional ancestry, and iv) age from different tissue types, are already available. Despite recent advances that will likely increase the impact of FDP in criminal casework in the near future, moving reliable appearance, ancestry and age prediction from crime scene DNA to the level of detail and accuracy police investigators may desire, requires further intensified scientific research together with technical developments and forensic validations as well as the necessary funding.






  • Article type: Journal Article
    Forensic DNA phenotyping (FDP) includes biogeographic ancestry (BGA) inference and externally visible characteristics (EVCs) prediction directly from an evidential DNA sample as alternatives to provide valuable intelligence when conventional DNA profiling fails to achieve identification. In this context, Massively Parallel Sequencing (MPS) methodologies, which enable simultaneous typing of multiple samples and hundreds of forensic markers, have been gradually implemented in forensic genetic casework. The Precision ID Ancestry Panel (Thermo Fisher Scientific, Waltham, USA) is a forensic multiplex assay consisting of 165 autosomal SNPs designed to provide biogeographic ancestry information. In this work, a sample of 250 individuals from Rio Grande do Sul (RS) State, southern Brazil, apportioned into four main population groups (African-, European-, Amerindian-, and Admixed-derived Gauchos), was evaluated with this panel to assess the feasibility of this approach in a highly heterogeneous population. Forensic descriptive parameters estimated for each population group revealed that this panel has enough polymorphic and informative SNPs to be used as a supplementary instrument in forensic individual identification and kinship testing regardless of ethnicity. No statistically significant deviation from Hardy-Weinberg equilibrium was observed after Bonferroni correction. However, seven loci pairs displayed linkage disequilibrium in pairwise LD testing (p < 3.70 × 10⁻⁶). Interpopulation comparisons by FST analysis, MDS plot, and STRUCTURE analysis among the four RS population groups alone and together with 89 worldwide reference populations demonstrated that Admixed- and African-derived Gauchos present the highest levels of admixture and population stratification, whereas European- and Amerindian-derived Gauchos exhibit a more homogeneous genetic conformation.
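
    The pairwise LD threshold above is consistent with a Bonferroni correction over all pairs of the 165 SNPs: 165 × 164 / 2 = 13,530 pairs, and 0.05 / 13,530 ≈ 3.70 × 10⁻⁶. The family-wise significance level of 0.05 is an assumption here, since the abstract does not state it. The sketch below reproduces the arithmetic.

```python
from math import comb

N_SNPS = 165   # panel size stated in the abstract
ALPHA = 0.05   # assumed family-wise significance level (not stated in the abstract)

n_pairs = comb(N_SNPS, 2)        # number of unordered locus pairs tested for LD
threshold = ALPHA / n_pairs      # Bonferroni-corrected per-test threshold

print(n_pairs)
print(f"{threshold:.2e}")
```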






  • Article type: Journal Article
    Massively parallel sequencing can provide genetic data for hundreds to thousands of loci in a single assay for various types of forensic testing. However, available commercial kits require an initial PCR amplification of short-to-medium sized targets, which limits their application to highly degraded DNA. The need to develop and optimise large PCR multiplexes also hinders the creation of custom panels that target different suites of markers for identity, biogeographic ancestry, phenotype, and lineage (Y-chromosome and mtDNA). Hybridisation enrichment, an alternative approach for target enrichment prior to sequencing, uses biotinylated probes to bind to target DNA and has proven successful on degraded and ancient DNA. We developed a customisable hybridisation capture method that uses individually mixed baits to allow enrichment tailored to specific forensic questions of interest. To allow collection of forensic intelligence data, we assembled and tested a custom panel of hybridisation baits to infer biogeographic ancestry, hair and eye colour, and paternal lineage (and sex) on modern male and female samples with a range of self-declared ancestries and hair/eye colour combinations. The panel correctly estimated biogeographic ancestry in 9/12 samples (75%) but detected European admixture in three individuals from regions with admixed demographic history. Hair and eye colour were predicted correctly in 83% and 92% of samples respectively, with intermediate eye colour and blond hair proving problematic to predict. Analysis of Y-chromosome SNPs correctly assigned sex and paternal haplogroups, the latter complementing and supporting biogeographic ancestry predictions. Overall, we demonstrate the utility of this hybridisation enrichment approach to forensic intelligence testing using a combined suite of biogeographic ancestry, phenotype, and Y-chromosome SNPs for comprehensive biological profiling.






  • Article type: Journal Article
    Differences between self-perceived biogeographic ancestry and estimates derived from DNA are potentially informative about the formation of ethnic identities in different sociohistorical contexts. Here, we compared self-estimates and DNA-estimates in New Mexico, where notions of shared ancestry and ethnic identity have been shaped by centuries of migration and admixture.
    We asked 507 New Mexicans of Spanish-speaking descent (NMS) to list their ethnic identity and to estimate their percentages of European and Native American ancestry. We then compared self-estimates to estimates derived from 291,917 single nucleotide polymorphisms (SNPs), and we examined how differences between the estimates varied by ethnic identity.
    Most NMS (94%) predicted that they had non-zero percentages of European and Native American ancestry. Self-estimates and SNP-estimates were positively correlated (r = 0.38 for European ancestry, r = 0.36 for Native American ancestry, p < 0.001). The correlations belie systematic patterns of underestimation and overestimation based on ethnic identity. NMS with ancestral ties to 20th century immigrants, who identified as Mexican or Mexican American, often underestimated their European ancestry (self-estimate < SNP-estimate) and overestimated their Native American ancestry. The pattern was reversed for NMS who emphasized deep connections to colonial New Mexico and identified as Spanish or Spanish American.
    While NMS accurately predicted that they had European and Native American ancestry, they predicted ancestry percentages with only moderate accuracy. Differences between self-estimated and SNP-estimated ancestry were associated with ethnic identities that were shaped by migration to the region over the past 400 years. We connect ethnic identities and patterns of ancestry estimation to resistance to colonial hegemony and discuss the implications of our results for the construction of ethnic identities, now and in the past.
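
    The comparison above rests on two quantities: the Pearson correlation between self-estimated and SNP-estimated ancestry, and the signed difference between them (negative when ancestry is underestimated, i.e. self-estimate < SNP-estimate). A minimal sketch follows; the function names and the percentages are hypothetical, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_signed_difference(self_est, snp_est):
    """Mean of (self-estimate - SNP-estimate); negative means underestimation."""
    return sum(s - d for s, d in zip(self_est, snp_est)) / len(self_est)


# Hypothetical European ancestry percentages for four participants.
self_estimates = [40, 50, 60, 70]
snp_estimates = [55, 60, 70, 85]

print(pearson_r(self_estimates, snp_estimates))
print(mean_signed_difference(self_estimates, snp_estimates))
```

    In this toy data the two sets of estimates are strongly correlated, yet every self-estimate falls below its SNP-estimate. This is how a positive correlation can coexist with systematic underestimation.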






  • Article type: Journal Article
    Skin pigmentation is one of the most prominent and variable phenotypes in humans. We compared the alleles of 163 SNPs and indels from the Human Pigmentation (HuPi) AmpliSeq™ Custom panel, and biogeographic ancestry with the quantitative skin pigmentation levels on the upper arm, lower arm, and forehead of 299 Pakistani individuals from three subpopulations: Baloch, Pashtun, and Punjabi. The biogeographic ancestry of each individual was estimated using the Precision ID Ancestry Panel. All individuals were mainly of mixed South-Central Asian and European ancestry. However, the Baloch individuals also had an average proportion of Sub-Saharan African ancestry of approximately 10%, whereas it was <1% in the Punjabi and Pashtun individuals. The pairwise genetic distances between the Pashtun, Punjabi, and Baloch subpopulations based on the ancestry markers were statistically significantly different. Individuals from the Pashtun subpopulation had statistically significantly lower skin pigmentation than individuals from the Punjabi and Baloch subpopulations (p < 0.05). The proportions of European and Sub-Saharan African ancestry and five SNPs (rs1042602, rs10831496, rs1426654, rs16891982, and rs12913832) were statistically significantly associated with skin pigmentation at either the upper arm, lower arm or forehead in the Pakistani population after correction for multiple testing (p < 10⁻³). A model based on four of these SNPs (rs1426654, rs1042602, rs16891982, and rs12913832) explained 33% of the upper arm skin pigmentation. The four SNPs and the proportions of European and Sub-Saharan African ancestry explained 37% of the upper arm skin pigmentation. Our results indicate that the four likely causative SNPs, rs1426654, rs1042602, rs16891982, and rs12913832 located in SLC24A5, TYR, SLC45A2, and HERC2, respectively, are essential for skin color variation in the admixed Pakistani subpopulations.







  • Article type: Journal Article
    In 1932, seven burials were discovered on a Texas plantation that was originally the site of a 17th-century Caddo Indian village. Of the seven excavated graves, one set of remains (an adult male) was notably buried in a manner inconsistent with traditional Caddoan burial practices and has long been purported to be the remains of Sieur de Marle (a member of the French explorer La Salle's last expedition). Diary accounts by La Salle's expedition scribe report that Sieur de Marle died along a river near an Indian village during a trek to Canada to find help for colonists left behind at the ill-fated Fort St. Louis. Additionally, two lead projectiles recovered from the grave were ballistically analyzed and determined to be consistent with ammunition used in 17th-century weaponry. In the 1980s, anthropologists requested access to the remains for study, but the skull was missing. Cranial measurements recorded in 1940 and 1962 (by two independent anthropologists) were used to investigate the ancestry of this individual, and the Giles-Elliot (G-E) discriminant function was calculated to be 18.1, within the Anglo-European range. Dietary isotope testing on non-cranial skeletal elements determined that this unknown male's diet was rich in animal/marine protein sources, which differs appreciably from Caddo Indian populations of that time period. In order to genetically assess this individual's biogeographic ancestry and to provide further support that this individual is of European descent, mitochondrial DNA (mtDNA) sequencing was performed using the Applied Biosystems™ Precision ID mtDNA Whole Genome Panel. mtDNA sequencing of multiple sections from two different long bones yielded compiled results consistent with either Haplogroup H or R, both predominantly European mtDNA haplogroups. Further anthropological calculations were conducted using cranial measurements, FORDISC™ software, and discriminant function analysis. Two-way, four-way, and multigroup discriminant function analyses further classify this set of unidentified remains as being White (European) in origin, with posterior probabilities of 0.999, 0.881 and 0.986, respectively. Combined with historical records of Sieur de Marle's death, as well as overlays of historical and contemporary maps which demonstrate that the plantation site aligns with Joutel's diary accounts of de Marle's burial, these collective results support that these remains are of a European male and may possibly belong to this prominent member of La Salle's expedition team.






  • Article type: Historical Article
    In 1995, the historical shipwreck of La Belle was discovered off the coast of Texas. One partial human skeleton was recovered from alongside cargo in the rear portion of the ship; a second (complete) skeleton was found atop coiled anchor rope in the bow. In late 2015, comprehensive forensic genetic testing began on multiple samplings from each set of remains. For the partial skeleton recovered from the ship's rear cargo area, results were obtained for 26/27 Y-STRs using traditional CE; with MPS technology, results were obtained for 18/24 Y-STRs, 56/56 ancestry-informative SNPs (aiSNPs), 22/22 phenotype-informative SNPs (piSNPs), 22/27 autosomal STRs, 4/7 X-STRs, and 94/94 identity-informative SNPs (iiSNPs). For the complete skeleton of the second individual, results were obtained for 7/17 Y-STRs using traditional CE; with MPS technology, results were obtained for 5/24 Y-STRs, 49/56 aiSNPs, 18/22 piSNPs, 15/27 autosomal STRs, 1/7 X-STRs, and 66/94 iiSNPs. Biogeographic ancestry for each set of skeletal remains was predicted using the ancestry feature and metapopulation tool of the Y-STR Haplotype Reference Database (YHRD), Haplogroup Predictor, and the Forensic Research/Reference on Genetics knowledge base (FROG-kb). Phenotype prediction was performed using piSNP data and the HIrisPlex eye color and hair color DNA phenotyping webtool. mtDNA whole genome sequencing was also performed successfully. This study highlights the sensitivity of current forensic laboratory methods in recovering DNA from historical and archaeological human remains. Using advanced sequencing technology provided by MiSeq™ FGx (Verogen) and Ion S5™ (Thermo Fisher Scientific) instrumentation, degraded skeletal remains can be characterized using a panel of diverse and highly informative markers, producing data which can be useful in both forensic and genealogical investigations.






  • Article type: Journal Article
    Inference of biogeographic origin is an important factor in clinical, population and forensic genetics. The information provided by AIMs (Ancestry Informative Markers) can allow the differentiation of major continental population groups, and several AIM panels have been developed for this purpose. However, among these major population groups, Eurasia covers a wide area between two continents that is difficult to differentiate genetically. These populations display a gradual genetic cline from West Europe to South Asia in terms of allele frequency distribution. Although differences have been reported between Europe and South Asia, Middle Eastern populations remain a target of further investigation because their lack of genetic variability hampers their genetic differentiation from neighboring populations. In the present study, a custom-built ancestry panel was developed to analyze North African and Middle Eastern populations, designated the 'NAME' panel. The NAME panel contains 111 SNPs that have patterns of allele frequency differentiation that can distinguish individuals originating in North Africa and the Middle East when combined with a previous set of 126 Global AIM-SNPs.





