Target sequence capture

    OBJECTIVE: The geographical origin and evolutionary mechanisms underpinning the rich and distinctive New Caledonian flora remain poorly understood. This is attributable to the complex geological past of the island and to the scarcity of well-resolved species-level phylogenies. Here, we infer phylogenetic relationships and divergence times of New Caledonian palms, which comprise 40 species. We use this framework to elucidate the biogeography of New Caledonian palm lineages and to explore how extant species might have formed.
    METHODS: A phylogenetic tree including 37 New Caledonian palm species and 77 relatives from tribe Areceae was inferred from 151 nuclear genes obtained by targeted sequencing. Fossil-calibrated divergence times were estimated and ancestral ranges inferred. Ancestral and extant ecological preferences in terms of elevation, precipitation and substrate were compared between New Caledonian sister species to explore their possible roles as drivers of speciation.
    RESULTS: New Caledonian palms form four well-supported clades, inside which relationships are well resolved. Our results support the current classification but suggest that Veillonia and Campecarpus should be resurrected and fail to clarify whether Rhopalostylidinae is sister to or nested in Basseliniinae. New Caledonian palm lineages are derived from New Guinean and Australian ancestors, which reached the island through at least three independent dispersal events between the Eocene and Miocene. Palms then dispersed out of New Caledonia at least five times, mainly towards Pacific islands. Geographical and ecological transitions associated with speciation events differed across time and genera. Substrate transitions were more frequently associated with older events than with younger ones.
    CONCLUSIONS: Neighbouring areas and a mosaic of local habitats shaped the palm flora of New Caledonia, and the island played a significant role in generating palm diversity across the Pacific region. This new spatio-temporal framework will enable population-level ecological and genetic studies to unpick the mechanisms underpinning New Caledonian palm endemism.






    The tribe Astereae (Asteraceae) includes 36 subtribes and 252 genera, and is distributed worldwide in temperate and tropical regions. One of the subtribes, Celmisiinae Saldivia, has been recently circumscribed to include six genera and ca. 160 species, and is restricted to eastern Australia, New Zealand, and New Guinea. The species show an impressive range of growth habit, from small herbs and ericoid subshrubs to medium-sized trees. They live in a wide range of habitats and are often dominant in subalpine and alpine vegetation. Despite the well-supported circumscription of Celmisiinae, uncertainties have remained about their internal relationships and classification at genus and species levels. This study exploited recent advances in high-throughput sequencing to build a robust multi-gene phylogeny for the subtribe Celmisiinae. The target enrichment Angiosperms353 bait set and the hybpiper-nf and paragone-nf pipelines were used to retrieve, assemble, and infer orthologous loci from 75 taxa representing all the main putative clades within the subtribe. Because of the diploidised ploidy level in Celmisiinae, as well as missing data in the assemblies, uncertainty remains surrounding orthology inference. However, based on a variety of gene-family sets, coalescent and concatenation-based phylogenetic reconstructions recovered similar topologies. Paralogy and missing data in the gene families caused some problems, but the estimated phylogenies were well supported and well resolved. The phylogenomic evidence supported Celmisiinae and three main clades: the Pleurophyllum clade (Pleurophyllum, Macrolearia and Damnamenia), mostly in the New Zealand Subantarctic Islands; Celmisia of mainland New Zealand and Australia; and Shawia (including 'Olearia pro parte' and Pachystegia) of New Zealand, Australia and New Guinea.
The results presented here add to the accumulating support for the Angiosperms353 bait set as an efficient method for documenting plant diversity.






    Eugenia is one of the most taxonomically challenging lineages of flowering plants, whose morphological delimitation has changed over the last few years as a result of recent phylogenetic studies based on molecular data. Efforts, until now, have been limited to Sanger sequencing of mostly plastid markers. These phylogenetic studies indicate 11 clades formalized as infrageneric groups. However, relationships among these clades are poorly supported at key nodes and inconsistent between studies, particularly along the backbone and within Eugenia sect. Umbellatae, which encompasses ca. 700 species. To resolve and better understand systematic discordance, 54 Eugenia taxa were subjected to phylogenomic Hyb-Seq using 353 low-copy nuclear genes. Twenty species trees based on coding and non-coding loci of nuclear and plastid datasets were recovered using coalescent and concatenated approaches. Concordant and conflicting topologies were assessed by comparing tree landscapes, topology tests, and gene and site concordance factors. The topologies are similar except between nuclear and plastid datasets. The coalescent trees better accommodate disparity in the intron dataset, which contains more parsimony-informative sites, while concatenated trees recover more conservative topologies, as they have a narrower distribution in the tree landscape. This suggests that highly supported phylogenetic relationships determined in previous studies do not necessarily indicate overwhelming concordant signal. Congruence must be interpreted carefully, especially in concatenated datasets. Despite this, the congruence between the multi-species coalescent (MSC) approach and concatenated tree topologies found here is notable. Our analysis does not support Eugenia subg. Pseudeugenia or sect. Pilothecium, as currently circumscribed, suggesting necessary taxonomic reassessment. Five clades are further discussed within Eugenia sect. Umbellatae, as progress toward its division into workable clades.
While targeted sequencing provides a massive quantity of data that improves phylogenetic resolution in Eugenia, uncertainty remains in Eugenia sect. Umbellatae. The general pattern of higher site concordance factor (CF) than gene CF in the backbone of Eugenia suggests stochastic error from limited signal. Tree landscapes in combination with concordance factor scores, as implemented here, provide a comprehensive approach that incorporates several phylogenetic hypotheses. We believe the protocols employed here will be of use for future investigations on the evolutionary history of Myrtaceae.
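
    As a rough illustration of the gene concordance factor mentioned above, the sketch below computes, for one branch of a species tree, the percentage of gene trees that contain the same split of taxa. It is a simplified version that assumes every gene tree includes all taxa; the function names, the taxon labels A to E and the four toy gene trees are hypothetical, and dedicated phylogenetic software may treat missing taxa and site concordance differently.

```python
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def normalise_split(clade: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Return a canonical side of the split that a branch induces.

    A branch of an unrooted tree divides the taxa into two groups. We always
    keep the group that lacks the alphabetically first taxon, so both ways of
    describing the same branch compare equal.
    """
    side = frozenset(clade)
    reference = min(taxa)
    return taxa - side if reference in side else side


def gene_concordance_factor(
    branch: Iterable[str], gene_trees: List[Set[Split]], taxa: FrozenSet[str]
) -> float:
    """Percentage of gene trees containing the split defined by `branch`.

    Simplifying assumption: every gene tree samples every taxon.
    """
    if not gene_trees:
        raise ValueError("at least one gene tree is required")
    target = normalise_split(branch, taxa)
    supporting = sum(1 for splits in gene_trees if target in splits)
    return 100.0 * supporting / len(gene_trees)


# Hypothetical toy data: five taxa and four gene trees, each gene tree given
# as the clades on its two internal branches.
TAXA = frozenset("ABCDE")
GENE_TREES = [
    {normalise_split(clade, TAXA) for clade in clades}
    for clades in (
        [{"A", "B"}, {"D", "E"}],
        [{"A", "B"}, {"C", "D"}],
        [{"A", "C"}, {"D", "E"}],
        [{"A", "B"}, {"C", "E"}],
    )
]

if __name__ == "__main__":
    for branch in ({"A", "B"}, {"D", "E"}):
        gcf = gene_concordance_factor(branch, GENE_TREES, TAXA)
        print(sorted(branch), gcf)
```

    In this toy example three of the four gene trees contain the split separating A and B from the rest, while only two contain the split separating D and E, so the second branch would receive the lower gene concordance factor.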






    The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them into secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and loci extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: a WGS data set across different average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
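
    The average coverages quoted above (10×, 5× and 2×) follow from simple arithmetic: total sequenced bases divided by genome size. The sketch below shows that calculation and the fraction of reads one would keep to thin a data set to a lower coverage. The abstract does not state how the lower-coverage data sets were produced, so random subsampling is an assumption here, and the read count, read length and genome size are hypothetical values chosen only to make the numbers round.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def subsample_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep at random to reach the target coverage."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    if target_coverage > current_coverage:
        raise ValueError("cannot subsample to a higher coverage")
    return target_coverage / current_coverage


# Hypothetical example values (not taken from the study).
N_READS = 20_000_000        # number of single reads
READ_LENGTH = 150           # bases per read
GENOME_SIZE = 300_000_000   # bases

FULL_COVERAGE = mean_coverage(N_READS, READ_LENGTH, GENOME_SIZE)

if __name__ == "__main__":
    print(f"full data set: {FULL_COVERAGE:.1f}x")
    for target in (5.0, 2.0):
        fraction = subsample_fraction(FULL_COVERAGE, target)
        print(f"keep {fraction:.0%} of reads for {target:.0f}x")
```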






    The carrot family (Apiaceae) comprises 466 genera, which include many well-known crops (e.g., aniseed, caraway, carrots, celery, coriander, cumin, dill, fennel, parsley, and parsnips). Higher-level phylogenetic relationships among subfamilies, tribes, and other major clades of Apiaceae are not fully resolved. This study aims to address this important knowledge gap.
    Target sequence capture with the universal Angiosperms353 probe set was used to examine phylogenetic relationships in 234 genera of Apiaceae, representing all four currently recognized subfamilies (Apioideae, Azorelloideae, Mackinlayoideae, and Saniculoideae). Recovered nuclear genes were analyzed using both multispecies coalescent and concatenation approaches.
    We recovered hundreds of nuclear genes even from old and poor-quality herbarium specimens. Of particular note, we placed with strong support three incertae sedis genera (Platysace, Klotzschia, and Hermas); all three occupy isolated positions, with Platysace resolved as sister to all remaining Apiaceae. We placed nine genera (Apodicarpum, Bonannia, Grafia, Haplosciadium, Microsciadium, Physotrichia, Ptychotis, Tricholaser, Xatardia) that have never previously been included in any molecular phylogenetic study.
    We provide support for the maintenance of the four existing subfamilies of Apiaceae, while recognizing that Hermas, Klotzschia, and the Platysace clade may each need to be accommodated in additional subfamilies (pending improved sampling). The placement of the currently apioid genus Phlyctidocarpa can be accommodated by the expansion of subfamily Saniculoideae, although adequate morphological synapomorphies for this grouping are yet to be defined. This is the first phylogenetic study of the Apiaceae using high-throughput sequencing methods and represents an unprecedented evolutionary framework for the group.






    Resolving relationships within order Commelinales has posed quite a challenge, as reflected in its unstable infra-familial classification. Thus, we investigated (1) relationships across families and genera of Commelinales; (2) phylogenetic placement of never-before sequenced genera; (3) how well off-target plastid data integrate with other plastid-based data sets; and (4) how the novel inferences coincide with the infra-familial classification.
    We generated two large data sets (nuclear and plastome) by means of target sequence capture using the Angiosperms353 probe set, with additional sequences mined from publicly available transcriptomes and full plastomes. A third extended-plastid data set was considered, including all species with sequences in public repositories. Species trees were inferred under a multispecies coalescent framework from individual gene trees and also using maximum likelihood analyses from concatenated and partitioned data.
    The nuclear, plastome, and extended-plastid data sets include 52, 53, and 58 genera, respectively, and up to 290 species of Commelinales, representing the most comprehensive molecular sampling for the order to date, which includes seven never-before sequenced genera.
    We inferred robust phylogenies supporting the monophyly of Commelinales and its five constituent families, and we recovered the clades Pontederiaceae-Haemodoraceae and Hanguanaceae-Commelinaceae, as previously reported. The placement of Philydraceae remains contentious. Relationships within the two largest families, Commelinaceae and Haemodoraceae, are resolved. Based on the latter results, we confirm the subfamilial classification of Haemodoraceae and propose a new classification for Commelinaceae, which includes the synonymization of Tapheocarpa in Commelina.
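
    The concatenated and partitioned analyses mentioned in the methods above rest on a simple data structure: per-gene alignments joined end to end into one supermatrix, with a record of which columns belong to which gene. The following is a minimal sketch of that step, not the authors' workflow. The function name, the gene and species labels and the use of "-" to pad taxa missing from a gene are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]            # taxon name -> aligned sequence
Partition = Tuple[str, int, int]      # gene name, first column, last column


def concatenate_alignments(
    genes: Dict[str, Alignment],
) -> Tuple[Alignment, List[Partition]]:
    """Join per-gene alignments into a supermatrix with 1-based partitions.

    Taxa absent from a gene are padded with gap characters so that every
    row of the supermatrix has the same length.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Partition] = []
    start = 1
    for name in sorted(genes):
        aln = genes[name]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are empty or unequal in length")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return supermatrix, partitions


# Hypothetical toy input: two genes, three species, one species missing per gene.
TOY_GENES = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
    "geneB": {"sp1": "TT", "sp3": "TA"},
}

if __name__ == "__main__":
    matrix, parts = concatenate_alignments(TOY_GENES)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    for name, first, last in parts:
        print(f"{name} = {first}-{last}")
```

    Keeping the partition boundaries alongside the supermatrix is what allows a separate substitution model to be fitted to each gene in a partitioned analysis.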






    Comprising five families that vastly differ in species richness (ranging from Gelsemiaceae with 13 species to the Rubiaceae with 13,775 species), members of the Gentianales are often among the most species-rich and abundant plants in tropical forests. Despite considerable phylogenetic work within particular families and genera, several alternative topologies for family-level relationships within Gentianales have been presented in previous studies.
    Here we present a phylogenomic analysis based on nuclear genes targeted by the Angiosperms353 probe set for approximately 150 species, representing all families and approximately 85% of the formally recognized tribes. We were able to retrieve partial plastomes from off-target reads for most taxa and infer phylogenetic trees for comparison with the nuclear-derived trees.
    We recovered high support for over 80% of all nodes. The plastid and nuclear data are largely in agreement, except for some weakly to moderately supported relationships. We discuss the implications of our results for the order's classification, highlighting points of increased support for previously uncertain relationships. Rubiaceae is sister to a clade comprising (Gentianaceae + Gelsemiaceae) + (Apocynaceae + Loganiaceae).
    The higher-level phylogenetic relationships within Gentianales are confidently resolved. In contrast to recent studies, our results support the division of Rubiaceae into two subfamilies: Cinchonoideae and Rubioideae. We do not formally recognize Coptosapelteae and Luculieae within any particular subfamily but treat them as incertae sedis. Our framework paves the way for further work on the phylogenetics, biogeography, morphological evolution, and macroecology of this important group of flowering plants.






    The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this article are to (i) document our methods, (ii) describe our first data release, and (iii) present a novel open data portal, the Kew Tree of Life Explorer. We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic data set for angiosperms to date, comprising 3099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2333 genera (17%). A "first pass" angiosperm tree of life was inferred from the data, which totaled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
The validated data set, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone toward a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardized nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world's natural history collections. [Angiosperms; Angiosperms353; genomics; herbariomics; museomics; nuclear phylogenomics; open access; target sequence capture; tree of life.]







    Despite the global importance of tropical ecosystems, few studies have identified how natural selection has shaped their megadiversity. Here, we test for the role of adaptation in the evolutionary success of the widespread, highly abundant Neotropical palm Mauritia flexuosa. We used a genome scan framework, sampling 16,262 single-nucleotide polymorphisms (SNPs) with target sequence capture in 264 individuals from 22 populations in rainforest and savanna ecosystems. We identified outlier loci as well as signal of adaptation using Bayesian correlations of allele frequency with environmental variables and detected both selective sweeps and genetic hitchhiking events. Functional annotation of SNPs with selection footprints identified loci affecting genes related to adaptation to environmental stress, plant development, and primary metabolic processes. The strong differences in climatic and soil variables between ecosystems matched the high differentiation and low admixture in population Bayesian clustering. Further, we found only small differences in allele frequency distribution in loci putatively under selection among widespread populations from different ecosystems, with fixation of a single allele in most populations. Taken together, our results indicate that adaptive selective sweeps related to environmental stress shaped the spatial pattern of genetic diversity in M. flexuosa, leading to high similarity in allele frequency among populations from different ecosystems.
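
    To make the idea of correlating allele frequency with environmental variables concrete, the sketch below computes a per-population allele frequency from diploid genotypes and then a plain Pearson correlation with one environmental value per population. This is a deliberately simplified stand-in, not the Bayesian method used in the study: it ignores shared population history and sampling error. The genotype coding (0, 1 or 2 copies of the alternate allele), the three toy populations and the rainfall values are all hypothetical.

```python
import math
from typing import List, Sequence


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1 or 2")
    return sum(genotypes) / (2 * len(genotypes))


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: genotypes at one SNP in three populations, and one
# environmental value (annual rainfall in mm) per population.
POPULATION_GENOTYPES: List[List[int]] = [
    [0, 0, 1, 0],
    [1, 1, 0, 2],
    [2, 2, 1, 2],
]
RAINFALL_MM = [1000.0, 2000.0, 3000.0]

FREQUENCIES = [allele_frequency(g) for g in POPULATION_GENOTYPES]

if __name__ == "__main__":
    print("allele frequencies:", FREQUENCIES)
    print("correlation with rainfall:", pearson_r(FREQUENCIES, RAINFALL_MM))
```

    In a real genome scan a statistic of this kind would be computed for every SNP, and loci with unusually strong associations would be flagged only after accounting for population structure.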







    OBJECTIVE: Until recently, most phylogenetic studies of ferns were based on chloroplast genes. Evolutionary inferences based on these data can be incomplete because the characters are from a single linkage group and are uniparentally inherited. These limitations are particularly acute in studies of hybridization, which is prevalent in ferns; fern hybrids are common and ferns are able to hybridize across highly diverged lineages, up to 60 million years since divergence in one documented case. However, it is not yet clear what effect such hybridization has on fern evolution, in part due to a paucity of available biparentally inherited (nuclear-encoded) markers.
    METHODS: We designed oligonucleotide baits to capture 25 targeted, low-copy nuclear markers from a sample of 24 species spanning extant fern diversity.
    RESULTS: Most loci were successfully sequenced from most accessions. Although the baits were designed from exon (transcript) data, we successfully captured intron sequences that should be useful for more focused phylogenetic studies. We present phylogenetic analyses of the new target sequence capture data and integrate these into a previous transcript-based data set.
    CONCLUSIONS: We make our bait sequences available to the community as a resource for further studies of fern phylogeny.






