Navigating landscapes of admixture
Admixture, the genetic exchange between differentiated populations, has become a topic of great interest. We now know that humans and their extinct relatives experienced a complex history of admixture, and that present-day human genomes contain DNA fragments from these archaic populations. The availability of thousands of modern and ancient human genomes makes it possible to infer introgressed fragments computationally, even in the absence of genomes from the source population. Similar patterns are found in the genomes of our closest living relatives, the great apes, as well as in other primates and other mammalian species. However, the genomic landscapes of introgression have not been studied as thoroughly in other species as in humans, although a comparative perspective calls for an equally detailed picture of this phenomenon. Admixture events have been described in different species or clades, each time with different tools and conceptual approaches, but a comparative analysis of admixture has not yet been attempted. Because of factors such as differing genome assembly quality and varying amounts of available sequencing data, the current picture is far from coherent: patterns of natural selection after admixture, and the functional impact of admixture, have not been placed in context across mammalian species. Here, I aim to characterize patterns of admixture in mammals, including humans and other primates. To this end, I will analyse large-scale datasets of mammalian variation using computational genomics. At the core of my proposal, I will develop methods that detect gene flow and its genomic distribution in a consistent manner across species.
The major challenges are a) to identify introgressed fragments in heterogeneous datasets, with varying sample sizes, demographic histories, amounts of admixture and times since admixture; b) to identify them in non-human data, on different genome assemblies; and c) to meaningfully interpret the landscapes of introgression between species. The goal is to apply such methods to events of very low extent (below 1% of an individual genome), which is not feasible with existing approaches, and in the absence of genomes from the source population. The large-scale datasets I will use comprise variation data from present-day and ancient humans and hominins, present-day and historical data from great apes, large-scale primate diversity data, and possibly data from other mammalian species. The project will build on strategies employed in my previous work, and the methods will involve genome-wide statistics, tree topology-based methods or local genealogies for gene flow detection. Most importantly, I want to advance statistical methods for determining introgressed fragments under the heterogeneous circumstances described above. Demographic modelling will be central to my work and will be based, among other methods, on Approximate Bayesian Computation (ABC). This ABC-based modelling, in combination with novel Deep Learning approaches, will be a major strategy for determining the landscape of introgression. Once admixture has been detected and its genomic distribution determined, it will be important to study signatures of selection computationally. Again, the aim is to compare these signatures across multiple species, which will be highly informative from an evolutionary perspective. This will involve analysing positive selection using statistics specifically developed for the framework of adaptive introgression, and developing tools for comparison across species.
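To make the ABC step concrete, the following is a minimal sketch of ABC rejection sampling, written for illustration only and not the method to be developed in this project. It makes strong simplifying assumptions: the only parameter is an admixture proportion `alpha`, the only summary statistic is the number of sites that appear introgressed, the simulator is a toy binomial model rather than a coalescent simulation, and the prior is uniform between 0 and 5%. All function and parameter names are hypothetical.

```python
import random
from statistics import mean


def simulate_count(alpha, n_sites, rng):
    """Toy simulator (an assumption, not a population-genetic model):
    each of n_sites independently looks introgressed with probability alpha."""
    return sum(1 for _ in range(n_sites) if rng.random() < alpha)


def abc_rejection(observed_count, n_sites, n_draws=4000, tolerance=1,
                  prior_max=0.05, seed=1):
    """Return the admixture proportions accepted by ABC rejection sampling.

    For each draw: sample alpha from a uniform prior on [0, prior_max],
    simulate a summary statistic, and keep alpha if the simulated count
    lies within `tolerance` of the observed count.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        alpha = rng.uniform(0.0, prior_max)              # draw from the prior
        simulated = simulate_count(alpha, n_sites, rng)  # simulate summary statistic
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(alpha)                       # approximate posterior sample
    return accepted


if __name__ == "__main__":
    # Hypothetical observation: 4 of 500 sites (0.8%) look introgressed.
    posterior = abc_rejection(observed_count=4, n_sites=500)
    print("accepted draws:", len(posterior))
    print("approximate posterior mean of alpha:", round(mean(posterior), 4))
```

The accepted values approximate the posterior distribution of `alpha` given the observed count. In a realistic setting the toy simulator would be replaced by simulations under an explicit demographic model, and the single count by a vector of summary statistics, which is also where the heterogeneity in sample size and demographic history discussed above would enter.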
The distribution of introgressed fragments along the chromosomes will be informative about introgression deserts, for which no formal framework exists yet. Characterizing them will be an important goal of this project, especially taking into account the differences between the datasets. This has the potential to define what makes each lineage special, and also to reveal recurrent speciation factors. Finally, functional inferences can be made, but these require novel strategies for the heterogeneous types of data used here. The design of this proposal will allow patterns of admixture to be generalized across mammals, particularly the dynamics of adaptive benefits and speciation. This is a novel perspective, which will contribute significantly to our understanding of mammalian evolution and provide new tools to the field.