Theory and Bioinformatics

            The general subject of our theoretical research, based on bioinformatic methods, is the action of natural selection at the level of genomes. Over the last two years, we focused mainly on the five Specific Aims (SA) formulated in the grant application. These aims, and the results obtained, are listed below:

SA1: to develop a method for ascertaining past positive selection through ongoing negative selection and to apply it to recent ancestry of humans and fruit flies

            We developed and implemented a method to study past positive selection on the basis of ongoing negative selection, using divergence and polymorphism data. Application of this method to the human genome showed that approximately 50% of the amino acid substitutions in the human lineage before the Ponginae–Homininae divergence were driven by positive selection, while this fraction was significantly reduced in more recent human evolution. In Drosophila fruit flies, 50% of substitutions were driven by positive selection throughout the evolution of this genus (Bazykin and Kondrashov 2011). We also extended the analysis of positive selection through double subsequent substitutions, showing that the role of positive selection is more prominent in the regions of proteins that possessed high overall conservation (Bazykin and Kondrashov 2012).
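
            As a point of reference for how divergence and polymorphism data can be combined, the sketch below computes the classical McDonald-Kreitman-style estimate of the fraction of adaptive substitutions. This is a generic textbook estimator and not necessarily the method of Bazykin and Kondrashov (2011); the function name `mk_alpha` and the example counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classical McDonald-Kreitman-style estimate of the fraction of
    nonsynonymous substitutions driven by positive selection.

    dn, ds: counts of nonsynonymous / synonymous fixed differences
    pn, ps: counts of nonsynonymous / synonymous polymorphisms

    Illustrative estimator only; not the authors' published method.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    # alpha = 1 - (Ds * Pn) / (Dn * Ps)
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(mk_alpha(dn=100, ds=100, pn=50, ps=100))
```

            With these made-up counts, polymorphism predicts half as many nonsynonymous substitutions as were observed, so the estimate is 0.5.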


SA2: to investigate the geometry of fitness landscapes over the space of sequences using data on fixations of adaptive mutations within recently inserted long sequences

            We studied the “adaptive walks” associated with accumulation of amino acid substitutions near the recently gained insertions or deletions (collectively, indels) in the evolution of fruit flies. We showed that an average indel led to subsequent accumulation of several amino acid substitutions in its vicinity, and that these substitutions were driven by positive selection (Leushkin et al. 2012a). This pattern of selection in the indel-flanking regions of proteins overlays the convoluted interaction of deletion-biased mutation, insertion-biased selection (Leushkin et al. 2012b) and insertion-biased gene conversion (Leushkin and Bazykin 2012) processes experienced by the indels themselves.


SA3: to study correlated occurrence of allele replacements at near-by sites within coding and non-coding sequences, and to use it for analysis of molecular compensation

            We showed that the rates of point mutations have a strong small-scale heterogeneity in metazoan genomes, and that this heterogeneity affects not only the mutation rates themselves, but also the relative frequencies of different mutation types (transitions vs. transversions; Seplyarskiy et al. 2012). While controlling for this heterogeneity, we studied the phenomenon of correlated occurrence of replacements at nearby nucleotide sites within non-coding sequences of vertebrates and Drosophila. We showed that this phenomenon is more prevalent than previously estimated, and that it is largely due to simultaneous or near-simultaneous replacements of multiple nucleotides. The majority of the multinucleotide replacements span regions of no more than 10 nucleotides; still, many of them involve simultaneous replacements of more than two nucleotides. This pattern can have either a mutational or a selectional basis; evidence seems to favor the former (Terekhanova et al. 2012).
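
            The distinction between mutation types mentioned above can be made concrete with a small sketch that classifies single-nucleotide replacements as transitions or transversions and computes their ratio. The function names and the example list are hypothetical and serve only as an illustration, not as the pipeline used in the cited studies.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-nucleotide change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("nucleotides must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical")
    # A transition keeps the chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); a transversion changes it.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(substitutions):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    ts = sum(1 for r, a in substitutions
             if classify_substitution(r, a) == "transition")
    tv = len(substitutions) - ts
    if tv == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return ts / tv


if __name__ == "__main__":
    # Hypothetical substitutions: two transitions and one transversion.
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

            Computing this ratio separately in small genomic windows is one simple way to look for local differences in the relative frequencies of mutation types.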


SA4: to develop a pipeline for discovery of orthologs of selectively constrained unalignable sequences and to use them for the analysis of molecular function

            We developed and applied a method for detection of selective constraint between unalignable nucleotide sequences. As long as the orthology of unalignable segments can be established (e.g., for unalignable introns, through the orthology of alignable exonic regions spanning them), we can measure the conservation of their features even in the absence of textual conservation. We show, for example, that the presence of regions known to have a regulatory function and the presence of selectively constrained regions are two features of introns that change more slowly than the intronic sequence itself, and can be conserved even between non-coding sequences that have diverged beyond recognition (Vakhrusheva et al. 2012).


SA5: to study the phenomenon of the increase of the strength of selective constraint in the course of evolution of a protein, and to use it for the analysis of fitness landscapes

            We studied how the fitness of a particular amino acid at a site depends on the time since the replacement that gave rise to it, and showed that this dependence is very strong. Data on the rates of amino acid reversals show that recently lost amino acids are easily restored; conversely, a reversal is unlikely when another amino acid has occupied the site for a long time. We were able to attribute this phenomenon to a loss of the fitness conferred by an amino acid with time since its replacement, probably due to the accumulation of replacements elsewhere in the genome that, through epistatic interactions, have made the old amino acid deleterious (Naumenko et al. 2012a). As a side result, we showed that the site-specific rate of amino acid evolution and the size of the set of permitted variants are only weakly correlated, and are thus two distinct characteristics of the evolutionary process (Naumenko et al. 2012b).


            In addition to our work on these five Specific Aims, a number of theoretical projects on other subjects in evolutionary genetics were implemented. In particular, we showed that bacterial genes often evolve by means of stop codon shift (Vakhrusheva et al. 2010), and analyzed the minimal value of genetic load consistent with a given variance in relative fitness (Shnol et al. 2011).


            We have also begun to work on the following three problems:


6. Epistasis in evolution of influenza A virus

            We study the phenomenon of accelerated evolution of influenza A virus proteins after reassortment events, which combine in the same genotype protein-coding genes coming from different, and often phylogenetically distant, genotypes. This acceleration is likely due to epistatic interactions between fitness landscapes of different proteins.


7. Population genomics of a hypervariable species Ciona savignyi

            We are performing a comparative analysis of the genotypes of 6 Ciona savignyi individuals collected in 3 different regions of the Pacific Ocean. Due to its record genotype diversity, this species is an ideal system for studies of natural selection.


8. Neutral evolution of loss-of-function alleles in Drosophila melanogaster populations

            We study the accumulation of nonsynonymous and synonymous substitutions in alleles of protein-coding genes of Drosophila melanogaster that have been inactivated by nonsense substitutions. Neutrality of all substitutions within such alleles makes it possible to determine their age and, therefore, to estimate the strength of the negative selection that protects the functional alleles.
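
            The reasoning behind dating an inactivated allele can be sketched as follows: if every substitution in the allele is neutral, the expected number accumulated since inactivation is the per-site mutation rate times the number of sites times the elapsed time, so the age is roughly the observed count divided by that product. The function name, the mutation rate, and the example numbers are assumptions made for illustration and are not taken from the project.

```python
def allele_age_generations(n_substitutions, n_sites, mu_per_site_per_generation):
    """Rough age (in generations) of a loss-of-function allele.

    Assumes all substitutions accumulated since inactivation are neutral,
    so that expected substitutions = mu * n_sites * age.
    Illustrative point estimate only; it ignores sampling variance.
    """
    if n_sites <= 0 or mu_per_site_per_generation <= 0:
        raise ValueError("n_sites and mutation rate must be positive")
    return n_substitutions / (n_sites * mu_per_site_per_generation)


if __name__ == "__main__":
    # Hypothetical values: 3 substitutions in a 1500-site allele,
    # with an assumed mutation rate of 1e-9 per site per generation.
    print(allele_age_generations(3, 1500, 1e-9))
```

            Because the count of substitutions in a single allele is small, such an estimate carries a wide uncertainty and is more informative when aggregated over many alleles.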



Acquisition and analysis of our own data


            In addition to purely theoretical and bioinformatic research, we also initiated a variety of projects that involve acquisition and analysis of our own data:


9. Sequencing, annotation, and analysis of the minimal angiosperm genome

            The 64 Mb genomes of Genlisea margaretae and Genlisea aurea are the smallest among angiosperms. We study the evolutionary mechanisms responsible for the small size of these genomes by comparing them to the genomes of related species.


10. Analysis of adaptation of three-spined stickleback Gasterosteus aculeatus to fresh water

            We are comparing genotypes of Gasterosteus aculeatus from populations living in the White Sea and in the fresh-water lakes in its basin. This comparison makes it possible to identify genes which are involved in adaptation to fresh water and to estimate the strength of the corresponding natural selection. This is a collaborative project with Dr. N. Muguet (Laboratory of Population Biology of Russian Institute of Fisheries and Oceanography).


11. Fine structure of recombination sites in the genome of a basidiomycete Schizophyllum commune

            We crossed two haploid Schizophyllum commune individuals, from Russia and from the USA, whose genetic distance at synonymous sites exceeds 20%, and sequenced the genotypes of 20 haploid offspring. The detailed analysis of 35 sites of recombination reveals a number of patterns that cannot be detected in crosses between more closely related parents.


12. Studies of biodiversity of the White Sea and search for new hypervariable species

            We are performing a systematic study of the intraspecies genetic variation of macroscopic animals and algae that inhabit the White Sea. So far, we have discovered 7 pairs of sibling species, as well as a case of mitochondrial introgression. This is a collaborative project with the staff of the MSU White Sea Marine Station.


13. Genomic variation of an ascomycete Geomyces pannorum in extreme habitats

            We sequenced 14 haploid genotypes of Geomyces pannorum, an ascomycete that lost sexual reproduction a long time ago and inhabits low-temperature habitats, including permafrost. Comparison of these genotypes revealed a complex pattern of relatedness between individuals from different habitats, as well as a number of cases of horizontal gene transfer. This is a collaborative project with Dr. S. Ozerskaya (Russian Collection of Microorganisms, G. K. Skryabin Institute for Biochemistry and Physiology of Microorganisms).


14. Sequencing, annotation, and analysis of genomes of domesticated buckwheat and their wild progenitors

            We sequenced the genome of the wild progenitor of one of the two species of domesticated buckwheat, Fagopyrum tataricum ssp. potanini, and are analyzing it. We will soon perform a similar analysis for domesticated varieties of F. tataricum, as well as for the wild progenitor and cultivars of the other species of domesticated buckwheat, F. esculentum.


15. Population genomics of a hypervariable basidiomycete Schizophyllum commune

            We have found that the genotype diversity of the wood-decaying basidiomycete Schizophyllum commune is extremely high, at about 7%. We sequenced 42 haploid individuals from several populations in Russia and the USA and are performing their comparative analysis.
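
            A diversity figure of this kind is commonly summarized as nucleotide diversity, the average proportion of sites that differ between two sequences sampled from the population. The sketch below computes it from a set of aligned haploid sequences; it is a minimal illustration with hypothetical function names and toy data, and it is not the pipeline used in this project.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of differing sites between two aligned sequences,
    counting only sites where both carry an unambiguous nucleotide."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def nucleotide_diversity(sequences):
    """Mean pairwise difference over all pairs of haploid sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment of three haploid sequences (hypothetical data).
    print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

            Gaps and ambiguous bases are skipped per pair here; a real analysis would also need to handle alignment quality and site classes (for example, synonymous sites only).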


16. Sequencing, annotation, and analysis of the genome of a tetraploid plant Capsella bursa-pastoris

            Capsella bursa-pastoris is a product of a very recent whole-genome duplication. We sequenced its genome and are studying the early phases of its post-duplication evolution. This is a collaborative project with Drs. V. Makeev and A. Kasyanov (Laboratory of Systems Biology and Computational Genetics, Institute of General Genetics).


17. Genomics of adaptation of an ascomycete Podospora anserina to a new environment

            Dr. O. Kudryavtseva (Department of Mycology and Algology, MSU) is conducting a long-term experiment on the adaptation of the ascomycete Podospora anserina to cultivation under conditions that are far from the natural environment of this species. We sequenced the ancestral genotype and the genotypes of cultivated lineages and are performing their comparative analysis.


18. Evolution of proteins in the course of speciation in amphipods from the lake Baikal

            Lake Baikal amphipods represent one of the most spectacular species flocks. We are sequencing the transcriptomes of many species in order to study the patterns of evolution that accompanied the anagenesis and cladogenesis that produced this flock. This is a collaborative project with Dr. L. Yampolsky (Department of Biology, East Tennessee State University).