WP 7 - PhD (I)

Large-scale data analysis and machine-learning across microbial ecosystems

Duration of contract: 4 years 
Planned starting date: Fall 2024 
Place of work: University of Vienna

Main supervisor: Shaul Pollak (Lab homepage)
 

Project description:

The goal of this project is to develop efficient computational approaches, built on modern machine-learning concepts, that extract biological, ecological, and evolutionary knowledge from the large volume of sequencing data produced by the different work packages in the “MicroPlanet” Cluster of Excellence. Since the early 2000s, petabytes of sequencing data have been produced, but the biological insights promised at the onset of the sequencing revolution have not yet been delivered. The sheer diversity of microbial life, with an estimated billions of species and trillions of gene families, limits our quest for insights: most genes and species are uncharacterized, and it would take many decades of laborious experimentation to close this gap. Recent advances in machine learning, such as the advent of large language models and tools like AlphaFold, have demonstrated the utility of large-scale data analysis in breaking traditional barriers in biological research. The Cluster of Excellence offers a unique opportunity to synthesize large volumes of data from across ecosystems to answer fundamental biological questions, but efficient computational tools that go beyond the state of the art are still needed for this task. This project will be carried out in close collaboration with the experimental labs in the Cluster of Excellence and will develop new approaches and tools to analyze time-series data, bacterial genomic data, and protein variation and evolution, and to make theory-guided inferences about ecology from the underlying genetic and taxonomic structure of microbial communities.

Candidates with a background in computer science, physics, or mathematics and experience in machine learning are encouraged to apply.

 
