Leveraging Large Datasets to Discover Protistan Diversity Across Scales
Arianna Krinos, Massachusetts Institute of Technology
In environmental microbiology, protists (microbial eukaryotes) are notoriously recalcitrant to genomic study. Their complex genomes are peppered with largely uncharacterized repetitive sequences. Increasingly complex pangenomes have also been identified within eukaryotic species: individual genetic variants within a metapopulation may harbor substantial genetic diversity with phenotypic consequences. One such organism is the haptophyte Emiliania huxleyi, a calcifying phytoplankter (coccolithophore) that is globally distributed and intraspecifically diverse. To explore the diverse genetic content of Emiliania huxleyi and its physiological repercussions, we thermally acclimated 12 strains of Emiliania huxleyi, originally isolated from around the world, to four or five environmentally relevant temperatures each for at least two months. We sequenced transcriptomes for seven of these strains at three to five temperatures, generating 240 gigabytes of raw sequence data.
I will discuss the vast diversity of genes that differentiate these strains, and show how high-throughput pangenomic analysis can yield key insights into thermal acclimation and tolerance to changing environmental conditions. I will also show some implications of my work as implemented in the Darwin model, an ecosystem layer on the MIT general circulation model (MITgcm). I will discuss how our new high-quality transcriptome references can be used in conjunction with in situ metatranscriptomic and metagenomic data from the oceans. Further, I will present computational tools and a new algorithm I have developed to improve the accurate identification of protistan diversity in situ from meta-omic datasets. Finally, I will discuss the computational challenges of manipulating these massive multi-omic datasets on high-performance computing systems, and how algorithm development can help ensure that the assumptions underlying our analyses are accurate. These computational advances are critical for making inferences about the important ecosystem roles of ecologically essential and globally ubiquitous protists at relevant scales.
Authors: Arianna Krinos1,2,3,4, Margaret Mars Brisbin1,5, Natalie Cohen6, Sarah Hu7, Weixuan Li2, Sara Shapiro1, Stephanie Dutkiewicz2, Frederik Schulz4, Michael Follows2, Harriet Alexander1
1Department of Biology, Woods Hole Oceanographic Institution, USA
2Department of Earth, Atmospheric, and Planetary Science, Massachusetts Institute of Technology, USA
3MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Cambridge and Woods Hole, USA
4Joint Genome Institute, Lawrence Berkeley National Laboratory, USA
5Department of Biological Oceanography, University of South Florida, USA
6Skidaway Institute of Oceanography, University of Georgia, USA
7Department of Oceanography, Texas A&M University, USA