Modeling Pre-mRNA Splicing With a Stochastic Grammar Framework
Kayla McCue, Massachusetts Institute of Technology
One of the key steps in eukaryotic gene expression is the excision of sequences, known as introns, from the pre-mRNA transcript and the joining of the remaining sequences, known as exons, to form the mature mRNA. This process, known as splicing, lets cells produce multiple distinct proteins from the same genomic sequence, because different transcripts of that sequence can have alternative splicing patterns. Short sequences known as splicing regulatory elements (SREs) can help determine a particular splicing pattern by recruiting trans-factors that either enhance or suppress splicing locally. Varying levels of these trans-factors across tissues contribute to different cell fates and behaviors. However, the rules underlying the effects of SREs on splicing choices are still incompletely understood. We have developed a model of splicing based on the framework of a stochastic context-free grammar, which can predict a splicing pattern for an arbitrary human transcript. We pair it with a gradient-descent approach that learns parameters describing the activity of different SREs from the splicing patterns observed in a training data set. Additionally, this model lets us explore potential exonization events, in which intronic sequences could be induced to become exonic through outside intervention.
Abstract Author(s): Kayla McCue, Chris Burge