Modeling Pre-mRNA Splicing With a Stochastic Grammar Framework
Kayla McCue, Massachusetts Institute of Technology
Pre-mRNA splicing is a key process in the expression of eukaryotic genes wherein sequences known as introns are removed from the pre-mRNA transcript, leaving behind sequences called exons that are joined to form the final mRNA sequence. This can occur in a constitutive manner, where the same splicing pattern is used for every transcript from the gene, or in an alternative manner, where the splicing pattern can vary for different transcripts. The splicing pattern of a gene can be set either by initially defining the introns as the sequence to be removed, a process known as intron definition, or by defining the exons as the sequence to be retained, a process known as exon definition. In intron definition, the boundary elements known as splice sites are initially paired across the intron, whereas in exon definition the splice sites are initially paired across exons before the splicing machinery rearranges to remove the intron between them. The particular factors that determine the usage of intron and exon definition are unknown, though the lengths of exons and introns have been shown to play a role. To further explore this question and work toward the more general problem of splicing simulation, we developed a hidden semi-Markov model (HSMM) of splicing. Applying this model to constitutively spliced genes from the model organism Drosophila melanogaster supported the existing theory that fly genes use predominantly intron definition. Additionally, we have developed an extension to the O(L^2) HSMM algorithms (where L is the sequence length) that allows for incorporating a joint probability distribution over the splice sites without increasing the order of the running time. We are working on incorporating splicing regulatory elements into this framework and on applying it to other organisms, including mammals.
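To make the O(L^2) cost of explicit-duration decoding concrete, the following is a minimal sketch of Viterbi decoding for a toy two-state HSMM in which exon and intron segments strictly alternate. It is not the model described in this abstract: the function name `viterbi_hsmm`, the emission table, and the uniform length distribution are all hypothetical, and the sketch includes neither splice-site models nor the joint splice-site distribution mentioned above.

```python
import math


def viterbi_hsmm(seq, log_emit, log_dur):
    """Most probable exon/intron segmentation of seq under a toy two-state HSMM.

    log_emit[state][base] -> log emission probability (hypothetical values)
    log_dur[state](d)     -> log probability of a segment of length d
    Segments alternate exon, intron, exon, ...; the parse starts and ends
    in an exon. Returns (log score, [(state, start, end), ...]).
    """
    L = len(seq)
    states = ("exon", "intron")
    other = {"exon": "intron", "intron": "exon"}
    neg_inf = float("-inf")

    # Prefix sums make the emission score of any segment an O(1) lookup.
    prefix = {s: [0.0] * (L + 1) for s in states}
    for s in states:
        for i, base in enumerate(seq):
            prefix[s][i + 1] = prefix[s][i] + log_emit[s][base]

    # best[s][t]: best log score of a parse of seq[:t] ending in state s.
    # back[s][t]: length of the final segment in that best parse.
    best = {s: [neg_inf] * (L + 1) for s in states}
    back = {s: [0] * (L + 1) for s in states}

    for t in range(1, L + 1):            # segment end (exclusive)
        for s in states:
            for d in range(1, t + 1):    # segment length: O(L^2) overall
                start = t - d
                if start == 0:
                    # Only an exon may open the parse.
                    prev = 0.0 if s == "exon" else neg_inf
                else:
                    prev = best[other[s]][start]
                if prev == neg_inf:
                    continue
                score = prev + log_dur[s](d) + prefix[s][t] - prefix[s][start]
                if score > best[s][t]:
                    best[s][t] = score
                    back[s][t] = d

    if best["exon"][L] == neg_inf:
        return neg_inf, []

    # Trace back from the final exon to recover the segment boundaries.
    segments = []
    s, t = "exon", L
    while t > 0:
        d = back[s][t]
        segments.append((s, t - d, t))
        s, t = other[s], t - d
    segments.reverse()
    return best["exon"][L], segments


# Toy input: A-rich "exons" flanking a T-rich "intron" (illustrative only).
toy_seq = "AAAAATTTTTAAAAA"
toy_emit = {
    "exon": {"A": math.log(0.9), "T": math.log(0.1)},
    "intron": {"A": math.log(0.1), "T": math.log(0.9)},
}


def uniform_log_dur(d):
    # Assumed length model: every length from 1 to len(toy_seq) is equally likely.
    return -math.log(len(toy_seq))


score, segments = viterbi_hsmm(
    toy_seq, toy_emit, {"exon": uniform_log_dur, "intron": uniform_log_dur}
)
```

The two nested loops over the segment end and the segment length are the source of the quadratic running time; the prefix sums keep each inner step constant-time. In practice the length distributions would be estimated from annotated exons and introns rather than assumed uniform.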
Abstract Author(s): Kayla McCue, Chris Burge