Structure-Aware Annotation of Leucine-Rich Repeat Domains
Boyan Xu, University of California, Berkeley
Protein domain annotation is typically performed by predictive models such as hidden Markov models (HMMs) trained on sequence motifs. However, sequence-based annotation methods are prone to error, particularly in calling domain boundaries and the motifs within them. These methods are limited because the underlying models have no access to structural information. With the advent of deep learning-based protein structure prediction, we aim to leverage the geometry of protein structures to assist in domain annotation and to enhance existing sequence-based annotation. We develop dimensionality reduction methods to annotate the repeat units of the leucine-rich repeat (LRR) solenoid domain. These methods correct mistakes made by existing machine learning-based annotation tools and enable the automated detection of hairpin loops and structural anomalies in the solenoid. We apply and test the methods on 127 predicted structures of LRR-containing intracellular innate immune proteins in the model plant Arabidopsis thaliana.
Abstract Author(s): Boyan Xu, Alois Cerbu, Daven Lim, Christopher J. Tralie, Ksenia Krasileva