Optimizing Protein Representations with Information Theory
Julian Mintseris, Boston University
The problem of describing a protein representation by breaking up the atoms of the amino acids into functionally similar atom groups has been addressed by many researchers over the past 25 years. They have used a variety of physical, chemical and biological criteria, of varying degrees of rigor, essentially to impose our understanding of protein structures onto the atom-typing schemes used in studies of protein folding, protein-protein and protein-ligand interactions, and others. Here, instead, we have chosen to rely primarily on the data and to use information-theoretic techniques to dissect it. We show that, for a given alphabet size, we can obtain an optimized protein representation from either protein monomer or protein interface datasets, and that the resulting representations agree with general concepts of protein energetics. Closer inspection of the atom partitions led to interesting observations pointing to the greater importance of hydrophobic interactions in protein monomers compared to interfaces and, conversely, the greater importance of polar/charged interactions in protein interfaces. Comparing the atom partitions from the two datasets, we show that they are strikingly similar at an alphabet size of five, indicating that, despite some differences, the general energetic concepts are very similar for folding and binding. Implications for further structural studies are discussed.
Abstract Author(s): Julian Mintseris & Zhiping Weng
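The idea of choosing an atom partition from the data for a given alphabet size can be illustrated with a small sketch. The abstract does not name the information-theoretic criterion, so the code below assumes one plausible choice: the mutual information between the types of two contacting atoms. It searches exhaustively for the partition into k groups that retains the most mutual information. The four atom types, the contact counts, and all function names are hypothetical and for illustration only; they are not the authors' data or method.

```python
import math
from itertools import product


def mutual_information(counts):
    """Mutual information (bits) of a square table of contact counts."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                mi += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


def merge_types(counts, partition):
    """Collapse atom types into groups; partition[i] is the group of type i."""
    k = max(partition) + 1
    reduced = [[0] * k for _ in range(k)]
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            reduced[partition[i]][partition[j]] += n
    return reduced


def is_canonical(assignment, k):
    """True if group labels first appear in order 0, 1, ..., k-1 and all are used.

    This skips relabelings of the same partition and partitions with empty groups.
    """
    seen = -1
    for g in assignment:
        if g > seen + 1:
            return False
        seen = max(seen, g)
    return seen == k - 1


def best_partition(counts, k):
    """Exhaustive search (feasible only for toy sizes) for the k-group
    partition whose reduced contact table has the highest mutual information."""
    best, best_mi = None, -1.0
    for assignment in product(range(k), repeat=len(counts)):
        if not is_canonical(assignment, k):
            continue
        mi = mutual_information(merge_types(counts, assignment))
        if mi > best_mi:
            best, best_mi = assignment, mi
    return best, best_mi


# Hypothetical symmetric contact counts for four made-up atom types:
# types 0 and 1 mostly contact each other, as do types 2 and 3.
counts = [
    [40, 35, 5, 5],
    [35, 40, 5, 5],
    [5, 5, 30, 30],
    [5, 5, 30, 30],
]

partition, retained = best_partition(counts, 2)
print("best 2-group partition:", partition)
print("mutual information retained (bits):", retained)
print("mutual information of full table (bits):", mutual_information(counts))
```

Merging types can never increase the mutual information, so the score of a reduced alphabet is bounded by that of the full table. Exhaustive enumeration only works for toy inputs; a realistic atom alphabet would need a heuristic or stochastic search, which is outside the scope of this sketch.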