Ab Initio Enhancement of Machine Learning on Complex Chemistries

Santiago Vargas, University of California, Los Angeles


Machine learning (ML) promises a paradigm shift for predictive computational routines in chemistry. Given a trained ML model, scientists have an instant oracle for molecular and reaction properties, whereas traditional experimental or computational routines can take hours or days to evaluate the same properties. These algorithms also open the door to cutting-edge high-throughput and generative schemes in chemical design. So why hasn't the revolution come? Several issues remain: poor descriptions of bonding, the cost of producing datasets, and the simplicity of most existing datasets render ML models unusable in most pertinent chemical domains. For these reasons, difficult chemistries involving coordination bonding, varying spins and charges, and heavy elements remain relatively untapped areas of machine learning in chemistry. We explore the use of quantum descriptors, coupled to cutting-edge graph neural networks (GNNs), to enable predictive models that need less data and generalize better on complex chemistries. Our quantum descriptor of choice, the Quantum Theory of Atoms in Molecules (QTAIM), has a storied history of analyzing electron densities to provide interpretable and rigorous definitions of atomic and bonding characteristics. QTAIM not only generates a host of useful atom and bond properties but also provides molecular interaction graphs that supersede the distance cutoffs commonly used to define bonds in cheminformatics. We leverage QTAIM to train GNNs on a diverse range of datasets, including battery species, organometallic complexes, and reaction barriers, and show that QTAIM can push ML models toward out-of-domain (OOD) prediction, with less data, and into applications where no benchmarks exist.
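To make the contrast between distance cutoffs and QTAIM interaction graphs concrete, here is a minimal sketch. It is not the author's pipeline: the covalent radii table (approximate literature values), the 1.2 tolerance factor, the function names, and the `(i, j, props)` bond-path input format are all illustrative assumptions. In practice, the bond paths and the properties at their bond critical points would come from a QTAIM analysis of a computed electron density.

```python
import math

# Approximate covalent radii in angstroms (illustrative values only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66, "Li": 1.28}


def cutoff_graph(symbols, coords, tol=1.2):
    """Cheminformatics-style bonding: connect atoms i < j whenever their
    distance is below tol * (r_i + r_j). The tolerance is a hypothetical choice."""
    edges = set()
    n = len(symbols)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            limit = tol * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if d < limit:
                edges.add((i, j))
    return edges


def bond_path_graph(bond_paths):
    """QTAIM-style bonding: connect atoms exactly when a bond path is reported.

    bond_paths: iterable of (i, j, props) tuples, where props is a dict of
    properties evaluated at the bond critical point (hypothetical format).
    Returns a dict mapping the sorted atom pair to its bond features, which
    could serve as edge features for a GNN.
    """
    edges = {}
    for i, j, props in bond_paths:
        edges[(min(i, j), max(i, j))] = dict(props)
    return edges
```

For example, with a Li–O contact placed at 2.4 Å, `cutoff_graph(["Li", "O"], [(0, 0, 0), (2.4, 0, 0)])` returns an empty set because 2.4 exceeds 1.2 × (1.28 + 0.66), whereas a hypothetical bond path `bond_path_graph([(1, 0, {"rho": 0.02})])` yields the edge `(0, 1)` with its properties attached (the density value is made up). The design point is that the graph's connectivity and edge features come from the electron density itself rather than from a tunable geometric threshold.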