Simulation and Machine Learning for Atomic-Scale Characterization of Nanomaterials With Transmission Electron Microscopy
Luis Rangel DaCosta, University of California, Berkeley
Modern aberration-corrected transmission electron microscopes (TEMs) are among the most powerful tools available for the characterization of materials. TEMs offer imaging at the sub-Angstrom scale and spectroscopic analysis with sub-eV resolution, often simultaneously and through a plethora of complementary experimental modalities. However, extracting quantitative information directly from experiments is difficult: the interaction between the imaging electron beam and the sample is complex, and experimental data often suffer from low signal-to-noise ratios, limited by the electron dose a given sample can withstand before damage. High-quality quantitative analyses have historically relied on labor-intensive manual techniques, which further limit studies to small sample sizes and do not scale to the data throughput of our modern, soon-to-be-automated microscopes. In this talk, I will discuss my work developing state-of-the-art machine-learning tools for the quantitative analysis of TEM experiments, in aid of the atomic-scale characterization of nanomaterials. Specifically, I will discuss how we can use high-throughput simulation techniques to generate large, ground-truth datasets for machine-learning model training. These synthetic datasets closely mimic real experimental data and allow us to use well-established supervised learning techniques to train performant neural network models that provide accurate measurements of experimental data. Further, our simulation framework gives us full control of both the structure generation and TEM imaging processes, letting us generate realistic synthetic data spanning a practically diverse range of experimental conditions. By fully specifying the data curation process, we can probe the performance of our machine-learning models in detail, qualifying their performance capacities and their in- and out-of-distribution generalization behavior.
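
To make the simulate-then-train workflow concrete, the Python sketch below shows one minimal, hypothetical version of it: point-like atomic columns serve as ground-truth labels, a toy Gaussian-probe blur plus Poisson noise stands in for the physics-based TEM image simulation and dose-limited detection, and a small convolutional network is trained on the resulting synthetic pairs. The function simulate_pair, its parameters, and the toy forward model are illustrative assumptions, not the actual framework; a real pipeline would sample realistic nanomaterial structures and run a full image simulation (e.g., multislice) to produce the training data.

    import numpy as np
    import torch
    import torch.nn as nn

    def simulate_pair(size=64, n_columns=12, probe_sigma=2.0, dose=200.0, rng=None):
        """Hypothetical stand-in for the simulation step: returns a noisy
        synthetic image and its ground-truth atomic-column label map."""
        rng = rng if rng is not None else np.random.default_rng()
        label = np.zeros((size, size), dtype=np.float32)
        ys, xs = rng.integers(4, size - 4, (2, n_columns))
        label[ys, xs] = 1.0  # ground-truth atomic column positions

        # Toy imaging forward model: blur the columns with a Gaussian
        # "probe" via FFT convolution (a real pipeline would use a
        # physics-based multislice simulation here).
        yy, xx = np.mgrid[:size, :size] - size // 2
        probe = np.exp(-(yy**2 + xx**2) / (2 * probe_sigma**2))
        clean = np.real(np.fft.ifft2(
            np.fft.fft2(label) * np.fft.fft2(np.fft.ifftshift(probe))))

        # Poisson noise mimics the finite electron dose a sample tolerates.
        noisy = rng.poisson(np.clip(clean, 0, None) * dose) / dose
        return noisy.astype(np.float32), label

    # Small CNN trained with standard supervised learning on synthetic pairs.
    model = nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    rng = np.random.default_rng(0)
    for step in range(200):
        batch = [simulate_pair(rng=rng) for _ in range(8)]
        x = torch.from_numpy(np.stack([b[0] for b in batch]))[:, None]
        y = torch.from_numpy(np.stack([b[1] for b in batch]))[:, None]
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

Because every parameter of the generator (structure, probe, dose) is under explicit control in this kind of pipeline, held-out test sets can be drawn from deliberately shifted parameter ranges, which is what enables the in- and out-of-distribution generalization studies mentioned above.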