Inverse Design of Upconverting Nanoparticles via Deep Learning on Physics-Infused Heterogeneous Graphs

Lucas Attia, Massachusetts Institute of Technology

Heterostructured core-shell lanthanide-doped upconverting nanoparticles (UCNPs) have unique optical properties: they can be excited in the near infrared to yield visible and ultraviolet emission. UCNPs have broad applications ranging from biosensing and super-resolution microscopy to 3D printing. Factors affecting the nonlinear photophysical properties of UCNPs include the number of shells, shell thicknesses, dopant concentrations, and surface ligands, which together define a vast chemical design space. While kinetic Monte Carlo (kMC) simulations allow reasonably accurate in silico prediction of optical properties, their cost scales poorly with particle size and dopant loading, so the search for UCNPs with desirable properties has remained fundamentally Edisonian. Deep learning (DL) could navigate this space more efficiently, but UCNPs previously had neither a viable structural representation for DL (they are unlike molecules, crystals, proteins, text, or images) nor sufficient data for DL (a single photophysical kMC simulation can take weeks). Here, we report efforts to overcome these challenges by combining high-throughput data generation with nanoparticle representation learning. We construct the first large dataset of simulated UCNP spectra (over 10,000), generated with bespoke high-performance lanthanide energy-transfer kMC driven by automated workflows on HPC resources. We investigate random forest, MLP, CNN, and GNN architectures, eventually converging on a physics-infused heterogeneous GNN as our best-performing model. We then use the trained GNN to perform inverse design of UCNP heterostructures via gradient-based optimization, maximizing UV emission under 800 nm illumination as a function of the number of shells and the maximum nanoparticle size. This identifies novel structures with order-of-magnitude increases in predicted emission compared with any structure in our training data.
To the best of our knowledge, this is the first time that data generation, representation development, and DL-enabled optimization have been performed end-to-end in a previously unexplored design space.
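The gradient-based inverse-design step can be illustrated, in a heavily simplified form, as projected gradient ascent on a differentiable surrogate under a particle-size budget. This is a minimal sketch under stated assumptions, not the method used in this work: the trained GNN is replaced by a hypothetical analytic toy function (`surrogate_emission`, with made-up `WEIGHTS` and `SCALES`), the design variables are assumed to be only shell thicknesses, and the maximum nanoparticle size is assumed to enter as a cap on total shell thickness. Dopant concentrations, the choice of shell count, and the real model are not represented.

```python
import math

# HYPOTHETICAL stand-in for the trained GNN: a smooth toy function of shell
# thicknesses (nm). It is NOT the model from this work; it only gives the
# optimizer something differentiable to climb.
WEIGHTS = [1.0, 2.0, 0.5]  # assumed per-shell contribution to UV emission
SCALES = [3.0, 4.0, 2.0]   # assumed thickness (nm) at which each term peaks


def surrogate_emission(t):
    """Toy 'predicted UV emission' for shell thicknesses t."""
    return sum(w * x * math.exp(-x / s) for w, x, s in zip(WEIGHTS, t, SCALES))


def surrogate_grad(t):
    """Analytic gradient of surrogate_emission with respect to t."""
    return [w * math.exp(-x / s) * (1.0 - x / s) for w, x, s in zip(WEIGHTS, t, SCALES)]


def project_budget(v, budget):
    """Euclidean projection onto {t : t_i >= 0, sum(t) <= budget}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= budget:
        return clipped
    # Size constraint is active: project onto the simplex {t >= 0, sum(t) = budget}
    # using the sort-based algorithm (Duchi et al., 2008).
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - budget) / j
        if uj - candidate > 0:
            theta = candidate  # the last j satisfying this gives the threshold
    return [max(x - theta, 0.0) for x in v]


def inverse_design(t0, budget, lr=0.5, steps=500):
    """Projected gradient ascent on the surrogate under a total-thickness cap."""
    t = project_budget(t0, budget)
    for _ in range(steps):
        g = surrogate_grad(t)
        t = project_budget([x + lr * gx for x, gx in zip(t, g)], budget)
    return t


if __name__ == "__main__":
    start = [1.0, 1.0, 1.0]
    best = inverse_design(start, budget=6.0)
    print("thicknesses (nm):", [round(x, 3) for x in best])
    print("emission: %.3f -> %.3f" % (surrogate_emission(start), surrogate_emission(best)))
```

In a real pipeline the gradient would come from automatic differentiation through the trained model rather than a hand-written formula, and projection is only one way to enforce the size limit; reparameterizing the thicknesses or adding a penalty term are common alternatives.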