Learning from Kernels: HPC in Plant Breeding

Mary LaPorte, University of California, Davis


Plant breeders use high-performance computing (HPC) to select candidate crop varieties (or parents for future crosses) with qualities that benefit farmers or consumers. These priority traits can include nutritional quality for humans and/or livestock, agronomic factors such as high yield or optimized maturation time, disease resistance, and climate resilience.

Genomic Prediction (GP) is a technique that uses DNA sequence information to predict breeding values for plant traits, improving selection accuracy. Because the genetic information can be obtained from seedlings, GP lets breeders make decisions about traits (such as disease susceptibility or yield) that can only be measured months or years later. Efficient use of HPC is critical for applying GP in a breeding context. Genetic data can be high-dimensional (hundreds to billions of features), while trait data are expensive to collect. This often leaves sparse datasets with far more features than observations (p >> n), so complex experimental applications require careful mathematical and computational approaches.

For example, GP can help breeders increase grain carotenoid concentrations in maize (Zea mays L. ssp. mays). Higher concentrations could help alleviate vitamin A deficiency in parts of the world where maize makes up a large share of the diet. This project assessed the accuracy of several GP strategies for nutritionally relevant grain carotenoid traits in tropical/subtropical inbred maize lines. Methods using Ridge Regression-Best Linear Unbiased Prediction (RR-BLUP), Elastic Net, or Reproducing Kernel Hilbert Spaces (RKHS) regression had similarly high accuracies for all tested provitamin A carotenoid traits. All of them outperformed another penalized regression technique, the Least Absolute Shrinkage and Selection Operator (LASSO).
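To make the model comparison concrete, here is a minimal NumPy sketch on simulated data. It is not the project's pipeline. It fits ridge (the RR-BLUP-style model), Elastic Net, LASSO, and an RKHS-style model, then scores each by the correlation between predicted and observed held-out phenotypes, a common accuracy measure in GP. The ridge fit uses the dual form, X^T (X X^T + λI)^{-1} y. That form only needs an n × n solve, which is cheaper than p × p when p >> n. Everything specific here is an assumption for illustration:

- the simulated inbred lines (markers coded 0/2);
- the fixed, untuned penalties;
- the Gaussian kernel bandwidth;
- the 160/40 train/test split.

The RKHS model is fit as kernel ridge regression, which may differ from the fitting procedure the project used. The helper names (`ridge_dual`, `elastic_net`, `simulate`, `compare`, and so on) are hypothetical.

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Ridge (RR-BLUP-style) marker effects in dual form.

    beta = X^T (X X^T + lam*I)^{-1} y, which equals (X^T X + lam*I)^{-1} X^T y
    but only requires solving an n x n system, cheaper than p x p when p >> n.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

def gaussian_kernel(A, B, bandwidth):
    # exp(-||a - b||^2 / bandwidth) for every pair of rows (a in A, b in B)
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / bandwidth)

def kernel_ridge_predict(Xtr, ytr, Xte, lam, bandwidth):
    # RKHS-style prediction via kernel ridge regression with a Gaussian kernel
    K = gaussian_kernel(Xtr, Xtr, bandwidth)
    alpha = np.linalg.solve(K + lam * np.eye(len(ytr)), ytr)
    return gaussian_kernel(Xte, Xtr, bandwidth) @ alpha

def elastic_net(X, y, lam, l1_ratio, n_iter=50):
    """Cyclic coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (l1_ratio*||b||_1 + (1 - l1_ratio)/2 * ||b||^2).
    l1_ratio=1.0 gives the LASSO; columns of X are assumed standardized.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()             # current residual y - X b
    col_sq = (X**2).sum(0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]            # partial residual without marker j
            rho = X[:, j] @ r / n
            denom = col_sq[j] + lam * (1 - l1_ratio)
            shrunk = np.sign(rho) * max(abs(rho) - lam * l1_ratio, 0.0)
            b[j] = shrunk / denom if denom > 0 else 0.0
            r -= X[:, j] * b[j]
    return b

def simulate(n=200, p=500, h2=0.5, seed=1):
    # Hypothetical inbred lines: homozygous markers coded 0/2, polygenic trait
    rng = np.random.default_rng(seed)
    freq = rng.uniform(0.1, 0.9, p)
    X = 2.0 * (rng.random((n, p)) < freq)
    g = X @ rng.normal(0, 1, p)            # true genetic values
    e = rng.normal(0, np.sqrt(g.var() * (1 - h2) / h2), n)
    return X, g + e

def compare(X, y, n_train=160):
    Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]
    mu, sd = Xtr.mean(0), Xtr.std(0)
    sd[sd == 0] = 1.0                      # guard against monomorphic markers
    Ztr, Zte = (Xtr - mu) / sd, (Xte - mu) / sd
    yc = ytr - ytr.mean()
    p = X.shape[1]
    lam_max = np.abs(Ztr.T @ yc).max() / len(yc)   # smallest penalty giving an all-zero LASSO
    preds = {
        "RR-BLUP (ridge)": Zte @ ridge_dual(Ztr, yc, lam=float(p)),
        "RKHS (Gaussian kernel ridge)": kernel_ridge_predict(Ztr, yc, Zte, lam=1.0, bandwidth=float(p)),
        "Elastic Net": Zte @ elastic_net(Ztr, yc, 0.1 * lam_max, l1_ratio=0.5),
        "LASSO": Zte @ elastic_net(Ztr, yc, 0.1 * lam_max, l1_ratio=1.0),
    }
    # Accuracy = Pearson correlation between predicted and observed held-out phenotypes
    return {name: float(np.corrcoef(pred, yte)[0, 1]) for name, pred in preds.items()}

X, y = simulate()
for name, r in compare(X, y).items():
    print(f"{name:30s} r = {r:.2f}")
```

In a real analysis, the penalties and the kernel bandwidth would be tuned, for example by cross-validation. The printed correlations reflect only this toy simulation, not the carotenoid results.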
Additionally, we found that models using genome-wide markers outperformed models that only included genetic features proximal to previously identified carotenoid-related genes. This suggests that GP is worthwhile for these traits even though key genes have already been identified.
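The genome-wide versus candidate-gene comparison can be sketched the same way. The sketch fits the same ridge model twice: once with all markers, and once with only the markers inside windows around candidate genes. It then compares held-out accuracy. This is an illustrative sketch, not the project's method. The following are all hypothetical:

- the marker positions;
- the two gene positions;
- the 20 kb window;
- the effect sizes (small effects genome-wide plus larger effects near the genes);
- the penalty, set to the number of markers used as a rough heuristic for about 50% heritability;
- the helper names `ridge_predict`, `candidate_mask`, and `compare_marker_sets`.

```python
import numpy as np

def ridge_predict(Xtr, ytr, Xte, lam):
    # RR-BLUP-style ridge prediction, solved in the n x n dual form
    mu, sd = Xtr.mean(0), Xtr.std(0)
    sd[sd == 0] = 1.0
    Ztr, Zte = (Xtr - mu) / sd, (Xte - mu) / sd
    yc = ytr - ytr.mean()
    alpha = np.linalg.solve(Ztr @ Ztr.T + lam * np.eye(len(yc)), yc)
    return Zte @ (Ztr.T @ alpha)

def candidate_mask(positions, gene_positions, window):
    # True for markers within `window` (same units as positions) of any candidate gene
    d = np.abs(np.asarray(positions)[:, None] - np.asarray(gene_positions)[None, :])
    return (d <= window).any(axis=1)

def compare_marker_sets(seed=7, n=250, p=2000, n_train=200):
    rng = np.random.default_rng(seed)
    positions = np.sort(rng.uniform(0, 1e6, p))        # hypothetical marker positions (bp)
    genes = [2e5, 6e5]                                  # hypothetical candidate-gene positions
    near = candidate_mask(positions, genes, window=2e4)
    freq = rng.uniform(0.1, 0.9, p)
    X = 2.0 * (rng.random((n, p)) < freq)               # inbred lines: homozygous 0/2 coding
    beta = rng.normal(0, 0.2, p)                        # small polygenic effects genome-wide
    beta[near] += rng.normal(0, 1.0, near.sum())        # larger effects near candidate genes
    g = X @ beta
    y = g + rng.normal(0, g.std(), n)                   # noise variance = genetic variance (h2 ~ 0.5)
    results = {}
    for label, cols in [("genome-wide", np.ones(p, dtype=bool)), ("candidate-gene windows", near)]:
        pred = ridge_predict(X[:n_train][:, cols], y[:n_train], X[n_train:][:, cols],
                             lam=float(max(cols.sum(), 1)))
        results[label] = (int(cols.sum()), float(np.corrcoef(pred, y[n_train:])[0, 1]))
    return results

for label, (m, r) in compare_marker_sets().items():
    print(f"{label:24s} markers = {m:5d}  r = {r:.2f}")
```

Which marker set wins here depends entirely on the simulated genetic architecture. The sketch shows how such a comparison can be set up, not the outcome reported above.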