Yong Zhao

Date of Award

Spring 2022

Document Type

Open Access Dissertation


Department

Computer Science and Engineering

First Advisor

Jianjun Hu


Discovery of novel functional materials is playing an increasingly important role in many key industries, such as lithium batteries for electric vehicles and cell phones. However, experimental tinkering with existing materials and Density Functional Theory (DFT) based screening of known crystal structures, two of the major current materials design approaches, are both severely constrained by the limited scale (around 250,000 entries in the ICSD database) and diversity of existing materials and by the lack of a sufficient number of materials with annotated properties. Generating a large number of physically feasible, stable, and synthesizable crystal materials and building accurate property prediction models for screening are the two major unsolved challenges in modern materials science.

This dissertation focuses on addressing these two fundamental tasks in materials science using deep learning and machine learning models. Deep learning and machine learning have already made tremendous progress in computer vision and natural language processing, as shown by autonomous driving cars and Google’s translators, and have the potential to greatly transform materials science research. Compared to conventional tinkering-based materials discovery methods, data-driven approaches have been increasingly used in materials informatics because they screen new materials significantly faster. In this dissertation, we design and develop novel deep learning-based algorithms that learn, from known crystals, the intricate hidden chemical rules that assemble atoms into stable crystal structures, and that generate new crystal structures. We also explore and develop novel representation learning methods on materials compositions and structures for high-performance prediction of the structural characteristics and elastic properties of materials.

In the first topic, we propose CubicGAN, a generative adversarial network (GAN) based deep neural network model for large-scale generative design of novel cubic materials. When trained on 375,749 ternary materials from the OQMD database, the model can not only rediscover most of the currently known cubic materials but also generate hypothetical materials of new structure prototypes. A total of 506 such materials have been verified by DFT-based phonon dispersion calculations. Given sufficient computing resources, our technique allows the generation of tens of thousands of new materials.

In the second topic, we propose a Physics Guided Crystal Generative Model (PGCGM) for new materials generation, which significantly expands the structural scope of CubicGAN by enabling the generation of crystals in 20 space groups. This is achieved by capturing and exploiting the pairwise atomic distance constraints among neighboring atoms, symmetric geometric constraints, and a novel data augmentation strategy using the base atom sites of materials. With atom clustering and merging on generated crystal structures, our method increases the generator’s validity by 8 times compared to one of the baselines and by 143% compared to the previous CubicGAN, and it also shows superior property distribution and diversity. We further validated our generated candidates with DFT calculations, which successfully optimized/relaxed 1869 of 2000 generated materials, of which 39.6% had negative formation energy, indicating their stability.
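The pairwise-distance check and the merging of overlapping atoms mentioned above can be illustrated with a minimal sketch. It is not the PGCGM implementation: it assumes a cubic cell with fractional coordinates, and the function names, the 1.0 Å minimum distance, and the 0.5 Å merge tolerance are hypothetical choices made only for illustration.

```python
import itertools
import math


def min_image_distance(frac_a, frac_b, lattice_a):
    """Distance (angstrom) between two fractional coordinates in a cubic
    cell of edge lattice_a, using the minimum-image convention."""
    sq = 0.0
    for xa, xb in zip(frac_a, frac_b):
        d = (xa - xb) % 1.0      # wrap the difference into [0, 1)
        d = min(d, 1.0 - d)      # pick the nearest periodic image
        sq += (d * lattice_a) ** 2
    return math.sqrt(sq)


def violates_min_distance(atoms, lattice_a, min_dist=1.0):
    """True if any pair of atoms is closer than min_dist (assumed threshold)."""
    return any(
        min_image_distance(fa, fb, lattice_a) < min_dist
        for (_, fa), (_, fb) in itertools.combinations(atoms, 2)
    )


def merge_close_atoms(atoms, lattice_a, tol=0.5):
    """Greedy merge: keep the first site of each group of same-element
    atoms that lie within tol (assumed tolerance) of one another."""
    kept = []
    for element, frac in atoms:
        duplicate = any(
            element == k_el and min_image_distance(frac, k_frac, lattice_a) < tol
            for k_el, k_frac in kept
        )
        if not duplicate:
            kept.append((element, frac))
    return kept


# Toy generated structure: two Na sites nearly on top of each other
# across the periodic boundary, plus one Cl site.
generated = [
    ("Na", (0.0, 0.0, 0.0)),
    ("Na", (0.99, 0.0, 0.0)),
    ("Cl", (0.5, 0.5, 0.5)),
]
cleaned = merge_close_atoms(generated, lattice_a=5.64)
print(len(generated), "->", len(cleaned), "sites;",
      "still invalid" if violates_min_distance(cleaned, 5.64) else "distance check passed")
```

Keeping the first site of each cluster is the simplest merging rule; averaging the clustered positions would be an equally reasonable variant.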

In the third topic, we propose and evaluate machine-learning algorithms for determining the structure type of materials, given only their compositions. We couple random forest (RF) and multilayer perceptron (MLP) neural network models with three types of features, namely Magpie, atom vectors, and one-hot encoding (atom frequency), for the crystal system and space group prediction of materials. Four types of models for predicting crystal systems and space groups are proposed, trained, and evaluated: one-versus-all binary classifiers, multiclass classifiers, polymorphism predictors, and multilabel classifiers. The synthetic minority over-sampling technique (SMOTE) is applied to mitigate the effects of imbalanced data sets. Our results demonstrate that RF with Magpie features generally outperforms other algorithms for binary and multiclass prediction of crystal systems and space groups, while MLP with atom frequency features is the best method for structural polymorphism prediction.
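As a rough illustration of two ingredients named above, the sketch below builds an atom-frequency vector from a composition formula and then creates synthetic minority samples by interpolating between neighbors, which is the core idea of SMOTE. Everything here is an assumption for illustration: the truncated element vocabulary, the function names, the reading of "atom frequency" as per-element atom counts, and the simplified oversampling loop, which is not the reference SMOTE implementation.

```python
import random
import re

# Truncated, illustrative element vocabulary (a real one would cover
# every element in the data set).
ELEMENTS = ["H", "Li", "O", "Na", "Cl", "Ti", "Fe", "Sr"]


def atom_frequency_vector(formula, elements=ELEMENTS):
    """Count atoms per element in a simple formula such as 'SrTiO3'.
    Parentheses and nested groups are not handled in this sketch."""
    counts = dict.fromkeys(elements, 0.0)
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        if symbol not in counts:
            raise ValueError(f"element {symbol!r} is not in the vocabulary")
        counts[symbol] += float(amount) if amount else 1.0
    return [counts[e] for e in elements]


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


# Pretend these three compositions belong to an under-represented class.
minority_class = [atom_frequency_vector(f) for f in ("SrTiO3", "Fe2O3", "NaCl")]
extra_samples = smote_like_oversample(minority_class, n_new=4)
print(len(minority_class) + len(extra_samples), "minority samples after oversampling")
```

Note that interpolated samples carry non-integer atom counts, so they are feature-space points for training rather than real compositions.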

Finally, we propose using electronic charge density (ECD) as a generic unified 3D descriptor for materials property prediction, because it is closely related to the physical and chemical properties of materials. We develop an ECD-based 3D convolutional neural network (CNN) to predict the elastic properties of materials, in which the CNN learns effective hierarchical features through multiple convolution and pooling operations. Our experiments show that our method can achieve good performance for elasticity prediction over 2170 Fm-3m materials.
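To make the convolution and pooling operations concrete, here is a minimal pure-Python sketch of one 3D convolution followed by one 3D max-pooling step on a small voxel grid. The grid values, kernel, and function names are hypothetical; the abstract does not give the network architecture, and a practical model would use a deep learning framework with learned kernels rather than nested loops.

```python
def conv3d_valid(grid, kernel):
    """'Valid' 3D cross-correlation of a cubic grid with a cubic kernel,
    both given as nested lists indexed [x][y][z]."""
    n, k = len(grid), len(kernel)
    m = n - k + 1  # output edge length
    out = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for x in range(m):
        for y in range(m):
            for z in range(m):
                out[x][y][z] = sum(
                    grid[x + i][y + j][z + l] * kernel[i][j][l]
                    for i in range(k) for j in range(k) for l in range(k)
                )
    return out


def max_pool3d(grid, size=2):
    """Non-overlapping 3D max pooling with a cubic window of edge `size`."""
    n = len(grid) // size
    return [[[
        max(
            grid[x * size + i][y * size + j][z * size + l]
            for i in range(size) for j in range(size) for l in range(size)
        )
        for z in range(n)] for y in range(n)] for x in range(n)]


# Toy 4x4x4 "charge density" that grows linearly along each axis.
density = [[[float(x + y + z) for z in range(4)] for y in range(4)] for x in range(4)]
# A 3x3x3 averaging kernel stands in for a learned filter.
mean_kernel = [[[1.0 / 27.0] * 3 for _ in range(3)] for _ in range(3)]

features = conv3d_valid(density, mean_kernel)  # 2x2x2 feature map
pooled = max_pool3d(features, size=2)          # 1x1x1 summary
print(len(features), "-> pooled edge length", len(pooled))
```

Stacking several such convolution and pooling stages is what lets a CNN build features at progressively coarser spatial scales before a final regression layer.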