**DELTA PROJECT**

Learn about our three Research Thrust areas.

The project “Descriptors of Energy Landscapes Using Topological Data Analysis (DELTA)” is a National Science Foundation (NSF)-funded HDR Institute Frameworks effort that is advancing topological data analysis (TDA) and machine learning algorithms for the study of large, complex data sets describing the energy landscapes (EL) of chemical systems. TDA is an innovative approach to solving long-standing challenges in chemistry, which include but are not limited to "the real-time optimization and control of complex chemical systems." Fundamentally, these challenges derive from a limited understanding of the EL of a chemical system, which dictates the outcomes of all chemical transformations. Chemists generally do not know how the EL changes as a function of system conditions, nor are there quantifiable relationships between intra- and intermolecular interactions and EL topological features. The EL contains much more information than has been interpreted so far, and TDA has the potential to extract new knowledge that fundamentally changes research paradigms. By analogy with the machine learning field a decade ago, fundamental research is needed to learn how to adapt TDA for chemistry applications, and new tools must be developed that are accessible to domain experts.

Many topological data analysis tools require data with a small number of dimensions, but raw chemical energy landscapes naturally have very high dimensionality: 3N for N atoms. The primary goal of Thrust 1 is to investigate techniques that can reduce the dimensionality of chemical energy landscapes, such as those from molecular dynamics simulations, while preserving topology. Using simulations of model chemical reactions, Thrust 1 will investigate statistical methods such as principal component analysis, diffusion maps, and active subspaces, as well as nonlinear manifold learning techniques such as Isomap and UMAP, to extract a low-dimensional subspace of the chemical energy landscape. Topological analysis tools from Thrust 2 will be applied to a sequence of reduced energy landscapes to determine the extent of "dimensional compression" achievable by each technique. Thrust 1 will also investigate topological descriptors such as chemical reaction networks, social permutation invariant coordinates, and PageRank as collective variables that reduce the dimensionality of the energy landscape. As above, applying topological data analysis tools to data projected onto various combinations of collective variables will determine how well chemical energy landscape topology can be preserved in spaces of the lowest possible dimension.

Submitted by:

**Ravishankar Sundararaman**, Assistant Professor, Materials Science and Engineering, Rensselaer Polytechnic Institute
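
As a rough illustration of the dimensionality-reduction idea in Thrust 1, the sketch below applies principal component analysis to a synthetic "trajectory" with 3N = 12 coordinates that actually lies near a closed loop in a two-dimensional plane. The data, the function name `pca_reduce`, and all parameters are hypothetical and chosen only for illustration; they do not come from the DELTA project or its simulations.

```python
import numpy as np

def pca_reduce(coords, n_components):
    """Project samples (rows) onto their top principal components.

    Returns the projected data and the fraction of variance explained
    by each retained component.
    """
    centered = coords - coords.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Hypothetical toy data: 500 frames of N = 4 atoms (3N = 12 coordinates).
rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 4
angle = rng.uniform(0.0, 2.0 * np.pi, n_frames)
latent = np.column_stack([np.cos(angle), np.sin(angle)])  # points on a circle
embedding = rng.normal(size=(2, 3 * n_atoms))             # random linear map into 12-D
coords = latent @ embedding + 0.01 * rng.normal(size=(n_frames, 3 * n_atoms))

reduced, explained = pca_reduce(coords, n_components=2)
print(reduced.shape, explained.sum())
```

Because the synthetic data was built from a circle embedded linearly in 12 dimensions, two components capture almost all of the variance and the loop survives the projection. Real energy landscapes are generally not this well behaved, which is why the thrust compares several linear and nonlinear techniques.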

Using the reduced energy landscape produced in Thrust 1, we will compute its topological and geometric features. The topology and geometry of an energy landscape encode how stable its local minima, its basins of attraction, and its minimal energy paths will be under perturbations of the chemical system. In particular, these topological and geometric descriptors will be used as input for machine learning tasks in which we predict chemical outcomes based on how the energy landscape transforms as a result of modifications to the chemical environment.

Persistent homology provides a global summary of the number of holes of each dimension in the energy landscape and, furthermore, a measure of the robustness of each such topological feature as the energy barrier increases. Much of the popularity of topological data analysis relies on the fact that persistent homology is computable.

Morse theory is the standard tool in mathematics for studying the shape of an energy function in terms of its critical points: minima, saddle points, and maxima. An assumption of classical Morse theory is that the energy function has only non-degenerate critical points. If this assumption is not fulfilled, another area of (differential) topology and singularity theory can be applied, namely catastrophe theory. Catastrophe theory is used to study how points of stability change under perturbations by external parameters.

Submitted by:

**Henry Adams**, Mathematics, Colorado State University

**Markus Pflaum**, Mathematics, University of Colorado Boulder
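
To make the persistent homology description in Thrust 2 concrete, here is a minimal sketch of 0-dimensional sublevel-set persistence for an energy profile sampled along a single coordinate. Each local minimum starts a basin, and when two basins meet at a barrier the basin with the higher minimum is recorded as ending there. The function name `sublevel_persistence_1d` and the five-point profile are hypothetical, and a one-dimensional profile is a deliberate simplification of the landscapes described above.

```python
import numpy as np

def sublevel_persistence_1d(energy):
    """Return 0-dimensional persistence pairs (birth, death) of a sampled 1-D function.

    A basin is born at a local minimum and dies at the barrier where it
    merges with a basin whose minimum is lower (the "elder rule").
    """
    energy = np.asarray(energy, dtype=float)
    n = len(energy)
    parent = [-1] * n  # -1 marks samples not yet added to the filtration

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Add samples in order of increasing energy.
    for i in np.argsort(energy, kind="stable"):
        i = int(i)
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Each root is the lowest sample of its basin.
                young, old = (ri, rj) if energy[ri] > energy[rj] else (rj, ri)
                if energy[young] < energy[i]:
                    # A genuine basin dies at the barrier energy[i].
                    pairs.append((float(energy[young]), float(energy[i])))
                parent[young] = old
    # The basin of the global minimum never dies.
    pairs.append((float(energy.min()), float("inf")))
    return pairs

# Hypothetical double-well profile: minima at 1.0 and 0.0, barrier at 2.0.
energy_profile = [3.0, 1.0, 2.0, 0.0, 4.0]
pairs = sublevel_persistence_1d(energy_profile)
print(pairs)  # [(1.0, 2.0), (0.0, inf)]
```

The difference death minus birth for the first pair (here 1.0) is the barrier height that must be crossed to leave the shallower well, which is one way a persistence value can be read as a measure of robustness.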

Please contact us with your questions or concerns by e-mailing the project PI, Aurora Clark, at auclark@wsu.edu.

This project is part of the National Science Foundation's Harnessing the Data Revolution Big Idea activity. The effort is jointly funded by the Division of Chemistry within the NSF Directorate for Mathematical and Physical Sciences.