Network Embeddings for Data Clustering, Transition State Identification, and Energy Landscape Analysis
Many chemical and biochemical systems can be intuitively modeled using networks. Due to the size and complexity of many biochemical networks, we require tools for efficient network analysis. Of particular interest are techniques that embed network vertices into vector spaces while preserving important properties of the original graph. In this article, we investigate several aspects of node embedding, propose a novel method of generating node embeddings, and explore applications to biochemical systems. We introduce a new method for generating low-dimensional node embeddings for directed graphs using random walk sampling and demonstrate the usefulness of this method for identifying transition states of stochastic chemical reaction systems, detecting relationships between nodes, and studying the structure of the original network. In addition, we propose an efficient scheme for numerical implementation of network embedding based on deterministic computations of commute times rather than random walk trials. We develop a novel implementation of stochastic gradient descent (SGD) based on a low-dimensional sparse approximation of the original random walk on the graph, and show that this approach can improve the performance of node embeddings. This method can be further extended for entropy-sensitive adaptive network embedding by incorporating principles from metadynamics and hierarchical network embedding, allowing for applications to the analysis of molecular structures. By adjusting the edge weights of the network by a Gaussian term, similarly to the metadynamics approach, we ensure that areas that have already been explored extensively by the random walk (i.e., the edges with the largest weights) will be de-emphasized over time, allowing additional iterations of the embedding process to reveal details about other areas of the graph.
We show that this approach lends itself well to systems that are influenced by entropy or temperature effects and biochemical systems where the potential energy landscape depends on the system’s configuration at a given time, either by itself or in conjunction with transition path theory. We demonstrate the effectiveness and performance of each of our methods on several datasets and biochemical examples.
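To illustrate what a deterministic computation of commute times can look like, here is a minimal sketch. It is not the thesis's implementation: it assumes an undirected, connected, weighted graph given as a symmetric adjacency matrix, whereas the thesis addresses directed graphs. It relies on the standard identity that the commute time between nodes i and j equals the graph volume times the effective resistance, computed from the pseudoinverse of the graph Laplacian. The function name `commute_times` is hypothetical.

```python
import numpy as np

def commute_times(adjacency):
    """Pairwise commute times of a random walk on an undirected graph.

    Assumption (not from the thesis): `adjacency` is a symmetric,
    non-negative weight matrix of a connected graph.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)              # weighted degree of each node
    laplacian = np.diag(degrees) - A     # combinatorial Laplacian L = D - A
    L_pinv = np.linalg.pinv(laplacian)   # Moore-Penrose pseudoinverse of L
    volume = degrees.sum()               # sum of all degrees
    diag = np.diag(L_pinv)
    # C[i, j] = vol(G) * (L+[i, i] + L+[j, j] - 2 * L+[i, j])
    return volume * (diag[:, None] + diag[None, :] - 2.0 * L_pinv)

# Path graph 0 - 1 - 2 with unit edge weights.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
C = commute_times(path)
```

For this three-node path, the expected round trip between the adjacent nodes 0 and 1 is 4 steps and between the endpoints 0 and 2 is 8 steps, which can be checked by hand from the hitting-time equations. No random walk trials are sampled; the whole matrix comes from one pseudoinverse.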
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Mercurio, Paula
- Thesis Advisors: Liu, Di
- Committee Members: Wei, Guowei; Chan, Christina; Zhou, Jiayu
- Date: 2022
- Subjects: Mathematics
- Program of Study: Mathematics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 99 pages
- Permalink: https://doi.org/doi:10.25335/gcrr-mh71