Download PDF
Research Article  |  Open Access  |  26 Mar 2024

Clustering molecular energy landscapes by adaptive network embedding

Views: 138 |  Downloads: 15 |  Cited:   0
J Mater Inf 2024;4:3.
10.20517/jmi.2023.40 |  © The Author(s) 2024.
Author Information
Article Notes
Cite This Article


In order to efficiently explore the chemical space of all possible small molecules, a common approach is to compress the dimension of the system to facilitate downstream machine learning tasks. Towards this end, we present a data-driven approach for clustering potential energy landscapes of molecular structures by applying recently developed Network Embedding techniques to obtain latent variables defined through the embedding function. To scale up the method, we also incorporate an entropy sensitive adaptive scheme for hierarchical sampling of the energy landscape, based on Metadynamics and Transition Path Theory. Taking into account the kinetic information implied by the energy landscape of a system, we can interpret dynamical node-node relationships in reduced dimensions. We demonstrate the framework through Lennard-Jones clusters and a human DNA sequence.


Network embedding, metadynamics, transition path theory, energy landscapes


The motivation of the project is the fundamental question of chemical spaces: how many organic molecules can be formed, and of these, how can we identify molecules with useful properties that can be chemically synthesized. Understanding how such molecules function in biological systems will have a tremendous impact on developing new drugs and new treatment of diseases[1]. For example, the generated database (GDB)-17 dataset[2] considers only molecules allowed by valency rules, excluding those unstable or unsynthesizable due to strained topologies or reactive functional groups, thereby reducing the enumeration to a manageable database size of 166.4 billion molecules formed of up to 17 atoms of C, N, O, S, and halogens. Fast nearest neighbor searching of large generated datasets such as GDBs has led to methods for virtual screening and visualization of druglike molecules, with early success in neurotransmitter receptor and transporter ligands.

Most applications in biology and chemistry, such as protein folding, involve systems that behave according to some potential energy landscape of complex structure with a large number of local minima, saddle points (transition states), entropic plateaus, and deep energy wells. Existing methods for energy landscape analysis focus on identifying local minima via geometric optimization and finding transition states connecting them, often using the steepest descent pathways[3]. The gentlest ascent approach[4] takes an alternative perspective, instead examining the dynamics by which a process escapes its local minima along pathways through the neighborhoods of saddle points. Another is the string method[5], which identifies the transition rates and most probable transition pathways of a system by constructing smooth parameterized curves that are guaranteed to lie along the transition pathway between two states. This string method has since been applied to the energy landscapes of magnetic elements[6]. To understand the dynamics over multiple magnitudes of space and time scales, we can take the viewpoint of the system as a network of local energy minima and entropic basins, connected by edges weighted according to the energy and entropy barriers that must be crossed for transitions between metastable states.

The proposed research is to apply recent Network Embedding techniques[7-13] to develop a data-driven approach for clustering potential energy landscapes and identifying latent variables of molecular structures, further facilitating sampling and optimization of the chemical spaces and developing generative models for druglike small molecules. The latent variables are given by the output of the embedding function. By incorporating energetic information, we will be able to interpret node-node relationships in reduced dimensions that are consistent with chemical kinetics and are more likely to be aligned with synthesizability.

One multiscale challenge is due to the presence of deep potential wells. Metadynamics[14] uses a non-Markovian random walk to explore an energy landscape, which is smoothed by additive Gaussian terms after each step, and so are the transition probabilities of the associated random walk. As this happens, the process is discouraged from revisiting the lowest energy states repeatedly. The eventual output is to have a flattened energy landscape and more efficient random walk samplings. The original potential can be recreated by subtracting the additive Gaussian terms.

To study transition processes in complex systems with rugged energy landscapes dominated by entropic effects, such that transitions involving a flat region on the potential surface that is favorable entropically and the necessity to decrease entropy to exit from this region, it is necessary to examine the ensemble of all the transition paths as a probability space. The Transition Path Theory (TPT)[15] studies statistical properties of the reactive trajectories such as rates and dominant reaction pathways through probability currents between adjacent states. In[16], the definition of probability current in TPT was generalized from edges to individual nodes and networks for characterizing transition states in the form of subnetworks.

In this article, we use Network Embedding techniques in combination with Metadynamics and TPT to produce adaptive embeddings that hierarchically convey information about the behavior of systems at different scales. We adjust the edge weights of the network in a way that parallels Metadynamics to encourage exploration away from the local energy minima and adopt TPT to capture microdynamical features of interest. It is shown that these embeddings provide an effective way to understand and visualize inter-node relationships.

The rest of this article is structured as follows. In Section “BACKGROUND”, we provide some background on Network Embedding, Metadynamics, and TPT. In Section “NETWORK EMBEDDING WITH METADYNAMICS”, we discuss more details in the implementations and demonstrate our method through Lennard-Jones (LJ) clusters. Section “MULTISCALE EMBEDDING WITH TPT” contains an application to a less homogeneous system: DNA folding in a human telomere.


Network embedding

Real-world networks, particularly those representing possible molecular structures and other biological and chemical systems, are often large and complex, making them difficult to conceptualize. Network Embedding maps nodes of a given network into a low-dimensional continuous vector space to facilitate downstream machine learning tasks. Using the sparsity of networks, recently developed Network Embedding methods can scale up linearly with regard to the number of edges. Major techniques include factorization of functions of the adjacency matrix, random walk samplings of node neighborhoods, and deep network learning techniques. The idea is that nodes with close proximities in the network should have similar embeddings in the latent space.

The basic setup is an undirected network G (S, E) with node set S and edge set E, while the generalization to directed graphs is straightforward. Let |S| = n and A ∈ Rn×n be the weighted adjacency matrix of the network with weight aij ≥ 0 between nodes vi and vj. The output will be a map determined by

$$ z_{i}=f(v_{i}):S\longrightarrow \mathrm{\boldsymbol{R}}^{d},\quad \mathrm{with}\quad d\ll n. $$

For methods discussed in this paper, the encoding function (1), referred as direct encoding, is simply a lookup matrix: zi = f (vi) = Zei, such that Z ∈ Rd×n contains the embedding vectors for all nodes viS, and ei is an indicator vector; therefore, f (vi) simply gives the ith column of Z. The set of trainable parameters for direct encoding approaches is the embedding matrix Z, which is to be optimized directly. Vector ai = {aik}nk=1 denotes the first-order proximity between node vi and other nodes. The second-order proximity between vi and vj can be determined by the similarity between ai and aj, which compares the neighborhood structures of pairs.

In DeepWalk and Node2vec[7,8], short random walk simulations are used to determine proximity for each pair of nodes. More specifically, random walk runs through nodes of the network, with transition rates determined by the edge weights. Two nodes are close to each other if there is a high probability that a random walk simulation containing one node will also include the other. Similarities between embedded nodes are given by the following conditional probabilities based on the SkipGram model[9,10], expressed as

$$ P(j\mid i)=\frac{\mathrm{exp}(z_{i}\cdot z_{j})}{\sum _{l}\mathrm{exp}(z_{i}\cdot z_{l})} $$

Where P (j|i) denotes the conditional probability that a random walk starting at node vi will include node vj and zi is the embedding of vi. The learning is achieved by minimizing the following cross entropy loss using a Stochastic Gradient Descent (SGD) method[11]:

$$ \mathcal{L}(\mathcal{G})=\sum _{v_{i}}\sum_{j\in R(v_{i})}-\mathrm{log}(P(j\mid i)) $$

Where R (i) represents a k-step random walk trial starting from node i. The efficiency of evaluating (2) can be greatly improved using negative sampling[10] that randomly selects edges favoring less frequent ones.

In this paper, we will adopt a Network Embedding scheme introduced in[13], where the embeddings are produced via a sparse approximation of random walks on networks. For a given undirected network G (A) with adjacency matrix A = (aij), let D be the diagonal matrix such that Dii = ∑jaij. The volume of the graph will be given by v = ∑iDii. The Laplacian L = I - D-1A has eigendecomposition L = ΦΛΦT, where Λ represents the diagonal matrix of ordered eigenvalues so that 0 = λ1λ2 ≤ ... λn, and the eigenvectors are given by columns of Φ denoted by $$ \phi _{1} $$, $$ \phi _{2} $$, ..., $$ \phi _{n} $$. Assuming the network is connected, the discrete Green function satisfies

$$ G(i,j)=\sum_{k=2}^{n}\frac{1}{\lambda _{k}}\phi _{k}(i)\phi _{k}(j). $$

We can further define the commute time ct (i, j) to be the mean time for the Markov process prescribed by transition probability matrix T = D-1A = I - L on the network to travel from node vi to node vj, and back to vi. It is shown in[17] that the coordinate matrix for embeddings that preserve commute times has the following form:

$$ \Theta =\sqrt{\mathrm{v}}\Lambda ^{-1/2}\Phi ^{T}. $$

To produce an efficient approximation of Θ, we can assume that T is local in the sense that, at least asymptotically, its columns have small support, and high powers of T will be of low rank, which can be justified for potential driven systems by disparate transition rates between different neighboring metastable states. Taking higher powers of T is equivalent to running the Markov chain forward in time, which allows for representations of the random walk, i.e., reaction pathways, at distinct time scales. Making use of this sparsity, we can produce compressed approximations to the dyadic powers of T with its principal components employing fast algorithms such as Lanczos Bidiagonalization for singular value decomposition (SVD) with the complexity depending linearly on the number of nonzero elements.

The following scheme is a modification of the Diffusion Wavelet algorithm[18]. Starting with T0 = T, at each iteration, taking Uk and ∑k to be the top left singular vectors and singular values of Tk$$ T^{2^{k}} $$, let Tk: = $$ U_{k}^{T} $$Tk-1Uk. From this, we can have a low-rank approximation to the Green function of the random walk using the Schultz method, as given in

$$ G=\sum_{k=1}^{\infty} T^{k}=\prod_{k=0}^{\infty} (I+T^{2^{k}}). $$

The embedding matrix Θ thus satisfies ΘTΘ= vG, where v is the volume of the network defined above. Denoting the leading singular values and left singular vectors of the matrix vG by ∑G and UG, we take Θ: = (∑G)1/2UGT.

We can further use Θ as a starting point, introduce parameters by multiplying its jth column by a weight cj, and optimize the {cj}’s to minimize cross entropy loss (3) via SGD. Moreover, for robustness of the algorithm, with certain probability, we can reintroduce singular vectors removed in previous truncations, as a residual correction technique.


Metadynamics was introduced in[14] as a technique to aid in exploring energy landscapes. The scheme is to create a non-Markovian, and approximately self-avoiding, random walk by adjusting the gradient of the energy landscape after each step with the addition of a derivative of a Gaussian term. Over time, these Gaussians eventually fill up the valleys in the energy potential, which allows the random walk to explore other areas of the landscape and leads to a more complete picture of the system dynamics. Specifically, after each step, the parameter $$ \phi _{i} $$, which represents the derivative of the energy with respect to the ith parameter -$$ \frac{\partial E}{\partial xi} $$, is adjusted according to

$$ \phi _{i}^{t+1}=\phi_{i}^{t}-\frac{\partial }{\partial x_{i}}W\prod _{i}\mathrm{exp}(-\frac{\mid x_{i}-x_{i}^{t}\mid ^{2}}{2\delta ^{2}}), $$

where W, δ and $$ x_{i}^{t} $$ are the height, width and center of the Gaussian, respectively, to be chosen based on prior knowledge of the energy landscape.

Transition path theory

The TPT[15] studies statistical properties of the reactive trajectories such as rates and dominant pathways through probability currents between adjacent states. In the simplest setting, given reactant state A and product state B, any equilibrium path X (t) oscillates infinitely many times between A and B, with each oscillation from A to B being a reaction event. The reactive trajectories are successive pieces of X (t) during which it has left A and is on its way to B next, without coming back to A.

The discrete forward committor $$ q_{i}^{+} $$ is defined as the probability that the process starting in node i will first reach B rather than A, and the discrete backward committor $$ q_{i}^{-} $$ is defined as the probability that the process arriving in node vi last came from A rather than B. For Markov processes with infinitesimal generator T = (tij), the forward committor satisfies the discrete Dirichlet equations:

$$ \sum_{v_{j}\in S}t_{ij}q_{j}^{+}=0,\quad \mathrm{for}\quad v_{i}\in S \setminus (A\cup B), $$

with the boundary condition $$ q_{i}^{+} $$ = 0, if viA ; $$ q_{i}^{+} $$ = 1, if viB. The backward committor satisfies a similar equation. For time reversible processes, $$ q_{i}^{+} $$ = 1 - $$ q_{i}^{-} $$.

The probability current of reactive trajectories is the average rate at which they flow from one state to another when the process is at statistical equilibrium with distribution π, and can be obtained by

$$ f_{ij}^{AB}=\pi _{i}q_{i}^{-}t_{ij}q_{j}^{+}, \quad \mathrm{if} \quad i\ne j. $$

To deal with the fact that transitions between any two states can go forward and backward, the effective current can be introduced as $$ f_{ij}^{+}=\mathrm{max}(f_{ij}^{AB}-f_{ji}^{AB},0) $$.


In this section, we want to introduce the Network Embedding techniques for energy landscapes with Metadynamics adjustments through a few examples.

8-atom LJ cluster

LJ clusters are often used as a model for atomic or molecular dynamics within a fluid, in which the potential energy E of a given configuration of atoms depends only on distances between atoms, as formulated in

$$ E(r)=4\in \sum [(\frac{\sigma }{r_{ij}})^{12}-(\frac{\sigma }{r_{ij}})^{6}], $$

where rij denotes the distance between atoms i and j, and the parameters σ and ϵ represent pair equilibrium separation and well depth. In the experiments below, we adopt reduced units (e.g., σ = ϵ = 1).

To apply the Network Embedding techniques to the LJ clusters, we first generate a database of local energy minima using the Pele software[19]. The database was produced using a basin-hopping run of 500 steps and consists of eight local minima connected by ten transition states. It also contains thermodynamic information, including the potential energy at each local minimum. The embeddings here are based on a network constructed with nodes given by the local minima, and edges located between each pair of nodes where a transition state has been identified. The adjacency matrix has entries given by the energy barriers between metastable states.

Initially, every node is embedded into the vector space. Since most points near the global minimum will be close to each other in terms of commute time, this typically results in these nodes being embedded around the global minimum. Then, removing nodes that are further away from the global minimum and re-embedding only those in the residual cluster reveals new, more detailed information about the remaining nodes. We do this by creating a new, smaller adjacency matrix including only re-embedded points, and adjusting the edge weights with Gaussian terms according to Metadynamics

$$ a_{ij}^{t+1}=a_{ij}^{t}-W\prod _{i}\mathrm{exp}(-\frac{\mid \mid \theta _{i}-\theta_{j}\mid \mid^{2}}{2\delta ^{2}}), $$

where θi is the coordinate of the ith node (energy minimum). Adaptively choosing the center node and removing distant nodes, the process will produce a series of hierarchical embeddings that provide information about the full energy landscape. Each level provides a representation of the energy landscape at a different scale.

Figure 1 gives the disconnectivity tree of the original potential and embeddings after applying the Metadynamics adjustment by equation 11 for the 8-atom LJ cluster. The colors on each minimum on the disconnectivity tree are matched to embeddings of those nodes in Figure 1, with some of the nodes colored red, embedded into the same place. In the embeddings, nodes with closer dynamic relationships are clustered together as a result of the shorter commute time distance between them.

Clustering molecular energy landscapes by adaptive network embedding

Figure 1. Disconnectivity tree and Metadynamics-based embeddings for the Lennard-Jones cluster with eight atoms. Left: Disconnectivity tree of all local minima; Right: Embeddings for the local minima after applying the Metadynamics adjustment. Color scheme represents the potential energy; e.g., dark blue denotes the lowest, and red indicates the highest. Closely related minima have very similar or identical embeddings; e.g., both yellow minima are embedded at the yellow point on the right.

On a larger scale, we can also see how the higher energy nodes relate to the nodes with the two lowest energies. Specifically, the global minimum (in dark blue) is closer to the nodes in the middle of the tree (in orange), while the second lowest local minimum is much closer to the three highest energy nodes. This allows us to conclude the lowest commute time path through these states: one of the orange-colored states might transition directly to the global minimum, while one of the higher energy states would be more likely to transition to the second lowest energy state first and then either remain there or transition on to the global minimum.

For comparison, we also investigated the inter-node relationships by directly computing the commute times between each pair of nodes, as provided in Table 1. The commute time between nodes vi and vj, or mean time for the Markov process on the graph prescribed by the Laplacian L to travel from vi to vj and back to vi, is given by

$$ ct(i,j)=v\sum _{k=2}^{N}\lambda _{k}(\phi _{k}(i)-\phi_{k}(j))^{2}, $$

Table 1

Commute times between nodes, 8-atom LJ cluster


where λk are the eigenvalues of L, and the $$ \phi _{k} $$ are the corresponding eigenvectors as defined in the introduction. The off-diagonal entries for the Laplacian for a LJ cluster with k degrees of freedom are given by

$$ L(i,j)=\sum_{k}\frac{O(i)}{O_{k}(k)}\frac{v(i)}{v_{k}(k)}^{k-1}v(i)\mathrm{exp}(-\beta(E_{k}(k)-E(i))), $$

where O and Ok represent point group orders of the local minima and transition states, respectively; similarly, v and vk indicate mean vibrational frequencies, and E and Ek stand for potential energy levels at each configuration. It can be seen that the commute time diagram is consistent with the Network Embedding output.

Multi-level embeddings of LJ 38 cluster

To scale up this result, we also apply this framework to the 38-atom LJ cluster. Using a 500-step basin-hopping run, we produced a database of 8,797 local minima connected by 8,099 transition states. Figure 2 shows the disconnectivity tree of the 38-atom LJ cluster. For simplicity, we construct the adjacency matrix for this network with entries given by the energy barriers between states. Figure 3 gives the hierarchical embedding of the LJ-38 cluster. The left image shows the initial embeddings, colored by their commute time distances from the global minimum. We re-embedded the points of interest with the Metadynamics adjustment to the potential with two iterations and obtain the right image.

Clustering molecular energy landscapes by adaptive network embedding

Figure 2. Disconnectivity tree for the 38-atom LJ cluster. The structures of the two lowest-energy configurations are also pictured. LJ: Lennard-Jones.

Clustering molecular energy landscapes by adaptive network embedding

Figure 3. Hierarchical embeddings for the LJ cluster with 38 atoms. Pictured are the embeddings before (left) and after (right) applying Metadynamics. Color scheme denotes commute time from the global minimum, with dark blue being shortest distances, and red as furthest distances. LJ: Lennard-Jones.

The hierarchical structure of the embedding process is organized as follows. In the first level embeddings [Figure 3, left], the nodes are embedded consistently with their commute time distances from the global minimum, which correspond loosely to the potential energy level. In particular, the nodes with energy E > -170, which are the highest energy clusters in the tree, are embedded further from the global minimum. In the next embedding, these nodes are removed due to their further distances from the global minimum, and more central clusters will be re-embedded. As we consider the second level and later embeddings, this pattern repeats: each re-embedding reveals a new “layer” of nodes embedded closer to the global minimum, which corresponds to nodes on the disconnectivity tree that are of lower energy than nodes removed in the previous level.

In particular, at the third level [Figure 3, right], the embeddings have four small “spokes” originating from a central cluster containing the global minimum. However, these embeddings provide additional context: nodes embedded within a particular “spoke” are more closely related, which means the system is more likely to transition between these states. Since the nodes do not all come from the same group on the disconnectivity tree, these embeddings can also reveal interactions between nodes not indicated on the disconnectivity tree.

Additionally, since the spokes are connected to the cluster containing the global minimum, we can conclude that each spoke represents a potential transition pathway from the outer edge of the cluster to the center. In other words, if the system is at a state represented by the outer point of one of the spokes, its most likely path toward the global minimum will be to travel through the states represented by other nodes in the same spoke.

We can apply similar reasoning to higher level embeddings. Each level reveals a more detailed picture of the dynamics of a different part of the energy landscape of the system. The first levels give a coarse-grained picture, only identifying broad groups of high and low energy nodes, while later levels give a more fine-grained visualization of the nodes most closely related to the global minimum.

Often, we want a more detailed, finer-grained visualization of the energy landscape than the disconnectivity tree in Figure 2 can provide. It is informative, therefore, to re-embed parts of the network of greatest interest to gain further insight. Here, we focus on the lowest energy parts of the LJ energy landscape. We repeated the above process using the subnetwork consisting only of the nodes with potential energy < -170.9, that is, the 163 lowest energy nodes. Figure 4 presents the results of this experiment.

Clustering molecular energy landscapes by adaptive network embedding

Figure 4. Embeddings of the local minima of the 38-atom LJ cluster with potential energies less than -170.9. The figure shows output of the second level embeddings with Metadynamics adjustment. Color scheme denotes commute time from the global minimum. LJ: Lennard-Jones.

Now, we want to provide a closer inspection on the information flow along the hierarchical sampling with Metadynamics. In the first level without the Metadynamics adjustment, nodes in these embeddings [Figures 3 and 4] are clustered according to their similarity in terms of commute time. In other words, nodes that can be quickly and frequently reached from either the global minimum or second lowest energy node will be grouped near them. As a result, most of the nodes we are most interested in end up in the same cluster, and the embeddings obtained from the first application of the embedding method only give useful clusters for nodes more distant in terms of commute time from the part of the graph of interest. As we progress through additional levels, we pull apart the cluster containing the global minimum, positioned near the origin in the embeddings of the first level, until the embeddings of the final level give details of the dynamics of the process within this cluster.

After the second level of Embedding with Metadynamics adjustment, if two nodes share a cluster or are close in the embedding space, it indicates that the system can easily transition between those nodes, with relatively low energy barriers. As a result, the groupings seen in these embeddings correspond to the groupings in the disconnectivity tree for this system[3,19]. For instance, one of the clusters in the second level embeddings corresponds to the global minimum and its nearest neighbors, pictured directly right of center in the disconnectivity tree in Figure 2. Clusters can be mapped to the disconnectivity tree by comparing the potential energies of the nodes within the cluster to the tree.

The difference is after the third level, some of the clusters instead represent combinations of multiple disconnectivity tree groups; this is a result of re-embedding the nodes to spread out those previously near the origin. Such nodes ended up being embedded in or near the clusters they are most closely related to, even though they are not actually members of the respective tree groupings. For instance, the global minimum is embedded directly next to a node from a neighboring tree group.

In other words, the embeddings of the first level tell us about higher energy nodes and those that are more distant from the global minima, and further levels reveal information about parts of the network that are closer to the global minimum. Additionally, these embeddings are useful for identifying transition paths. The structure of the third level embeddings, in particular, reveals four transition paths: if the system is initialized from a node near the outside of one of these “spokes”, its lowest energy path to the global minima will involve following the spoke into the center cluster.

We can also use these embeddings to observe the results of entropic changes to the system. In Figure 5, the embeddings for this cluster under two additional temperature conditions are shown. The temperature change results are reasonable according to what was observed previously with the 8-atom cluster. Namely, at lower temperatures [Figure 5, left], closely related nodes are more likely to be embedded much nearer each other, creating the impression that there are fewer embeddings, while at higher temperatures [Figure 5, right], there is greater variation in the node embeddings.

Clustering molecular energy landscapes by adaptive network embedding

Figure 5. Metadynamics-based embeddings for the 38-atom cluster at the temperatures T = 0.08 (left) and T =1 (right). Color scheme denotes commute time from the global minimum.

Human telomere folding

Next, we want to further develop the Network Embedding technique for organic molecular structures and apply it to a more complex problem: DNA folding in a human telomere. More specifically, we consider a sequence of 22 nucleotide bases A(G3TTA)3G3 which repeats within human telomeres. This sequence is known to form a G-quadruplex [Figure 6], a type of secondary structure formed by groups of four guanine bases called G-tetrads. Its structure and potential energy landscape were previously investigated in[21], where the potential energy landscape was calculated using the HiRE-RNA model for coarse-grained DNA[22] with six or seven atoms considered for each of the 22 nucleotides in the telomere. We use this database as a starting point. In particular, we construct a network with nodes given by the 4,000 lowest-energy local minima, and edges between nodes determined by the transition states connecting them.

Clustering molecular energy landscapes by adaptive network embedding

Figure 6. The four-strand G-quadruplex structure (PDB structure 1KF1), with guanine nucleotides colored green. Image produced with Chimera[20].

For the initial experiments, we used a random walk based on the energy barriers between states. Figure 7 shows the results of the initial embeddings and fourth levels with Metadynamics adjustments, based on an energy landscape adjusted by a Gaussian term with width 0.75 and height 1 after each successive embedding. As in the LJ cluster experiment, the first embedding includes all nodes in the network, while each successive embedding shows a re-embedding of the subnetwork of nodes most closely related to the global minimum (that is, all nodes whose previous embeddings lie within a small tolerance of the global minimum).

Clustering molecular energy landscapes by adaptive network embedding

Figure 7. Embeddings of the local minima network for the human telomere sequence, based on the adjacency matrix and the Metadynamics adjustment. Color scheme denotes commute time from the global minimum.

As with the LJ clusters, each level of embedding represents “zooming in” on the part of the network around the global minimum. The first level gives us a global view- the global minimum and its nearest neighbors (with respect to commute times) are clustered at the origin, represented by a dark blue dot. The red and orange dots furthest away from the origin represent local minima which are more distantly related, requiring multiple steps or higher energies to transition to the global minimum. Potential transition paths can be identified by starting at one of these points and moving toward the origin along nearby points. At the second and each of the following levels, the nodes embedded closest to the local minimum are re-embedded to give us a more detailed inspection into the relationships of those nodes. The likely transition paths here can be constructed similarly.


Now, we want to use the multiscale nature of the molecular dynamics to speed up and scale up the computation. When a subset of the variables evolves more quickly than the others, dimension reduction can be achieved by the quasi-equilibrium on the fast variables such that averaged macroscopic effect can replace microscopic details. For molecular configurations where the energy landscape is “flatter” and state transitions occur faster, this averaging principle applies. From the space perspective, if the transition paths between nodes representing different molecular configurations are of relatively low energy barriers, they should be closely related in terms of mean commute time. We can expect these low barrier transitions to describe subtler changes likely involving position changes for only a small number of atoms. In other words, there is a time-space scale separation; i.e., on a global scale, transitions between two states tend to require larger, higher dimensional changes (associated with greater energy expenditures), while locally, transitions within some subnetwork around a point of interest require far fewer degrees of freedom.

For energy landscapes with a large number of local minima, the corresponding networks contain large numbers of nodes. In fact, for the LJ clusters, the number of minima increases exponentially with the count of atoms. In these situations, as we have seen, it is beneficial to have a method for Network Embedding that focuses on locally embedding network regions of particular interest, for example, the subnetwork consisting of the global minimum and its nearest neighbors. Since the commute times between states in this subnetwork are short, the time required for the system to move between these configurations is fast and the nodes tend to be embedded near each other. The states in these subnetworks often represent similar molecular configurations, differing only by a simple conformational change; therefore, these subnetworks can have dramatically reduced dimensions compared with the full network.

We want to present an alternate formulation, which replaces the adjacency matrix with the probability current matrix from TPT. For large networks, particularly those with irregular structures, the committor functions needed to apply TPT to the full network may prove difficult or impractical to compute. Focusing our application of TPT onto a smaller, localized subnetwork avoids this difficulty, allowing us to take advantage of the additional information TPT offers. In the following experiments, we first embed all the local minima using the adjacency matrix constructed with energy barriers and apply the Network Embedding with Metadynamics. At each level, we remove nodes with longer commute times to the global minimum, until the system is reduced to a subnetwork such that we can apply TPT and compute the committor functions (8) and probability currents (9).

Then, each subnetwork is embedded into R3 using the effective probability current in place of the adjacency matrix. Here, we apply the hierarchical embedding procedure to the subnetwork of nodes surrounding the global minimum as in previous sections, but the same process could be used to examine other domains of the network aside from the global minimum to obtain a complete picture of the energy landscape.

The distances between nodes within each subnetwork preserve the commute times. Since the cross entropy loss minimization is applied to the overall network at each step, we can expect that the distances between node embeddings to be consistent with likelihoods of a transition, similar to the previous examples, and therefore, we can interpret the embeddings and identify possible transition paths in the same way.

We can demonstrate its efficacy on a smaller system - the 8-atom LJ cluster [Figure 8]. We still see that the two nodes with potential energies near -19.2 (colored yellow in Figure 1) are closely connected, and the nodes represented in red and orange are closer to the lowest energy nodes than to each other, but in this case, the short distance between the two lowest energy nodes reflects a higher transition rate between them. The highest energy nodes, embedded in red, are also embedded separately in this case, whereas their adjacency matrix-based embeddings were identical. These embeddings indicate that a transition path [Figure 8] from the highest energy node to the slightly lower energy node labeled 5 might pass through nodes 1 and 2 on the way. Direct simulations of this system confirm this transition path. Hence, these embeddings can be useful for predicting mechanisms by which a molecule changes between two configurations.

Clustering molecular energy landscapes by adaptive network embedding

Figure 8. Hierarchical embeddings of the local minima network for the 8-atom LJ cluster, based on a TPT-based subnetwork consisting of the nodes clustered around the global minimum (in dark blue), and the metadynamic adjustment. Colors of embeddings denote potential energy same as in previous illustrations. LJ: Lennard-Jones; TPT: Transition Path Theory.

We now return to the human telomere molecule. Figure 9 shows the embeddings produced via the TPT process described above. The colors of local minima in the figure are determined by the commute times between those nodes and the global minimum, which is embedded in dark blue. It is immediately apparent that the local minima are grouped according to these commute times, with separate clusters containing most of the red, yellow, and light blue nodes. Within each cluster, the nodes are relatively close regarding commute times, indicating that these molecular configurations are similar up to some simple molecular change. As with the 8-atom LJ cluster, these embeddings suggest possible transition paths. For example, if the system starts at one of the states furthest from the global minimum (colored in red, clustered in the upper left of the right figure), one would expect a transition path to the global minimum to travel through the light blue and yellow clusters to reach the node in dark blue. It is also worth noting that the embeddings given by the TPT approach appear to retain more nodes compared to the adjacency matrix-based embeddings in Figure 7, which results from the greater number of nonzero edge weights in the probability current matrix.

Clustering molecular energy landscapes by adaptive network embedding

Figure 9. Hierarchical embeddings of the local minima network for the human telomere sequence. Left: the embeddings of the full network based on the adjacency matrix. Right: embeddings of the subnetwork near the global minimum using Metadynamics and TPT. Colors of embeddings denote commute time distance from the global minimum. TPT: Transition Path Theory.


This article presents a framework for analyzing energy landscape data. Adaptive network embeddings that combine the ideas of Metadynamics and TPT with Node Embedding techniques can be employed to aid in interpreting and simplifying these data. In this embedding scheme, the network itself - both its edge weights and the set of nodes under consideration - can be adjusted to more effectively focus on particular areas of the graph. Future research will involve applying and developing the method for functions of specific molecules in the context of certain chemical or biological reacting networks, along with the energy landscapes of large nanoparticles. As we apply this approach to more complex systems, it will become necessary to consider the effects of more sophisticated physical processes, such as magnetic moments, alongside the effects of temperature and other environmental variables on the energy landscape.

In order to apply this embedding scheme efficiently and accurately on these higher-dimensional systems, we also plan to investigate the most optimal implementation. Some possible directions for this investigation include examining the effects of different optimizers in the model (such as SGD[11], Adam[23], and others), other methods for identifying transition paths and states (such as the string method[5] or transition path sampling[24]), and the relationship between different implementations of the network embeddings.

We anticipate that these energy landscape-based network embeddings would be used to advance the models currently used to explore the space of small molecules and identify potential new drugs. For example, the latent variables learned from molecular energy landscapes could be incorporated into Variational Autoencoders or other generative models as node attributes[25], in much the same way 3D representations of molecules are already used. The inclusion of these additional latent variables would lead to a multi-modal generative method that considers kinetic information of the molecular systems for generating more realistic, chemically viable molecules.


Authors’ contributions

Contributed substantially to study conception, design, and data analysis and interpretation: Mercurio P

Contributed substantially to study conception, design, data analysis and interpretation and provided administrative and material support: Liu D

Availability of data and materials

Molecular graphics and analyses were performed with UCSF Chimera[17], developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH P41-GM103311.
Databases of local minima and transition states were computed using the Pele software[19] and PATHSAMPLE[26], and the disconnectivity trees were drawn using disconnectionDPS[27].

Financial support and sponsorship

The work is partially supported by NSF-DMS 1720002, accessible via

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.


© The Author(s) 2024.


1. Dobson CM. Chemical space and biology. Nature 2004;432:824-8.

2. Reymond JL. The chemical space project. Acc Chem Res 2015;48:722-30.

3. Wales DJ. Exploring energy landscapes. Annu Rev Phys Chem 2018;69:401-25.

4. Weinan E, Zhou X. The gentlest ascent dynamics. Nonlinearity 2011;24:1831-42.

5. Weinan E, Ren W, Vanden-Eijnden E. String method for the study of rare events. Phys Rev B 2002;66:052301.

6. Weinan E, Ren W, Vanden-Eijnden E. Energy landscape and thermally activated switching of submicron-sized ferromagnetic elements. J Appl Phys 2002;93:2275-82.

7. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; New York, USA. Association for Computing Machinery; 2014. pp. 701-10.

8. Grover A, Leskovec J. Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; New York, USA. Association for Computing Machinery; 2016. pp. 855-64.

9. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arxiv. [Preprint] Sep 7, 2013. [accessed on 2024 Mar 25]. Available from:

10. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. arxiv. [Preprint] Oct 16, 2013. [accessed on 2024 Mar 25]. Available from:

11. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 1958;65:386-408.

12. Mercurio P, Liu D. Identifying transition states of chemical kinetic systems using network embedding techniques. Math Biosci Eng 2021;18:868-87.

13. Mercurio P, Liu D. Network embedding using sparse approximation of a random walks. arxiv. [Preprint] Aug 25, 2023. [accessed on 2024 Mar 25]. Available from:

14. Laio A, Parrinello M. Escaping free-energy minima. Proc Natl Acad Sci 2002;99:12562-6.

15. Weinan E, Vanden-Eijnden E. Towards a theory of transition paths. J Stat Phys 2006;123:503-23.

16. Du J, Liu D. Transitions states of stochastic chemical reaction networks. Commun Comput Phys 2021;29:606-27.

17. Qiu H, Hancock ER. Clustering and embedding using commute times. IEEE Trans Pattern Anal Mach Intell 2007;29:1873-90.

18. Coifman RR, Maggioni M. Diffusion wavelets. Appl Comput Harmon A 2006;21:53-94.

19. pele: Python energy landscape explorer. 2012. Available from: [Last accessed on 25 Mar 2024].

20. Pettersen EF, Goddard TD, Huang CC, et al. UCSF Chimera - a visualization system for exploratory research and analysis. J Chem Phys 2004;25:1605-12.

21. Cragnolini T, Chakraborty D, Šponer J, Derreumaux P, Pasquali S, Wales DJ. Multifunctional energy landscape for a DNA G-quadruplex: an evolved molecular switch. J Chem Phys 2017;147:152715.

22. Pasquali S, Derreumaux P. HiRE-RNA: a high resolution coarse-grained energy model for RNA. J Phys Chem B 2010;114:11957-66.

23. Kingma DP, Ba J. Adam: a method for stochastic optimization. arxiv. [Preprint] Jan 30, 2017. [accessed on 2024 Mar 25]. Available from:

24. Bolhuis PG, Chandler D, Dellago C, Geissler PL. Transition path sampling: throwing ropes over rough mountain passes, in the dark. Annu Rev Phys Chem 2002;53:291-318.

25. Gómez-Bombarelli R, Wei JN, Duvenaud D, et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 2018;4:268-76.

26. PATHSAMPLE: a driver for OPTIM to create stationary point databases using discrete path sampling and perform kinetic analysis. Available from: [Last accessed on 25 Mar 2024].

27. disconnectionDPS provides a variety of tools for constructing disconnectivity graphs from a database created by PATHSAMPLE. The original program was written by Mark Miller. Available from: [Last accessed on 25 Mar 2024].

Cite This Article

Export citation file: BibTeX | RIS

OAE Style

Mercurio P, Liu D. Clustering molecular energy landscapes by adaptive network embedding. J Mater Inf 2024;4:3.

AMA Style

Mercurio P, Liu D. Clustering molecular energy landscapes by adaptive network embedding. Journal of Materials Informatics. 2024; 4(1): 3.

Chicago/Turabian Style

Mercurio, Paula, Di Liu. 2024. "Clustering molecular energy landscapes by adaptive network embedding" Journal of Materials Informatics. 4, no.1: 3.

ACS Style

Mercurio, P.; Liu D. Clustering molecular energy landscapes by adaptive network embedding. J. Mater. Inf. 2024, 4, 3.

About This Article

© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (, which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Data & Comments




Comments must be written in English. Spam, offensive content, impersonation, and private information will not be permitted. If any comment is reported and identified as inappropriate content by OAE staff, the comment will be removed without notice. If you have any queries or need any help, please contact us at

Download PDF
Cite This Article 0 clicks
Like This Article 2 likes
Share This Article
Scan the QR code for reading!
See Updates
Journal of Materials Informatics
ISSN 2770-372X (Online)
Follow Us


All published articles are preserved here permanently:


All published articles are preserved here permanently: