Molecular dynamics

We have seen that using the tools of statistical mechanics thermodynamic properties can be obtained through ensemble averages. The next challenge is to generate a set of samples of different microstates that are distributed according to the equilibrium phase space distribution function. This is exactly the idea behind MD and MC simulations. However, there is a key difference between the two methods.

In their simplest form, MD simulations rely on the realization that the classical equations of motion of \(N\) particles, isolated in a box of volume \(V\) and acted upon by conservative forces, naturally generate samples from the microcanonical ensemble. The fixed energy \(E\) is determined by the initial conditions, which we will elaborate on later. It is possible, but not straightforward, to generate samples from other ensembles through MD simulations.

In contrast, MC simulations generate samples from the equilibrium phase space distribution of various ensembles through a random process. In this chapter, we first discuss \(NVE\) MD in detail. Then, you will write the code for the first numerical project implementing your very own MD simulation. Then, we will do the same for MC simulations in the canonical ensemble.

Propagating in time

The heart of an MD simulation is the algorithm that propagates the classical equations of motion. There are quite a few different algorithms, and we present only one in detail. All of them, however, divide the total duration of the simulations into \(P\) time steps of length \(\Delta t\) and propagate the positions and velocities or momenta step by step in small increments. How small? We will see later.

One of the most most popular algorithms is called velocity Verlet. It is derived by expanding \(\textbf{r}(t+\Delta t)\) in a Taylor series around \(\Delta t = 0\) up to second order,

\[ \textbf{r}(t+\Delta t) = \textbf{r}(t) + \dot{\textbf{r}}(t) \Delta t + \frac{1}{2} \ddot{\textbf{r}}(t) {\Delta t}^2, \]

or in terms of the momenta and forces,

(39)\[ \textbf{r}(t+\Delta t) = \textbf{r}(t) + \frac{\textbf{p}(t)}{m} \Delta t + \frac{1}{2} \frac{\textbf{F}(t)}{m} {\Delta t}^2. \]

We can similarly write a Taylor expansion of \(\textbf{r}(t-\Delta t)\),

\[ \textbf{r}(t-\Delta t) = \textbf{r}(t) - \frac{\textbf{p}(t)}{m} \Delta t + \frac{1}{2} \frac{\textbf{F}(t)}{m} {\Delta t}^2. \]

Which we rewrite, by defining \( \tilde{t} = t-\Delta t \) and then relabeling \(\tilde{t}\) back as \(t\),

\[ \textbf{r}(t) = \textbf{r}(t + \Delta t) - \frac{\textbf{p}(t + \Delta t)}{m} \Delta t + \frac{1}{2} \frac{\textbf{F}(t + \Delta t)}{m} {\Delta t}^2. \]

This equation shows a very nice property of the velocity Verlet algorithm - time reversal symmetry. If we stop the simulation after some time and propagate backward in time by reversing the momenta, we will get back to the initial starting point. Plugging this expression in Eq. (39), and solving for \(\textbf{p}(t + \Delta t)\), we get

(40)\[ \textbf{p}(t+\Delta t) = \textbf{p}(t) + \frac{1}{2} ( \textbf{F}(t) + \textbf{F}(t+\Delta t)) \Delta t. \]

At every time step, the positions are updated first, through Eq. (39), and then the new forces are evaluated using the new positions. In the second stage, the momenta are updated using (40).

An alternative form of the velocity Verlet algorithm is composed of three steps. First, the momenta are propagated half a step,

(41)\[ \textbf{p}(t + \Delta t /2) = \textbf{p}(t) + \frac{1}{2} \textbf{F}(t) {\Delta t}. \]

Then, the positions are updated according to Eq.(39), which is now written as

(42)\[ \textbf{r}(t+\Delta t) = \textbf{r}(t) + \frac{\textbf{p}(t + \Delta t /2)}{m} \Delta t. \]

The new forces, \(\textbf{F}(t + \Delta t)\), are evaluated next using the new positions, and the velocity is then propagated another half step using the new forces,

(43)\[ \textbf{p}(t + \Delta t ) = \textbf{p}(t + \Delta t /2 ) + \frac{1}{2} \textbf{F}(t + \Delta t) {\Delta t}. \]

Both forms of the velocity Verlet algorithm are equivalent, as can be seen by inserting Eq. (41) into Eqns. (42) and (43). However, the second form is more widespread since it avoids the need to store the forces at two different time instances on memory at every step. The second form can also be derived from a beautiful formal procedure for obtaining numerical schemes for MD simulations, which we will discuss next.

But first, what determines the time step \(\Delta t\)? While analytical solutions of the classical equations of motion rigorously conserve the total energy, our numerical solution keeps the energy constant only approximately. The size of the time step is typically determined so that the energy is conserved to some desired accuracy. The figure of merit is that the energy does not change more than \(0.1 \%\) of its initial value. You can get a good estimate of what this time step might be, by considering the fastest motion in the system. In molecular systems, it is typically the vibrational frequencies of H-X stretching modes (X=N, C, O) that are the fastest. Their period is \(\sim 10\) fs and thus the time step is usually between \(0.1-2\) fs. But there is no alternative for rigorous checking of energy conservation!

Deriving numerical propagators

In Eq. (16) we defined the Poisson brackets of some function of the phase space variables and the Hamiltonian. It turns out to be very powerful to define an operator, called the Liouvillian, using the Poisson brackets,

(44)\[ i\hat{L} = \left \{\, ,\mathcal{H} \right \} = \sum_{j=1}^N \left[ \frac{\partial }{\partial \textbf{q}_j} \cdot \frac{\partial \mathcal{H}}{\partial \textbf{p}_j} - \frac{\partial }{\partial \textbf{p}_j} \cdot \frac{\partial \mathcal{H}}{\partial \textbf{q}_j} \right] = \sum_{j=1}^N \left[ \frac{\partial }{\partial \textbf{q}_j} \cdot \dot{\textbf{q}}_j + \frac{\partial }{\partial \textbf{p}_j} \cdot \dot{\textbf{p}}_j \right]. \]

Using this definition, the time derivative of any function of phase space coordinates \(g(\textbf{x}) = g(\textbf{q}_1,...,\textbf{q}_N,\textbf{p}_1,...,\textbf{p}_N)\), can be written as

(45)\[ \frac{\mathrm{d} g}{\mathrm{d} t} = \left \{ g ,\mathcal{H} \right \} = i\hat{L} g \]

Eq. (45) can be formally solved to get

\[ g(\textbf{x}_t) = e^{i\hat{L} t} g(\textbf{x}_0), \]

where \(e^{i\hat{L} t}\) is called the classical propagator. This solution equally applies to the trajectory itself, giving

(46)\[ \textbf{x}_t = e^{i\hat{L} t} \textbf{x}_0. \]

You might have noticed that this framework for time propagation is very similar to the one you learned in quantum mechanics. If so, you are right, but unfortunately, we will not discuss this in depth here.

Eq. (46) is extremely powerful. We now demonstrate how it can be used to derive numerical algorithms to solve the classical equations of motion. For simplicity, we will focus on a single particle in one dimension.

In principle, if we knew how to apply the classical propagator on the microstate vector, we would have solved the equations of motion for all times. For a single particle,

\[ i \hat{L} = \dot{q} \frac{\partial }{\partial q} + \dot{p} \frac{\partial }{\partial p} = \frac{p}{m} \frac{\partial }{\partial q} + F(q) \frac{\partial }{\partial p}. \]

This classical propagator can be divided into two contributions

(47)\[\begin{align} i\hat{L}_1 &= \frac{p}{m} \frac{\partial }{\partial q}, & i\hat{L}_2 = F(q) \frac{\partial }{\partial p}. \end{align}\]

Active learning

Show that the operators \(i\hat{L}_1\) and \(i\hat{L}_2\) do not commute. Hint: Show that their commutator \([i\hat{L}_1, i\hat{L}_2] \ne 0.\)

The fact that the two operators do not commute means we cannot simply write \(e^{i\hat{L}t} = e^{i\hat{L}_1t} e^{i\hat{L}_2t}\) and apply the operators one by one on the microstate. This is unfortunate, because we often know what is the result of operating with \(e^{i\hat{L}_1t}\) or \(e^{i\hat{L}_2t}\) on \(\textbf{x}\), while we do not know how to act with \(e^{i\hat{L}t}\) on the microstate. But there is a way out!

We use the following property of two non-commuting operators \(\hat{A}\) and \(\hat{B}\),

(48)\[ e^{\hat{A}+\hat{B}} = \lim_{P \to \infty} \left[ e^{\hat{B}/2P} e^{\hat{A}/P} e^{\hat{B}/2P} \right]^P, \]

called the Trotter expansion, to write the propagator as

(49)\[ e^{i\hat{L}t} = \lim_{P \to \infty} \left[ e^{i\hat{L}_2 t/2P} e^{i \hat{L}_1 t/P} e^{i \hat{L}_2 t/2P} \right]^P. \]

We then define the time step \(\Delta t = t/P\), which shows that the Liouvillian can operate by successive propagation in small time steps. Does that ring any bells? It is exactly what we want to do in MD! This extremely powerful approach ensures that if we take the limit of \( P \to \infty \) (or, equivalently, \(\Delta t \to 0\)) for a fixed total time \(t\), we will obtain the exact dynamics! Of course, in practice, we always work with a finite \(P\), so the numerical trajectories are approximate. But Eq. (49) promises that some nice properties of exact propagation will be retained in the numerical propagation (time-reversal symmetry, phase space incompressibility, etc.).

So all that is left to do to derive a numerical algorithm is to operate with the discrete-time propagator,

(50)\[\begin{split} \begin{align} e^{i\hat{L} \Delta t} \equiv & e^{i\hat{L}_2 \frac{\Delta t}{2}} e^{i \hat{L}_1 \Delta t} e^{i \hat{L}_2 \frac{\Delta t}{2}} = \\ & \exp \left( \frac{\Delta t}{2} F(q) \frac{\partial }{\partial p} \right) \exp \left( \Delta t \frac{p}{m} \frac{\partial }{\partial q} \right) \exp \left( \frac{\Delta t}{2} F(q) \frac{\partial }{\partial p} \right), \end{align} \end{split}\]

on the microstate vector, to get

\[\begin{split} \left( \begin{matrix} q(t+\Delta t) \\ p(t+\Delta t) \end{matrix} \right) = \exp \left( \frac{\Delta t}{2} F(q) \frac{\partial }{\partial p} \right) \exp \left( \Delta t \frac{p}{m} \frac{\partial }{\partial q} \right) \exp \left( \frac{\Delta t}{2} F(q) \frac{\partial }{\partial p} \right) \left( \begin{matrix} q(t) \\ p(t) \end{matrix} \right). \end{split}\]

The final piece of the puzzle is to know how these exponential operators act on the microstate. This is given by their following property,

(51)\[ \exp \left( c \frac{\partial}{\partial \eta} \right) g(\eta) = g(\eta + c). \]

According to Eq. (51), these operators simply shift \(eta\) by the constant \(c\). Note that \(c\) could depend on other independent variables which are not \(\eta\). So if \(\eta\) is \(q\), \(c\) could depend on \(p\) and vice versa.

Active learning

Prove Eq. (51). Hint: use the Taylor series \(e^{x} = 1 + x + \frac{1}{2}x^2 + ...\) for the exponent.

The first operator acting on \(\textbf{x}(t)\) gives

\[\begin{split} \exp \left( \frac{\Delta t}{2} F(q) \frac{\partial }{\partial p} \right) \left( \begin{matrix} q(t) \\ p(t) \end{matrix} \right) = \left( \begin{matrix} q(t) \\ p(t) + F(t) \frac{\Delta t}{2} \end{matrix} \right) = \left( \begin{matrix} q(t) \\ p(t + \frac{\Delta t}{2}) \end{matrix} \right),\end{split}\]

where we have denoted \(F(q(t))\) as \(F(t)\) and the last step is due to Eq. (41).

With the second operator, we get

\[\begin{split} \exp \left( \Delta t \frac{p}{m} \frac{\partial }{\partial q} \right) \left( \begin{matrix} q(t) \\ p(t + \frac{\Delta t}{2}) \end{matrix} \right) = \left( \begin{matrix} q(t) + \frac{p(t + \frac{\Delta t}{2})}{m} \Delta t \\ p(t + \frac{\Delta t}{2}) \end{matrix} \right) = \left( \begin{matrix} q(t + \Delta t) \\ p(t + \frac{\Delta t}{2}) \end{matrix} \right), \end{split}\]

where we have used Eq. (42) in the last step.

Finally, the last operator shifts the momentum by half a time step again,

\[\begin{split} \exp \left( \frac{\Delta t}{2} F(q) \frac{\partial }{\partial p} \right) \left( \begin{matrix} q(t + \Delta t) \\ p(t + \frac{\Delta t}{2}) \end{matrix} \right) = \left( \begin{matrix} q(t + \Delta t) \\ p(t + \frac{\Delta t}{2}) + F(t + \Delta t) \frac{\Delta t}{2} \end{matrix} \right) = \left( \begin{matrix} q(t + \Delta t) \\ p(t + \Delta t)\\ \end{matrix} \right), \end{split}\]

where we have again denoted \( F(q(t + \Delta t)) = F(t + \Delta t) \) and used Eq. (43) in the final step.

If you have been following, you might have realized that this discretization of the propagator, given by Eq. (50), resulted in the velocity Verlet algorithm. In fact, we do not need to go through the math every time, we could have just read the three steps of propagation directly from Eq. (50)! Did I promise you a beautiful formalism or what?

It might seem like a little bit of an “overkill” to use this formalism to derive the velocity Verlet algorithm. You are right. But this formalism is a lot more powerful and can be used, for example, to derive efficient propagators for much more complicated situations, such as multiple time-stepping algorithms. They are required if you have a combination of very fast and very slow forces in the system, or for quantum simulations using path integrals (as some of you will see in the final project).

After understanding the propagation algorithm at the core of MD simulations, we discuss briefly all other components of the simulation. For simplicity, we will consider a collection of \(N\) identical particles of mass \(m\) in a cubic box of side length \(L\) and volume \(L^3\).

Sampling initial conditions

Before we integrate the equations of motion to obtain trajectories, we need to define the initial conditions. This can be done in several ways, the simplest is to sample randomly the initial positions of the particles inside a cubic simulations box of length \(L\). The initial momenta can be either set to zero, in that case, or sampled randomly themselves. A common practice is to sample them from the Maxwell-Boltzmann distribution.

The Maxwell-Boltzmann distribution is just the Boltzmann distribution for a gas of \(N\) non-interacting particles of mass \(m\). The Hamiltonian has only the kinetic energy contributions,

\[ \mathcal{H}(\textbf{x}) = \mathcal{H}(\textbf{p}_1,...,\textbf{p}_N) = \frac{1}{2m} \sum_{j=1}^N \textbf{p}_j^2 = \frac{1}{2m} \sum_{j=1}^N \left( {p_x^j}^2 + {p_y^j}^2 + {p_z^j}^2 \right), \]

and the phase space distribution function is

(52)\[ f(\textbf{x}) = f(\textbf{p}_1,...,\textbf{p}_N) = \left( \frac{\beta}{2 \pi m} \right) ^{\frac{3N}{2}} e^{- \frac{\beta}{2m} \sum_{j=1}^N \textbf{p}_j^2}. \]

Eq. (52) shows that the total probability density is just a multiplication of independent Gaussian probability densities for each momentum degree of freedom of each particle. Every Gaussian has zero mean and standard deviation \(\sigma = \sqrt{m/\beta}\).

Active learning

Show that the probability in Eq. (52) is normalized.

Thus, we can obtain initial momenta according to the Boltzmann distribution by sampling the value of each momentum component of each particle randomly from a normal distribution with the appropriate mean and variance.

Evaluating the forces

There are two common types of conservative forces that you encounter in MD simulations. The first is external potentials, which only depend on the position of the particle in the box, such as a harmonic trap of frequency \(\omega_0\),

\[ V(x,y,z) = \frac{1}{2} m \omega^2_0 (x^2 + y^2 + z^2). \]

The total potential energy is then a sum over the contributions of all particles,

(53)\[ V(\textbf{r}_1,...,\textbf{r}_N) = \frac{1}{2} m \omega_0^2 \sum_{j=1}^N (x_j^2 + y_j^2 + z_j^2). \]

Each particle feels a force due to the trap independently of the other particles based on its position vector,

\[ \textbf{F}_k = - \nabla_k \cdot V = - m \omega_0^2 \textbf{r}_k. \]

The second type is interaction potentials, such as the Lennard-Jones (LJ) potential,

\[ V_{LJ}(r) = 4 \varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^6 \right]. \]

This force acts between every pair of particles, so that for the entire system we get

(54)\[ V_{LJ}(\textbf{r}_1,...,\textbf{r}_N) = 4 \varepsilon \sum_{i=1}^{N-1} \sum_{j>i}^N \left[ \left( \frac{\sigma}{r_{ij}} \right)^{12} - \left( \frac{\sigma}{r_{ij}} \right)^6 \right], \]

where \(r_{ij} = | \textbf{r}_i - \textbf{r}_j |\) is the distance between particles \(i\) and \(j\).

Active learning

What is the meaning of the paramters \(\varepsilon\) and \(\sigma\) in the LJ potential? What is the value of \(r\) at the minimum of the well? Explain the two contributions to the potential energy, i.e., \(\sim r^{-12}\) and \(\sim -r^{-6}\).

Periodic boundary conditions

We have already discussed that solving the classical equations of motion for \(\sim 10^{23}\) particles is impossible and the MD simulations solve them for smaller systems, with \(10-10^6\) particles. Luckily, many properties already converge to their bulk values. This is largely thanks to the use of periodic boundary conditions.

The main problem with simulating small systems is their unrealistic surface area to volume ratio. This means that molecules on the surface feel different forces than molecules in the bulk of the material. If we had tried to simulate liquid water with a model of 10 or 100 water molecules in a cluster, the model would not be very accurate for most properties because of surface effects.

The solution is to take our cubic box and replicate it in space in all three directions. During the simulation, for every molecule in the simulation box, there is an identical copy, called an image, in each one of the replicated cells. The image moves exactly in the same manner as the original molecules in the center box. If a molecule leaves the box through one of its sides, its periodic image enters through the opposite direction. In this way, the density is conserved in the central box. We do not need to store the coordinates of all of the(infinitely many) images. It is sufficient to keep just the coordinates of the \(N\) particles in the main box and switch between images when one leaves it and the other enters.

Active learning

Write a short pseudo-code to implement periodic boundary conditions. Hint: you can use if statements to test if a molecule has left the box, then correct accordingly.

How large should be the main box so that the properties of the simulated system resemble the properties of a bulk liquid or solid? It depends on the interactions. If they are relatively short-ranged, then a small box may suffice. For the LJ potential, a box of side length \(L=6\sigma\) is usually sufficient. But for very long-range interaction, this is not the case, and more sophisticated methods are needed, of which we will not elaborate. In addition, this approach works when the fluctuations of the system in space are much smaller than the size of the box. So for the case of a gas-liquid phase transition, where the fluctuations are macroscopic (you can see the bubbles form!), this approach will not work. However, for systems far away from phase transitions and with short-range interactions, it is found to be highly successful.

Finally, while periodic boundary conditions are great, they seem to have created a new problem. Now that we have infinitely many images of every particle, how are we to calculate the interaction between all of them? The practice is to overcome this problem by defining a spherical cutoff for the interaction of short-ranged potentials. One way to do it is by implementing the minimum image convention. The minimum image convention states that for each particle \(i\), you calculate the interaction only with the closest image of particle \(j\). This is done similarly for testing for periodic boundary conditions, but instead of applying it to the position vector of each particle, you use the distance vector between two particles. The minimum image convention effectively introduces a spherical cutoff to the potential at a distance of half the box length.

Observables

In the previous chapter, we showed how to calculate thermodynamic properties using ensemble averages. In this chapter, we discussed how to use MD simulations to sample the microcanonical ensemble in particular. Now, the question is - how do we use the trajectories to evaluate ensemble averages, such as in Eq. (37)?

The numerical solution of the classical equations of motion generates a set of configurations at different time steps, denoted by \(\textbf{x}_j\), where \(t_j=j \Delta t\) and \(j=1,...,P\). For each sample, the desired thermodynamic property can be evaluated using expressions such as Eq. (9) for the kinetic energy or Eq. (53) for a harmonic external trap. These expression are called estimators, and we denote their values as \(a(\textbf{x}_j)\). Since the classical trajectories naturally sample the \(NVE\) ensemble, all configurations have the same energy and thus the same probability. The microcanonical expectation values are then obtained from a simple arithmetic mean of the estimators,

(55)\[ \langle a \rangle_E = \frac{1}{P} \sum_{j=1}^{P} a(\textbf{x}_j). \]

If we could generate MD simulations that naturally sample the canonical ensemble, we could evaluate expectation values also at constant temperature using Eq. (55). We will elaborate on this later.

The most common thermodynamic observables that are evaluated in molecular dynamics simulations are the kinetic, potential, and total energies. The instantaneous value of the kinetic energy is given by Eq. (9), or in terms of the momenta,

(56)\[ K(\textbf{p}_1,...,\textbf{p}_N) = \frac{1}{2} \sum_{j=1}^N \frac{\textbf{p}^2_j}{m_j}. \]

The potential energy is obtained from expressions such as Eq. (54) for the LJ potential or Eq. (53) for a harmonic trap. The total energy is of course the sum of the two contributions, given by

\[ \langle E \rangle = \langle K \rangle + \langle U \rangle = \langle \mathcal{H}(\textbf{r}_1,...,\textbf{r}_N,\textbf{p}_1,...,\textbf{p}_N) \rangle. \]

Other properties, such as the average pressure and temperature, can be obtained from the equipartition theorem which states that every degree of freedom that is quadratic in the Hamiltonian contributes \(\frac{1}{2} k_B T\) to the average total energy. Thus, for the temperature of \(N\) particles in three dimensions, we get

\[ \langle T \rangle = \frac{2 \langle K \rangle}{3Nk_B}. \]

Some properties, like the entropy or the (Gibbs or Helmholtz) free energy, can not be expressed as expectation values of some estimator in terms of the coordinates and momenta. Therefore, evaluating them is much more difficult. One of the final projects of this course will deal with methods to evaluate free energies.

Many structural properties are also often obtained from MD simulations, for example, the density is given by a histogram (normalized by the total number of particles) of the positions,

(57)\[ \rho(r) = N \langle \sum_{j=1}^N \delta(\textbf{r}-\textbf{x}_j) \rangle. \]

All of these relations hold both in the microcanonical and the canonical ensembles. The only difference is that the average should be taken with the proper phase space distribution function. After you will complete your first numerical project, we will learn about MC simulations, an alternative method to MD for sampling phase space distribution functions. We will use it to sample configurations from the canonical ensemble instead.

But one of the major advantages of MD simulations is that they do not provide arbitrary samples of the microcanonical probability distribution, but rather they give the dynamics in time! This means we can also go beyond just static thermodynamic properties at equilibrium, as described above, to obtain time-dependent properties.

Why is that important? A beautiful result in statistical mechanics is that transport properties, such as the conductance, viscosity, diffusion coefficients, and even reaction rates are determined by fluctuations of time correlation functions at equilibrium. This is called the fluctuation-dissipation theorem. It is remarkable because it states that how the system responds to a small perturbation taking it out of equilibrium, is determined (to first order) by fluctuations in time of some property, evaluated at equilibrium! You will learn more about it if you take the advanced nonequilibrium thermodynamics course. For our purposes, it is enough to know that equilibrium time correlation functions are defined as,

(58)\[ C_{AB}(\tau) = \langle a\left( \textbf{x}(t=0) \right) b\left( \textbf{x}(t=\tau) \right) \rangle, \]

where \(a\) and \(b\) are the estimators of the thermodynamic quantities \(A\) and \(B\), respectively. Note that Eq. (58) is valid for an observable that has zero mean and unit variance, but can be written also more generally.

Using the fluctuation-dissipation theorem, it is then possible to obtain, for example, the diffusion coefficient of a particle from its velocity autocorrelation function,

(59)\[ D = \frac{1}{3} \int_0^{\infty} C_{vv}(\tau) \mathrm{d} \tau. \]

Similar expressions exist for the other transport properties mentioned above. We will discuss time correlation functions in more detail in the Monte Carlo section, after the first numerical project.