Basic statistical mechanics

After deriving the equations of motion for a system of \(N\) classical particles, we would next like to solve for their trajectories and use them to evaluate thermodynamic properties of a macroscopic system, such as the temperature, energy, pressure, etc. If we can connect an observable macroscopic property \(A\) to some function of the microstate of the system, \(a(\textbf{x})\), we might associate the measured value of \(A\) with a time average, denoted by \(\bar{a}_{\tau}\) and defined as

(22)\[ A = \bar{a}_{\tau} = \frac{1}{\tau} \int_0^{\tau} a(\textbf{x}(t)) \mathrm{d} t, \]

where \(\tau\) is some long time scale compared to the molecular or atomic motion.
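As a quick numerical illustration (not part of the derivation), the sketch below approximates the time average of Eq. (22) for the kinetic energy of a one-dimensional harmonic oscillator, using its analytic trajectory. All parameter values here are arbitrary illustrative choices; for the oscillator, the time-averaged kinetic energy approaches half the (conserved) total energy.

```python
import numpy as np

# Illustrative parameters for a 1D harmonic oscillator (arbitrary values)
m, omega0 = 1.0, 2.0
q0, p0 = 1.0, 0.5
E = p0**2 / (2 * m) + 0.5 * m * omega0**2 * q0**2  # conserved total energy

# Analytic momentum along the trajectory (from the previous chapter)
t = np.linspace(0.0, 200.0, 200001)  # tau = 200, long compared to 2*pi/omega0
p = p0 * np.cos(omega0 * t) - m * omega0 * q0 * np.sin(omega0 * t)

# Time average of a(x(t)) = p^2/2m, Eq. (22), approximated by the mean
# over a dense uniform time grid
a_bar = np.mean(p**2 / (2 * m))

# For the oscillator, the kinetic energy time-averages to E/2
print(a_bar, E / 2)
```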

But now we find ourselves at an impasse. It is entirely clear what rules determine the evolution of the system in time, but solving the equations analytically is impossible. A numerical solution is similarly infeasible for \(\sim 10^{23}\) particles, as we discussed in the previous chapter. The core idea of statistical mechanics is to provide an alternative route to macroscopic thermodynamic observables.


Ensembles

It was J.W. Gibbs who realized that thermodynamic macroscopic properties are not very sensitive to the details of the specific microscopic state of the system. Consider for example the temperature. You have learned in a preceding statistical thermodynamics course that each momentum degree of freedom contributes \(\frac{1}{2} k_B T \) to the average thermal kinetic energy (the equipartition theorem). Thus, the temperature can be obtained from the average kinetic energy of the system. But there are many ways to choose the momenta of the particles such that the average kinetic energy is the same. In other words, the temperature is insensitive to the specific microstate of the system, as long as it gives the same average kinetic energy.
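A minimal numerical sketch of this idea: draw many different sets of momenta from the Maxwell-Boltzmann distribution at a given temperature and recover that temperature from the average kinetic energy via equipartition. Reduced units with \(k_B = 1\) and all parameter values are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reduced units with k_B = 1 (an assumption of this sketch)
T, m, N = 0.8, 1.0, 100000

# Maxwell-Boltzmann momenta for N one-dimensional degrees of freedom:
# each p is Gaussian with variance m * k_B * T
p = rng.normal(0.0, np.sqrt(m * T), size=N)

# Equipartition: <p^2 / 2m> = (1/2) k_B T per momentum degree of freedom,
# so the temperature estimate is <p^2> / (m * k_B)
T_est = np.mean(p**2 / m)
print(T_est)  # close to T = 0.8, regardless of the specific microstate drawn
```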

Following this reasoning, and to make progress in connecting microscopic properties with thermodynamic observables, Gibbs introduced the concept of an ensemble. An ensemble is a collection of \(\mathcal{N}\) systems that obey the same microscopic interactions and are characterized by a common set of three thermodynamic constraints, chosen from properties such as the number of particles \(N\), the volume \(V\), the total energy \(E\), the temperature \(T\), the pressure \(P\), or the chemical potential \(\mu\). Each system in the ensemble evolves in time from different initial conditions and is characterized by a unique microscopic state defined by its path in phase space, \(\textbf{x}(t)\), defined in Eq. (7). The entire ensemble can then be represented as a “cloud” of points in phase space.

We define the function \(f(\textbf{x},t)\), called the phase space distribution function, as the probability density of observing a system in the ensemble in a specific microstate at time \(t\). Since it is a probability density, it has to fulfill the properties:

(23)\[\begin{align} \int f(\textbf{x},t) \, \mathrm{d} \textbf{x} &= 1, & f(\textbf{x},t) & \ge 0. \end{align}\]

In other words, \(f(\textbf{x},t) \, \mathrm{d} \textbf{x}\) represents the probability that a system from the ensemble is in the volume element between \(\textbf{x}\) and \(\textbf{x} + \mathrm{d} \textbf{x}\) at time \(t\) (or, alternatively, the fraction of systems therein).

Gibbs then postulated that instead of using the time average of Eq. (22), macroscopic properties can be obtained by freezing the ensemble at some time instance and taking the average of the desired observable over all the systems, called an ensemble average. More formally, the ensemble average, denoted as \(\langle a \rangle_f\), is defined as

(24)\[ A(t) = \langle a \rangle_f = \int f(\textbf{x},t) a(\textbf{x}) \, \mathrm{d} \textbf{x}. \]

Note the difference between Eqs. (22) and (24): the former is an integral over time, while the latter is an integral over phase space. This distinction is important enough that we have used a bar to denote the time average but brackets to denote the ensemble average. Before moving forward, and discussing what we can do with ensemble averages, we will briefly investigate an important property of the phase space distribution function, described by the Liouville equation.
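The distinction between the two averages can be made concrete numerically. The sketch below, for the harmonic oscillator with all parameter values chosen arbitrarily, computes the kinetic energy both ways: as an ensemble average over many systems with the same energy but random phase, frozen at a single time instance, and as a time average along one trajectory. Both approach \(E/2\).

```python
import numpy as np

rng = np.random.default_rng(1)
m, omega0, E = 1.0, 1.0, 2.0

# Ensemble: many systems at the same total energy E but random phase,
# q = A sin(phi), p = m*omega0*A cos(phi), with amplitude A set by E
A = np.sqrt(2 * E / (m * omega0**2))
phi = rng.uniform(0, 2 * np.pi, size=50000)

# Freeze the ensemble at one time instance t and average over systems (Eq. (24))
t = 0.37  # an arbitrary instant
p_t = m * omega0 * A * np.cos(phi + omega0 * t)
ens_avg = np.mean(p_t**2 / (2 * m))

# Time average along a single trajectory (Eq. (22)), here the phi = 0 system
ts = np.linspace(0.0, 1000.0, 1000001)
p_single = m * omega0 * A * np.cos(omega0 * ts)
time_avg = np.mean(p_single**2 / (2 * m))

print(ens_avg, time_avg)  # both approach E/2 = 1.0
```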

Liouville equation

For clarity, we will prove the Liouville equation for a single particle, but it can be generalized to \(N\) particles. In this case, \( f(\textbf{x},t) = f(q,p,t) \). The theorem states that the phase space distribution function is conserved along the trajectory, i.e., that

(25)\[ \frac{\mathrm{d} f}{\mathrm{d} t} = \frac{\partial f}{\partial t} + \frac{\partial f }{\partial q} \dot{q} + \frac{\partial f }{\partial p} \dot{p} = 0. \]

We now consider some volume in phase space \(\Omega\) and define the number of systems in the ensemble whose microstate is in this volume as

\[ \mathcal{N}_{\Omega} = \mathcal{N} \int_{\Omega} f(\textbf{x},t) \, \mathrm{d} \textbf{x} = \mathcal{N} \int_{\Omega} f(q,p,t) \, \mathrm{d} q \, \mathrm{d} p. \]

The rate of change of the number of systems in \(\Omega\) is given by

(26)\[ \frac{\mathrm{d} \mathcal{N}_{\Omega}}{\mathrm{d} t} = \mathcal{N} \int_{\Omega} \frac{\partial f}{\partial t} \, \mathrm{d} q \, \mathrm{d} p. \]

But, equivalently, the flux of systems leaving the volume through a surface element \(\mathrm{d} \mathcal{A}\) can be integrated to give the rate,

\[ \frac{\mathrm{d} \mathcal{N}_{\Omega}}{\mathrm{d} t} = - \mathcal{N} \int_{\mathcal{A}} f \dot{\textbf{x}} \cdot \textbf{n} \, \mathrm{d} \mathcal{A}, \]

where \(\textbf{n}\) is a vector normal to the surface element, and \(\dot{\textbf{x}}=[\dot{q},\dot{p}]\).

We can now use the divergence theorem to get

(27)\[ \frac{\mathrm{d} \mathcal{N}_{\Omega}}{\mathrm{d} t} = - \mathcal{N} \int_{\Omega} \nabla \cdot ( f \dot{\textbf{x}} ) \, \mathrm{d} \Omega = - \mathcal{N} \int_{\Omega} \left[ \frac{\partial (f \dot{q})}{\partial q} + \frac{\partial (f \dot{p})}{\partial p} \right] \, \mathrm{d} q \, \mathrm{d} p. \]

Eqs. (26) and (27) must agree for an arbitrary volume \(\Omega\), so their integrands must be equal at every point \((q,p)\), giving

(28)\[ \frac{\partial f}{\partial t} + \left[ \frac{\partial (f \dot{q})}{\partial q} + \frac{\partial (f \dot{p})}{\partial p} \right] = 0. \]

Active learning

Show that the term in square brackets can be written as

\[ \frac{\partial (f \dot{q})}{\partial q} + \frac{\partial (f \dot{p})}{\partial p} = \frac{\partial f }{\partial q} \dot{q} + \frac{\partial f }{\partial p} \dot{p}. \]

Hint: Use Hamilton’s equations.

Based on the solution of the above exercise, Eq. (28) can be written as

\[ \frac{\partial f}{\partial t} + \frac{\partial f }{\partial q} \dot{q} + \frac{\partial f }{\partial p} \dot{p} = 0, \]

which completes the proof of the Liouville equation.

Using the definition of the Poisson bracket between \(f\) and the Hamiltonian, Eq. (16), we can write

\[ \frac{\partial f }{\partial q} \dot{q} + \frac{\partial f }{\partial p} \dot{p} = \frac{\partial f }{\partial q} \frac{\partial \mathcal{H}}{\partial p} - \frac{\partial f }{\partial p} \frac{\partial \mathcal{H}}{\partial q} = \{ f, \mathcal{H} \}, \]

which leads to another useful form of the Liouville equation,

(30)\[ \frac{\partial f}{\partial t} + \{ f, \mathcal{H} \} = 0. \]

Phase space incompressibility

An important consequence of the Liouville equation is that of phase space incompressibility. We denote an initial point in phase space as \(\textbf{x}_0 = [q_0,p_0]\). The point in phase space reached by the dynamics after time \(t\), starting from \(\textbf{x}_0\), is \(\textbf{x}_t = [q_t,p_t]\). In the previous chapter, we learned that the trajectory is uniquely defined by the initial conditions, meaning that \(\textbf{x}_t\) can be written as a function of the initial conditions,

\[ \textbf{x}_t = g(\textbf{x}_0). \]

Thus, we can consider the evolution of the trajectory as a coordinate transformation. Assuming we know the function \(g\), we need to evaluate how the volume elements change when we transform from \(\mathrm{d} \textbf{x}_0\) to \(\mathrm{d} \textbf{x}_t\). J. Liouville proved that

(31)\[ \mathrm{d} \textbf{x}_0 = \mathrm{d} \textbf{x}_t, \]

i.e., that the phase space volume element is conserved along the trajectory. Its shape might change, but not the volume.

Why is it important? Because, as we said, the fraction of systems in a volume element between \(\textbf{x}\) and \(\textbf{x} + \mathrm{d} \textbf{x}\) at time \(t\) is \(f(\textbf{x},t) \, \mathrm{d} \textbf{x}\). Since the phase space distribution is conserved along the trajectory, if the volume element is conserved too, so is the fraction of systems in it. This justifies the claim that ensemble averages can be performed at any time instance.

To prove Eq. (31), we remember that two volume elements of a coordinate transformation are connected through

(32)\[ \mathrm{d} \textbf{x}_t = \mathcal{J}(\textbf{x}_t; \textbf{x}_0) \mathrm{d} \textbf{x}_0, \]

where the Jacobian \(\mathcal{J}\) is defined as

(33)\[\begin{align} \mathcal{J}(\textbf{x}_t; \textbf{x}_0) &= \det{J}, & J_{kl} &= \frac{\partial x_t^k}{\partial x_0^l}. \end{align}\]

Then, we use a property of the determinant

\[ \det{J} = e^{\mathrm{Tr} \left[ \ln{J} \right]}, \]

where \(\mathrm{Tr}[J] = \sum_k J_{kk} \). Taking the derivative, we get

\[ \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{J}(\textbf{x}_t;\textbf{x}_0) = \frac{\mathrm{d}}{\mathrm{d}t} e^{\mathrm{Tr} \left[ \ln{J} \right]} = e^{\mathrm{Tr} \left[ \ln{J} \right]} \mathrm{Tr} \left[ \frac{\mathrm{d}J}{\mathrm{d}t} J^{-1} \right] = \mathcal{J}(\textbf{x}_t;\textbf{x}_0) \sum_{k,l} \left( \frac{\mathrm{d}J}{\mathrm{d}t} \right)_{kl} \left( J^{-1} \right)_{lk}. \]

The elements of the matrices \(\frac{\mathrm{d}J}{\mathrm{d}t}\) and \(J^{-1}\) are given by

(34)\[\begin{align} \left( \frac{\mathrm{d}J}{\mathrm{d}t} \right)_{kl} &= \frac{\partial \dot{x}_t^k}{\partial x_0^l}, & \left( J^{-1} \right)_{lk} &= \frac{\partial x_0^l}{\partial x_t^k}, \end{align}\]

so multiplying them and summing over \(l\) is just the chain rule for \(\frac{\partial \dot{x}_t^k}{\partial x_t^k}\), leading to

\[ \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{J}(\textbf{x}_t;\textbf{x}_0) = \mathcal{J}(\textbf{x}_t;\textbf{x}_0) \sum_k \frac{\partial \dot{x}_t^k}{\partial x_t^k}. \]

For a single particle, the sum over \(k\) gives Eq. (29), which we already showed is equal to zero. Alternatively, for every coordinate \(q_t\) in \(\textbf{x}_t\), the term in the sum gives \(\frac{\partial^2 \mathcal{H}}{\partial q_t \partial p_t}\), while for every momentum \(p_t\) we get \(- \frac{\partial^2 \mathcal{H}}{\partial p_t \partial q_t}\); the two cancel to give zero. This concludes the proof that

\[\frac{\mathrm{d}}{\mathrm{d} t} \mathcal{J}(\textbf{x}_t;\textbf{x}_0) = 0, \]

which means that, since \(\mathcal{J}=1\) initially by definition, the Jacobian remains unity along the trajectory, thus proving Eq. (31).
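The result can also be checked numerically for a nonlinear system. The sketch below (an illustration under assumed parameters, not part of the proof) integrates a pendulum with a standard RK4 scheme, builds the Jacobian of the map \(\textbf{x}_0 \to \textbf{x}_t\) by finite differences, and verifies that its determinant stays close to one.

```python
import numpy as np

# Pendulum Hamiltonian H = p^2/2 - cos(q) (unit mass, length, and gravity;
# an illustrative anharmonic choice so the check is not trivially linear)
def flow(x0, t=1.0, dt=1e-3):
    """Evolve x0 = [q, p] for time t with classical RK4."""
    def rhs(x):
        q, p = x
        return np.array([p, -np.sin(q)])
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t / dt))):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

# Finite-difference Jacobian of the map x_0 -> x_t, Eq. (33)
x0, eps = np.array([0.9, 0.4]), 1e-6
J = np.empty((2, 2))
for l in range(2):
    dx = np.zeros(2)
    dx[l] = eps
    J[:, l] = (flow(x0 + dx) - flow(x0 - dx)) / (2 * eps)

# The determinant is unity: phase space volume is conserved
print(np.linalg.det(J))
```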

If you didn’t follow the derivation, do not worry! You will now demonstrate phase space incompressibility for a simple example.

Active learning

Show that Eq. (31) is correct for the case of a single particle of mass \(m\) in a harmonic trap of frequency \(\omega_0\). In this case, \(\textbf{x} = [q,p]\). You can use the results of the previous chapter,

\[ q(t) = q_0 \cos{\omega_0 t} + \frac{p_0}{m \omega_0} \sin{\omega_0 t}, \]
\[ p(t) = p_0 \cos{\omega_0 t} - m \omega_0 q_0 \sin{\omega_0 t}. \]

Hint: Compute the Jacobian of the coordinate transformation and show that it is unity.

Equilibrium

Now we go back to the connection we established between the Poisson bracket of the phase space distribution function and the Hamiltonian, through Eq. (30). We define thermodynamic equilibrium as the case in which the distribution \(f(\textbf{x})\) does not depend explicitly on time. Eq. (30) then reads

\[ \{ f, \mathcal{H} \} = 0. \]

This equation can be solved for \(f(\textbf{x})\) as a function of the Hamiltonian, which we formally denote as

\[ f(\textbf{x}) = \frac{1}{\mathcal{Z}} \mathcal{F}(\mathcal{H}(\textbf{x})). \]

The averages of macroscopic observables are then also time-independent (you might recall from earlier courses that this is precisely the definition of thermodynamic equilibrium), given by

(35)\[ A = \langle a \rangle_f = \frac{1}{\mathcal{Z}} \int \mathcal{F}(\mathcal{H}(\textbf{x})) \, a(\textbf{x}) \, \mathrm{d} \textbf{x}. \]

The denominator \(\mathcal{Z}\) is called the partition function; it normalizes the function \(\mathcal{F}\) and is defined as

(36)\[ \mathcal{Z} = \int \mathcal{F}(\mathcal{H}(\textbf{x})) \, \mathrm{d} \textbf{x}. \]

To proceed and write specific expressions for \(\mathcal{F}\), we invoke a second postulate, called the hypothesis of equal a priori probabilities. In simple terms, it says that for an ensemble of isolated systems, each characterized by the same number of particles, volume, and total energy, all microstates have equal probability. That is, the phase space distribution is a constant when the energy is between \(E\) and \(E + \delta E\) and zero otherwise.

We saw previously that the classical equations of motion for an isolated system under the influence of conservative forces ensure that the total energy is constant. An ensemble composed of such isolated systems is called the microcanonical ensemble. Since they are characterized by a constant number of particles, volume, and total energy, the ensemble is often denoted by the letters NVE.

According to our second postulate, the equilibrium distribution at the NVE ensemble can be written formally as

\[ \mathcal{F}(\mathcal{H}(\textbf{x})) \propto \delta (\mathcal{H}(\textbf{x}) - E), \]

and expectation values are obtained through

(37)\[ A = \langle a \rangle_E = \frac{ \int \delta (\mathcal{H}(\textbf{x}) - E) \, a(\textbf{x}) \, \mathrm{d} \textbf{x}} {\int \delta (\mathcal{H}(\textbf{x}) - E) \, \mathrm{d} \textbf{x}} . \]
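Eq. (37) can be evaluated numerically for a simple case. The sketch below (a numerical illustration with assumed parameters) approximates the delta function by a narrow Gaussian on a phase space grid for the 1D harmonic oscillator and computes \(\langle q^2 \rangle_E\), which analytically equals \(E/(m \omega_0^2)\). The Gaussian width and grid are arbitrary numerical choices.

```python
import numpy as np

m, omega0, E = 1.0, 1.0, 1.0

# Phase space grid (q, p) and the Hamiltonian on it
q = np.linspace(-3, 3, 601)
p = np.linspace(-3, 3, 601)
Q, P = np.meshgrid(q, p)
H = P**2 / (2 * m) + 0.5 * m * omega0**2 * Q**2

# Approximate delta(H - E) with a narrow Gaussian (width is a numerical choice)
sigma = 0.05
delta = np.exp(-((H - E) ** 2) / (2 * sigma**2))

# Microcanonical average of a(x) = q^2 via Eq. (37); the grid spacing cancels
# between numerator and denominator
q2_avg = np.sum(delta * Q**2) / np.sum(delta)

print(q2_avg)  # analytic microcanonical result: E / (m * omega0^2) = 1.0
```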

You have derived \(\mathcal{F}(\mathcal{H}(\textbf{x}))\) for other ensembles in previous courses. We will not repeat those derivations here, but only mention that all of them can be derived from our two postulates by considering the entire ensemble as an isolated system. For an ensemble of systems with a fixed number of particles, volume, and temperature, called the canonical ensemble, \(\mathcal{F}\) is the famous Boltzmann distribution,

\[ \mathcal{F}(\mathcal{H}(\textbf{x})) \propto e^{-\beta \mathcal{H} (\textbf{x})}, \]

and the observables are given by

(38)\[ A = \langle a \rangle_{\beta} = \frac{ \int e^{-\beta \mathcal{H} (\textbf{x})} \, a(\textbf{x}) \, \mathrm{d} \textbf{x}} {\int e^{-\beta \mathcal{H} (\textbf{x})} \, \mathrm{d} \textbf{x}} . \]

The canonical ensemble is usually denoted by the letters NVT.
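As a closing illustration, Eq. (38) can be evaluated by direct quadrature for the 1D harmonic oscillator. In reduced units with \(k_B = 1\) (an assumption of this sketch, as are all parameter values), the average energy should be \(k_B T = 1/\beta\) and the average kinetic energy \(\frac{1}{2} k_B T\), consistent with equipartition.

```python
import numpy as np

m, omega0, beta = 1.0, 1.0, 2.0  # reduced units with k_B = 1

# Phase space grid; the Boltzmann factor decays fast, so a finite box suffices
q = np.linspace(-6, 6, 1201)
p = np.linspace(-6, 6, 1201)
Q, P = np.meshgrid(q, p)
H = P**2 / (2 * m) + 0.5 * m * omega0**2 * Q**2

# Canonical averages, Eq. (38): the unnormalized Boltzmann factor acts as the
# weight, and the grid spacing cancels between numerator and denominator
w = np.exp(-beta * H)
E_avg = np.sum(w * H) / np.sum(w)
KE_avg = np.sum(w * P**2 / (2 * m)) / np.sum(w)

# Equipartition: each quadratic term contributes (1/2) k_B T = 1/(2*beta)
print(E_avg, KE_avg)  # expect 1/beta = 0.5 and 1/(2*beta) = 0.25
```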