T-5

Goal: So far we have discussed the equilibrium properties of disordered systems, which are encoded in their partition function and free energy. In this set of problems, we characterize the energy landscape of the spherical $p$-spin model by determining the number of its stationary points.

Key concepts: gradient descent, out-of-equilibrium dynamics, metastable states, Hessian matrices, random matrix theory, Langevin dynamics.

Dynamics, optimization, trapping local minima

Energy landscapes. Consider the spherical $p$-spin model discussed in Problems 2 and 3. The function $E({\vec {\sigma }})$ is an energy landscape: it is a random function defined on configuration space, which is the space to which all configurations ${\vec {\sigma }}$ belong. This landscape has its global minimum (or minima) at the ground state configuration(s); the energy density of the ground state(s) can be obtained by studying the partition function $Z$ in the limit $\beta \to \infty$. Besides the ground state(s), the energy landscape can have other local minima; the fully-connected models of glasses are characterized by the fact that there is a large number of these local minima, see SKETCH.

Gradient descent and stationary points. Suppose that we are interested in finding the configurations of minimal energy of some model with energy landscape $E({\vec {\sigma }})$, starting from an arbitrary initial configuration ${\vec {\sigma }}_{0}$. We can think of a dynamics in which we progressively update the configuration of the system, moving towards lower and lower values of the energy, hoping to eventually converge to the ground state(s). The simplest dynamics of this sort is gradient descent, ${\frac {d{\vec {\sigma }}(t)}{dt}}=-\nabla _{\perp }E({\vec {\sigma }})$
where the configuration changes in time by moving in the direction opposite to the gradient of the energy landscape restricted to the sphere, $\nabla _{\perp }E({\vec {\sigma }})$. The dynamics stops when it reaches a stationary point, i.e. a configuration where $\nabla _{\perp }E({\vec {\sigma }})=0$. If the landscape has a simple, convex structure, this will be the ground state one is seeking; if the energy landscape is very non-convex, as in glasses, the end point of this algorithm will be a local minimum at an energy much higher than that of the ground state. SKETCH
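The gradient descent dynamics above can be illustrated with a minimal numerical sketch for $p=3$. Everything specific here is an assumption made for illustration: the couplings are taken as unsymmetrized i.i.d. Gaussians with variance $1/(2N^{2})$, the continuous-time dynamics is replaced by small discrete steps, and the configuration is rescaled back onto the sphere $|{\vec {\sigma }}|^{2}=N$ after each step. The function names are hypothetical.

```python
import numpy as np

def make_couplings(N, rng):
    # Assumed normalization for p = 3: i.i.d. Gaussian J_ijk, variance 1/(2 N^2)
    return rng.normal(0.0, np.sqrt(1.0 / (2.0 * N**2)), size=(N, N, N))

def energy(J, s):
    # E(sigma) = sum_ijk J_ijk sigma_i sigma_j sigma_k
    return np.einsum("ijk,i,j,k->", J, s, s, s)

def gradient(J, s):
    # J is not symmetrized, so differentiate with respect to each of the three slots
    return (np.einsum("ijk,j,k->i", J, s, s)
            + np.einsum("jik,j,k->i", J, s, s)
            + np.einsum("jki,j,k->i", J, s, s))

def projected_gradient(J, s):
    # Remove the radial component: this is the gradient restricted to the sphere
    g = gradient(J, s)
    return g - (g @ s) / (s @ s) * s

def gradient_descent(J, s0, step=0.01, n_steps=2000):
    s = s0.copy()
    radius = np.sqrt(s.size)
    for _ in range(n_steps):
        s = s - step * projected_gradient(J, s)
        s *= radius / np.linalg.norm(s)  # rescale back onto the sphere
    return s

rng = np.random.default_rng(0)
N = 20
J = make_couplings(N, rng)
s0 = rng.normal(size=N)
s0 *= np.sqrt(N) / np.linalg.norm(s0)

s_final = gradient_descent(J, s0)
e_initial = energy(J, s0) / N
e_final = energy(J, s_final) / N
print("energy density: initial", e_initial, "final", e_final)
print("residual |grad_perp E|:", np.linalg.norm(projected_gradient(J, s_final)))
```

Running the sketch from several random initial configurations with the same couplings is a simple way to see that different runs can end at different energy densities.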

The landscape’s complexity. To understand the structure of the energy landscape and to guess where gradient descent dynamics (or its variants) is expected to converge, it is useful to characterize the distribution of the stationary points, i.e. the number ${\mathcal {N}}(\epsilon )$ of such configurations having a given energy density $\epsilon$. In fully-connected models of glasses, this quantity has an exponential scaling, ${\mathcal {N}}(\epsilon )\sim {\text{exp}}\left(N\Sigma (\epsilon )\right)$, where $\Sigma (\epsilon )$ is the complexity of the landscape. ^[1]

Noise, Langevin dynamics and activation. How can one modify the dynamics to escape from a given local minimum and explore other regions of the energy landscape? One possibility is to add some stochasticity (or noise), i.e. some random terms that kick the system in random directions in configuration space, along which the energy may increase instead of decreasing: ${\frac {d{\vec {\sigma }}(t)}{dt}}=-\nabla E({\vec {\sigma }})+{\vec {\eta }}(t)$
The simplest choice is to take ${\vec {\eta }}(t)$ to be a Gaussian vector at each time $t$, uncorrelated with the vectors at other times $t'\neq t$, with zero average and some constant variance. This variance, which measures the strength of the noisy kicks, can be interpreted as a temperature: the resulting dynamics is known as Langevin dynamics.
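As a toy illustration of how noise allows escape from a local minimum, the sketch below integrates Langevin dynamics in a one-dimensional double well rather than in the $p$-spin landscape. The energy $E(x)=(x^{2}-1)^{2}$, the Euler-Maruyama discretization, and the convention that the noise variance is $2T$ per unit time are all assumptions chosen for illustration.

```python
import numpy as np

def grad_energy(x):
    # Toy double well E(x) = (x^2 - 1)^2, with minima at x = -1 and x = +1
    return 4.0 * x * (x**2 - 1.0)

def langevin(x0, temperature, dt=0.01, n_steps=20_000, rng=None):
    # Euler-Maruyama step: x <- x - dt * E'(x) + sqrt(2 T dt) * xi, xi ~ N(0, 1)
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps + 1)
    x[0] = x0
    kicks = np.sqrt(2.0 * temperature * dt) * rng.normal(size=n_steps)
    for t in range(n_steps):
        x[t + 1] = x[t] - dt * grad_energy(x[t]) + kicks[t]
    return x

rng = np.random.default_rng(0)
x_cold = langevin(-1.2, temperature=0.0, rng=rng)  # pure gradient descent
x_hot = langevin(-1.2, temperature=0.5, rng=rng)   # noisy dynamics

print("T = 0:   final position", x_cold[-1])
print("T = 0.5: range visited ", x_hot.min(), x_hot.max())
```

At zero temperature the trajectory relaxes to the nearest minimum and stays there; at finite temperature it is expected to cross the barrier at $x=0$ and visit both wells.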

Problem 5.1: the Kac-Rice method and the complexity

In this Problem, we set up the computation of the annealed complexity of the spherical $p$ -spin model, which is defined by

$\Sigma _{\text{a}}(\epsilon )=\lim _{N\to \infty }{\frac {1}{N}}\log {\overline {{\mathcal {N}}(\epsilon )}},\quad \quad {\mathcal {N}}(\epsilon )=\left\{{\text{number of stationary points of energy density }}\epsilon \right\}$

The Kac-Rice formula. Consider first a random function of one variable $f(x)$ defined on an interval $[a,b]$ , and let ${\mathcal {N}}$ be the number of points $x$ such that $f(x)=0$ . Justify why
${\overline {\mathcal {N}}}=\int _{a}^{b}dx\,{\overline {\delta (f(x))|f'(x)|}}$
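The one-dimensional formula can be checked numerically on a simple example. The sketch below uses an assumed test function, a random trigonometric polynomial $f(x)=\sum _{k=1}^{K}\left(a_{k}\cos kx+b_{k}\sin kx\right)$ with independent standard Gaussian coefficients on $[0,2\pi ]$. For this stationary Gaussian function, $f(x)$ and $f'(x)$ are independent at each point, with variances $K$ and $\sum _{k}k^{2}$, so the formula evaluates to ${\overline {\mathcal {N}}}=2{\sqrt {\sum _{k}k^{2}/K}}$. The code compares this value with a direct count of sign changes on a fine grid.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5             # number of Fourier modes (illustrative choice)
n_samples = 2000  # number of independent random functions
n_grid = 2000     # grid points on [0, 2*pi)

k = np.arange(1, K + 1)
x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
cos_kx = np.cos(np.outer(x, k))  # shape (n_grid, K)
sin_kx = np.sin(np.outer(x, k))

a = rng.normal(size=(n_samples, K))
b = rng.normal(size=(n_samples, K))
f = a @ cos_kx.T + b @ sin_kx.T  # one random function per row

# Count sign changes between neighbouring grid points (periodic boundary)
signs = np.sign(f)
n_zeros = np.sum(signs != np.roll(signs, -1, axis=1), axis=1)

# Kac-Rice prediction: 2*pi * p_f(0) * mean|f'| with Gaussian f and f'
var_f = float(K)
var_df = float(np.sum(k**2))
kac_rice = 2.0 * np.sqrt(var_df / var_f)

print("mean number of zeros (simulation):", n_zeros.mean())
print("Kac-Rice prediction:              ", kac_rice)
```

The factor $|f'(x)|$ is what turns each delta function into a unit contribution: without it, a zero where $f$ crosses steeply would be weighted less than one where it crosses slowly.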

In particular, why does the derivative of the function appear in this formula? Consider now the number of stationary points ${\mathcal {N}}(\epsilon )$ of the $p$-spin energy landscape, which satisfy $\nabla _{\perp }E({\vec {\sigma }})=0$. Justify why the generalization of the formula above gives

${\overline {{\mathcal {N}}(\epsilon )}}=\int _{S_{N}}d{\vec {\sigma }}\,\;{\overline {|{\text{det}}\nabla _{\perp }^{2}E({\vec {\sigma }})|\,\,\delta (\nabla _{\perp }E({\vec {\sigma }})=0)\,\,\delta (E({\vec {\sigma }})-N\epsilon )}}$

where $\nabla _{\perp }^{2}E({\vec {\sigma }})$ is the Hessian matrix of the function $E({\vec {\sigma }})$ restricted to the sphere.^[2]

Statistical rotational invariance. Recall the expression of the correlations of the energy landscape of the $p$-spin model computed in Problem 2.1: in which sense is the correlation function rotationally invariant? Justify why rotational invariance implies that
${\overline {{\mathcal {N}}(\epsilon )}}=(2\pi e)^{\frac {N}{2}}\,\;{\overline {|{\text{det}}\nabla _{\perp }^{2}E({\vec {1}})|\,\,\delta (\nabla _{\perp }E({\vec {1}})=0)\,\,\delta (E({\vec {1}})-N\epsilon )}}$

where ${\vec {1}}=(1,1,1,\cdots ,1)$ . Where does the prefactor arise from?
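One way to get a feeling for the prefactor is to compare it with the surface area of the sphere of radius ${\sqrt {N}}$ in $N$ dimensions, $S_{N}=2\pi ^{N/2}N^{(N-1)/2}/\Gamma (N/2)$. The short sketch below evaluates ${\frac {1}{N}}\log S_{N}$ for increasing $N$ and compares it with ${\frac {1}{2}}\log(2\pi e)$; it is a numerical check of the leading exponential order only, and the function name is hypothetical.

```python
import math

def log_sphere_area(N):
    # log of the surface area of the sphere |sigma|^2 = N in N dimensions:
    # S_N = 2 * pi^(N/2) * N^((N-1)/2) / Gamma(N/2)
    return (math.log(2.0) + 0.5 * N * math.log(math.pi)
            + 0.5 * (N - 1) * math.log(N) - math.lgamma(0.5 * N))

target = 0.5 * math.log(2.0 * math.pi * math.e)
for N in (10, 100, 10_000, 1_000_000):
    print(N, log_sphere_area(N) / N, "target:", target)
```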

Gaussianity and correlations. Determine the distribution of the quantity $E({\vec {1}})$ . Show that the components of the vector $\nabla E({\vec {1}})$ are also Gaussian random variables with zero mean and covariances
${\overline {(\nabla E)_{i}\,(\nabla E)_{j}}}={\frac {N}{2}}p\,\delta _{ij}$

The quantity $\nabla _{\perp }E({\vec {1}})$ can be shown to be uncorrelated with both $E({\vec {1}})$ and $\nabla _{\perp }^{2}E({\vec {1}})$. Moreover, in the notation of ^[2], $\nabla _{\perp }^{2}E({\vec {1}})={\hat {\Pi }}({\vec {1}})\,\nabla ^{2}E({\vec {1}})\,{\hat {\Pi }}({\vec {1}})-pE({\vec {1}})\mathbb {I}$.
Using this, show that

${\overline {{\mathcal {N}}(\epsilon )}}=(2\pi e)^{\frac {N}{2}}\,{\frac {1}{(\pi Np)^{\frac {N}{2}}}}\;{\overline {|{\text{det}}\nabla _{\perp }^{2}E({\vec {\sigma }})|\,\,\delta (E({\vec {\sigma }})-N\epsilon )}}$

It remains to compute the expectation value of the determinant: this is the subject of the next problem.

Problem 5.2: the Hessian and random matrix theory

[TODO: fix the $N-1$ normalization.] In this problem, we determine the average of the determinant of the Hessian matrix and conclude the calculation of the annealed complexity. The entries of the matrix $\nabla _{\perp }^{2}E({\vec {\sigma }})$ are also Gaussian variables. Computing their correlations, one finds that the matrix can be written as

$[\nabla _{\perp }^{2}E({\vec {1}})]_{\alpha \beta }=N\left(G_{\alpha \beta }-p\epsilon \,\delta _{\alpha \beta }\right),$

where the matrix $G$ has random entries with zero average and correlations

${\overline {{G}_{\alpha \beta }\,{G}_{\gamma \delta }}}={\frac {p(p-1)}{2N}}\left(\delta _{\alpha \gamma }\delta _{\beta \delta }+\delta _{\alpha \delta }\delta _{\beta \gamma }\right)$

Show that the matrix $G$ is a GOE matrix, i.e. a matrix taken from the Gaussian Orthogonal Ensemble, meaning that its distribution is
$P(G)={\frac {1}{Z_{N}}}e^{-{\frac {N}{4\sigma ^{2}}}{\text{Tr}}G^{2}},\quad \quad \sigma ^{2}={\frac {p(p-1)}{2}}$

What is the value of $\sigma ^{2}$?
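The correspondence between the entry correlations and the GOE weight can be checked by sampling. The sketch below assumes the construction $G={\sqrt {\sigma ^{2}/(2N)}}\,(A+A^{T})$ with $A$ a matrix of independent standard Gaussians, which is one common way to generate a matrix with weight proportional to $e^{-{\frac {N}{4\sigma ^{2}}}{\text{Tr}}G^{2}}$. It then measures the variances of the off-diagonal and diagonal entries, which should be close to $\sigma ^{2}/N$ and $2\sigma ^{2}/N$. The function name and the parameter values are illustrative.

```python
import numpy as np

def sample_goe(N, sigma2, rng):
    # Symmetrize an i.i.d. Gaussian matrix; off-diagonal variance sigma2 / N,
    # diagonal variance 2 * sigma2 / N
    A = rng.normal(size=(N, N))
    return np.sqrt(sigma2 / (2.0 * N)) * (A + A.T)

rng = np.random.default_rng(0)
N, p = 30, 3
sigma2 = p * (p - 1) / 2.0
samples = np.array([sample_goe(N, sigma2, rng) for _ in range(2000)])

upper = np.triu_indices(N, k=1)
var_offdiag = np.mean(samples[:, upper[0], upper[1]] ** 2)
var_diag = np.mean(np.diagonal(samples, axis1=1, axis2=2) ** 2)

print("off-diagonal variance:", var_offdiag, "expected:", sigma2 / N)
print("diagonal variance:    ", var_diag, "expected:", 2.0 * sigma2 / N)
```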

Let $\lambda _{\alpha }$ be the eigenvalues of the matrix $G$ . Show that the following identity holds:
${\overline {|{\text{det}}\nabla _{\perp }^{2}E({\vec {\sigma }})|\,\,\delta (E({\vec {\sigma }})-N\epsilon )}}=N^{N}{\overline {{\text{exp}}\left[N\left(\int d\lambda \,\rho _{N}(\lambda )\,\log |\lambda -p\epsilon |\right)\right]}},\quad \quad \rho _{N}(\lambda )={\frac {1}{N}}\sum _{\alpha =1}^{N}\delta (\lambda -\lambda _{\alpha })$

where $\rho _{N}(\lambda )$ is the empirical eigenvalue density.

Concentration: the empirical density has a distribution of the large deviation form (see TD1) with speed $N^{2}$, meaning that $P[\rho ]=e^{-N^{2}\,g[\rho ]}$ where now $g$ is a functional (a function of a function). Using a saddle point argument, show that this implies
${\overline {{\text{exp}}\left[N\left(\int d\lambda \,\rho _{N}(\lambda )\,\log |\lambda -p\epsilon |\right)\right]}}={\text{exp}}\left[N\left(\int d\lambda \,\rho _{\text{ty}}(\lambda )\,\log |\lambda -p\epsilon |\right)+o(N)\right]$

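where $\rho _{\text{ty}}(\lambda )$ is the typical value of the eigenvalue density. This quantity is self-averaging, and for a GOE matrix it is given by the Wigner semicircle law:

$\lim _{N\to \infty }\rho _{N}(\lambda )=\lim _{N\to \infty }{\overline {\rho _{N}(\lambda )}}=\rho _{\text{ty}}(\lambda )={\frac {1}{2\pi \sigma ^{2}}}{\sqrt {4\sigma ^{2}-\lambda ^{2}}},\quad \quad |\lambda |\leq 2\sigma$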
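The concentration statement can be probed numerically. Assuming the GOE normalization $P(G)\propto e^{-{\frac {N}{4\sigma ^{2}}}{\text{Tr}}G^{2}}$, the large-$N$ eigenvalue density is the Wigner semicircle supported on $[-2\sigma ,2\sigma ]$. The sketch below compares ${\frac {1}{N}}\sum _{\alpha }\log |\lambda _{\alpha }-p\epsilon |$, computed from sampled matrices, with the integral $\int d\lambda \,\rho _{\text{ty}}(\lambda )\log |\lambda -p\epsilon |$ evaluated by a midpoint rule after the substitution $\lambda =2\sigma \cos \theta$. The values of $N$, $p$ and $\epsilon$ and the function names are illustrative assumptions.

```python
import numpy as np

def sample_goe(N, sigma2, rng):
    # Off-diagonal variance sigma2 / N, diagonal variance 2 * sigma2 / N
    A = rng.normal(size=(N, N))
    return np.sqrt(sigma2 / (2.0 * N)) * (A + A.T)

def semicircle_log_integral(x, sigma, n_grid=200_000):
    # With lambda = 2 sigma cos(theta): rho(lambda) dlambda = (2/pi) sin^2(theta) dtheta
    theta = (np.arange(n_grid) + 0.5) * np.pi / n_grid  # midpoints of [0, pi]
    lam = 2.0 * sigma * np.cos(theta)
    integrand = (2.0 / np.pi) * np.sin(theta) ** 2 * np.log(np.abs(lam - x))
    return integrand.sum() * np.pi / n_grid

rng = np.random.default_rng(0)
N, p, eps = 400, 3, -1.0
sigma2 = p * (p - 1) / 2.0
x = p * eps  # shift appearing in the Hessian

values = []
for _ in range(5):
    lam = np.linalg.eigvalsh(sample_goe(N, sigma2, rng))
    values.append(np.mean(np.log(np.abs(lam - x))))
empirical = float(np.mean(values))
predicted = semicircle_log_integral(x, np.sqrt(sigma2))

print("empirical (1/N) log|det(G - x)|:", empirical)
print("semicircle integral:            ", predicted)
print("sample-to-sample spread:        ", np.std(values))
```

The small spread between samples reflects the fact that fluctuations of the empirical density are suppressed at speed $N^{2}$, while the exponent being averaged is only of order $N$.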

- Check this result numerically.
- Derive the resulting expression for the annealed complexity.
- Plot this quantity, and determine numerically where it vanishes. Why must the corresponding energy density coincide with the average energy for $\beta \to \infty$?


Show that for the spherical $p$-spin model it holds that $\nabla E({\vec {\sigma }})\cdot {\vec {\sigma }}=p\,E({\vec {\sigma }})$, since $E$ is a homogeneous function of degree $p$: thus the projection of the gradient vector on the radial direction is proportional to the energy of the system. Since $\nabla _{\perp }E({\vec {\sigma }})$ is orthogonal to the radial direction, it is a vector that is uncorrelated with $E({\vec {\sigma }})$.
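The relation between the radial component of the gradient and the energy follows from Euler's theorem for homogeneous functions of degree $p$, which gives $\nabla E({\vec {\sigma }})\cdot {\vec {\sigma }}=p\,E({\vec {\sigma }})$. A minimal numerical check for $p=3$ is sketched below, assuming a generic (unsymmetrized) Gaussian coupling tensor. The identity holds for any choice of couplings, so their normalization is irrelevant here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 12, 3
J = rng.normal(size=(N, N, N))  # generic couplings, normalization irrelevant here
s = rng.normal(size=N)
s *= np.sqrt(N) / np.linalg.norm(s)  # place the configuration on the sphere

# E(sigma) = sum_ijk J_ijk sigma_i sigma_j sigma_k
E = np.einsum("ijk,i,j,k->", J, s, s, s)

# Gradient: one term for each of the three slots of the cubic form
grad = (np.einsum("ijk,j,k->i", J, s, s)
        + np.einsum("jik,j,k->i", J, s, s)
        + np.einsum("jki,j,k->i", J, s, s))

radial = grad @ s                    # radial component (unnormalized)
grad_perp = grad - radial / (s @ s) * s  # gradient restricted to the sphere

print("grad E . sigma  =", radial)
print("p * E           =", p * E)
print("grad_perp . sigma =", grad_perp @ s)
```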