T-6
<strong>Goal: </strong>
So far we have discussed the equilibrium properties of disordered systems, which are encoded in their partition function/free energy. When a system (following Langevin or Monte Carlo dynamics) equilibrates at sufficiently large times, its long-time properties are captured by these equilibrium calculations. In glassy systems the equilibration timescales are extremely large: over very long timescales the system does not visit equilibrium configurations, but rather metastable states. In this set of problems, we complete the characterisation of the energy landscape of the spherical <math>p</math>-spin by studying its metastable states (local minima).
<br>
<strong>Techniques: </strong> conditional probabilities, saddle point, random matrix theory.
<br>




== Dynamics, optimization, trapping local minima ==
[[File:Landscapes-GDD.png|thumb|right|x200px|Convex and rugged energy landscapes.]]
<ul>
<li> '''Rugged landscapes.''' Consider the spherical <math>p</math>-spin model: <math>E(\vec{\sigma})</math> is an <ins> energy landscape </ins>. It is a random function on configuration space (the surface <math> \mathcal{S}_N </math> of the sphere). This landscape has its global minimum(a) at the ground state configuration(s): the energy density of the ground state(s) can be obtained by studying the partition function <math> Z </math> in the limit <math> \beta \to \infty </math>. Besides the ground state(s), the energy landscape can have other local minima; fully-connected models of glasses are characterized by the fact that there are exponentially many of these local minima: the energy landscape is <ins> rugged</ins>, see the sketch.
</li>
<br>
<li> '''Optimization by gradient descent.''' Suppose that we are interested in finding the configurations of minimal energy, starting from an arbitrary configuration <math>\vec{\sigma}_0</math>: we can implement a dynamics in which we progressively update the configuration moving towards lower and lower values of the energy, hoping to eventually converge to the ground state(s). The simplest dynamics of this sort is <ins>gradient descent</ins>,
<center> <math>
\frac{d \vec{\sigma}(t)}{dt}=- \nabla_{\perp} E(\vec{\sigma})
</math> </center>
where <math>\nabla_{\perp} E(\vec{\sigma})</math> is the gradient of the landscape restricted to the sphere. The dynamics stops when it reaches a  <ins> stationary point </ins>, a configuration where <math>  \nabla_\perp E(\vec{\sigma})=0</math>. If the landscape has a convex structure, this will be the ground state; if the energy landscape is very non-convex, as in glasses, the end point of this algorithm will be a local minimum at energies much higher than the ground state (see sketch). A minimal numerical implementation of this dynamics is sketched at the end of this section.
</li>
<br>
<li> '''Stationary points and complexity.''' To guess where gradient descent dynamics (or <ins> Langevin dynamics </ins>) are expected to converge, it is useful to understand the distribution of the stationary points, i.e. the number <math> \mathcal{N}(\epsilon)</math> of such configurations having a given energy density <math> \epsilon </math>. In fully-connected models, this quantity has an exponential scaling, <math> \mathcal{N}(\epsilon) \sim \text{exp}\left(N \Sigma(\epsilon) \right)</math>, where  <math>  \Sigma(\epsilon)</math> is the landscape’s <ins>complexity</ins><sup>[[#Notes|[*] ]]</sup>. Stationary points can be stable (local minima) or unstable (saddles or local maxima): their stability is encoded in the spectrum of the <ins> Hessian matrix </ins> <math>\nabla_{\perp}^2 E(\vec{\sigma})</math>: when all the eigenvalues of the Hessian are positive, the point is a local minimum (and a saddle otherwise).
</li>
</ul>
<br>
<div style="font-size:89%">
: <small>[*]</small> - This quantity looks similar to the entropy <math> S(\epsilon) </math> we computed for the REM in Problem 1. However, while the entropy counts all configurations at a given energy density, the complexity <math> \Sigma(\epsilon) </math> accounts only for the stationary points.
</div>
<br>
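The following Python sketch implements the gradient descent dynamics described above, assuming <math>p=3</math>, the normalization <math>\overline{E(\vec\sigma)^2}=N</math> used in these problems, and illustrative values of the size, step and number of iterations. The couplings are drawn as an unsymmetrized Gaussian tensor, which reproduces the covariance of the landscape. For large <math>N</math> and long times the final energy density should approach the threshold value <math>\epsilon_{\rm th}=-2\sqrt{(p-1)/p}</math> discussed below, not the ground state.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
N, p = 64, 3                         # illustrative size; p = 3 assumed throughout
# unsymmetrized couplings with variance 1/N^(p-1): E[E(s)E(t)] = N (s.t/N)^p
J = rng.normal(0.0, N ** (-(p - 1) / 2), size=(N, N, N))

def energy(s):
    # E(sigma) = - sum_{ijk} J_ijk s_i s_j s_k
    return -np.einsum('ijk,i,j,k->', J, s, s, s)

def grad_perp(s):
    # full gradient (one term per slot of the unsymmetrized tensor) ...
    g = -(np.einsum('ijk,j,k->i', J, s, s)
          + np.einsum('jik,j,k->i', J, s, s)
          + np.einsum('jki,j,k->i', J, s, s))
    # ... projected on the tangent plane of the sphere |s|^2 = N
    return g - (g @ s / N) * s

s = rng.normal(size=N)
s *= np.sqrt(N) / np.linalg.norm(s)  # random starting point on the sphere
dt = 0.05
for _ in range(3000):
    s = s - dt * grad_perp(s)
    s *= np.sqrt(N) / np.linalg.norm(s)   # retract back onto the sphere

print("final energy density :", energy(s) / N)
print("threshold eps_th     :", -2 * np.sqrt((p - 1) / p))
</syntaxhighlight>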


== Problems ==
In these problems, we discuss the computation of the annealed complexity of the spherical <math>p</math>-spin model, which is defined by
<center> <math>
\Sigma_{\text{a}}(\epsilon)= \lim_{N \to \infty}\frac{1}{N}\log \overline{\mathcal{N}(\epsilon)} , \quad \quad \mathcal{N}(\epsilon)= \left\{ \text{number of stationary points with energy density }  \epsilon\right\}
</math> </center>
=== Problem 5.1: the Kac-Rice formula and the complexity ===


<ol>
<li> <em> The Kac-Rice formula.</em> Consider first a random function of one variable <math> f(x)</math> defined on an interval <math> [a,b]</math>, and let <math> \mathcal{N}</math> be the number of points <math> x </math> such that <math> f(x)=0</math>. Justify why
<center>
<math>
\overline{\mathcal{N}}= \int_a^b dx \,p_0(x) , \quad \quad p_0(x)=\overline{\delta(f(x)) |f'(x)|}
</math>
</center>
where <math>  p_0(x) </math> is the probability density that <math> x </math> is a zero of the function.
In particular, why does the derivative of the function appear in this formula? Consider now the number of stationary points <math> \mathcal{N}(\epsilon)</math> of the <math>p</math>-spin energy landscape, which satisfy <math> \nabla_\perp E(\vec{\sigma})=0</math>. Justify why the generalization of the formula above gives
<center>
<math>
\overline{\mathcal{N}(\epsilon)}= \int_{\mathcal{S}_N} d \vec{\sigma} \,p_{\epsilon}(\vec{\sigma})  , \quad \quad p_{\epsilon}(\vec{\sigma})=\overline{|\text{det} \nabla_\perp^2 E (\vec{\sigma})|\,\, \delta(\nabla_\perp E(\vec{\sigma})) \, \,\delta(E(\vec{\sigma})- N \epsilon)}
</math>
</center>
where <math> p_{\epsilon}(\vec{\sigma})</math> is the probability density that <math> \vec \sigma</math> is a stationary point of energy density <math> \epsilon </math>, and <math> \nabla_\perp^2 E (\vec{\sigma}) </math> is the Hessian matrix of the function  <math> E (\vec{\sigma}) </math> restricted to the sphere. A numerical check of the one-dimensional formula is sketched below.
</li>
</ol>
<br>
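As a sanity check of the one-dimensional formula, one can count the zeros of a random trigonometric polynomial and compare with the Kac-Rice prediction, which for this stationary Gaussian process reduces to <math>\overline{\mathcal N}= 2\pi \, p_f(0)\, \overline{|f'|}= 2\sqrt{\textstyle\sum_k k^2/K}</math>. A minimal sketch (the choice of process and the parameters are illustrative):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
K, trials = 25, 2000
x = np.linspace(0.0, 2 * np.pi, 4001)
k = np.arange(1, K + 1)
cosM, sinM = np.cos(np.outer(x, k)), np.sin(np.outer(x, k))

counts = []
for _ in range(trials):
    a, b = rng.normal(size=(2, K))
    # stationary Gaussian process with Var f(x) = 1 and f, f' independent
    f = (cosM @ a + sinM @ b) / np.sqrt(K)
    # count zeros as sign changes on the fine grid
    counts.append(np.count_nonzero(np.sign(f[:-1]) != np.sign(f[1:])))

# Kac-Rice: E[N] = 2 pi * p_f(0) * E|f'| = 2 sqrt(sum_k k^2 / K)
print("empirical mean #zeros:", np.mean(counts))
print("Kac-Rice prediction  :", 2 * np.sqrt(np.sum(k**2) / K))
</syntaxhighlight>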
 
<ol start="2">
<li><em> Statistical rotational invariance.</em> Recall the expression of the correlations of the energy landscape of the <math>p</math>-spin computed in Problem 3.1: in which sense is the correlation function rotationally invariant? Justify why rotational invariance implies that
<center>
<math>
\overline{\mathcal{N}(\epsilon)}= (2 \pi e)^{\frac{N}{2}} \, p_{\epsilon}(\vec{1})
</math>
</center>
where <math> \vec{1}=(1,1,1, \cdots, 1) </math> is a fixed vector belonging to the surface of the sphere. Where does the prefactor come from? (A numerical check of its asymptotics is sketched below.)
</li>
</ol>
<br>
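The prefactor is the surface area of <math> \mathcal{S}_N </math>, the sphere of radius <math>\sqrt N</math>, whose logarithm is <math>\frac N2 \log(2\pi e)</math> at leading order. A quick numerical check of this asymptotics, using the exact formula <math> S_N = 2\pi^{N/2} N^{(N-1)/2}/\Gamma(N/2)</math>:
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import gammaln

# log of the surface area of the sphere of radius sqrt(N) in N dimensions,
# compared with the leading-order value (1/2) log(2 pi e) per degree of freedom
for N in (10, 100, 1000, 10000):
    logS = np.log(2) + (N / 2) * np.log(np.pi) \
           + ((N - 1) / 2) * np.log(N) - gammaln(N / 2)
    print(N, logS / N, 0.5 * np.log(2 * np.pi * np.e))
</syntaxhighlight>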
 
<ol start="3">
<li><em> Gaussianity and correlations.</em>
 
<ul>
<li> Determine the distribution of the quantity <math> E (\vec{1})</math>. </li>
<li> The entries of <math>\nabla_\perp E (\vec{1}), \nabla^2_\perp E (\vec{1})</math> are Gaussian variables. One can show that the <math> N-1 </math> components of <math> \nabla_\perp E (\vec{1})</math> are uncorrelated with <math> E (\vec{1}), \nabla^2_\perp E (\vec{1})</math>; they have zero mean and covariances (a numerical check is sketched below the problem)
<math>
\overline{(\nabla_\perp E)_\alpha  \, (\nabla_\perp E)_\beta}=  p \, \delta_{\alpha \beta}+O\left(\frac{1}{N} \right).
</math>
Compute the probability density that <math> \nabla_\perp E (\vec{1})=0</math>. </li>
<li> The <math>(N-1)\times (N-1) </math> matrix <math> \nabla_\perp^2 E (\vec{\sigma}) </math> conditioned on <math> E(\vec 1)=N \epsilon </math> can be written as
<center>
<math>
[\nabla_\perp^2 E(\vec{1})]_{\alpha \beta}=  M_{\alpha \beta}- p  \epsilon\, \delta_{\alpha \beta},
</math>
</center>
where the matrix <math> M </math> has random entries with zero average and correlations
<math>
\overline{{M}_{\alpha \beta} \, {M}_{\gamma \delta}}= \frac{p (p-1)}{ N} \left( \delta_{\alpha \gamma} \delta_{\beta \delta}+ \delta_{\alpha \delta} \delta_{\beta \gamma}\right)
</math>
Combining this with the results above, show that
<center>
<math>
\overline{\mathcal{N}(\epsilon)}= (2 \pi e)^{\frac{N}{2}} \,\frac{1}{(2 \pi \, p)^{\frac{N-1}{2}}}\; \sqrt{\frac{N}{2 \pi}} e^{-\frac{N \epsilon^2}{2}}\;\overline{|\text{det} \left( M- p  \epsilon \mathbb{I} \right)|}
</math>
</center>
</ul>
</li>
 
</ol>
<br>
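The stated covariances of the projected gradient can be checked numerically: sample many realizations of the couplings, evaluate the gradient at the fixed point <math>\vec 1</math>, and project it on an orthonormal basis of the tangent plane. A sketch for <math>p=3</math> with an unsymmetrized coupling tensor (sizes and sample numbers are illustrative; with this convention the tangent covariance is exactly <math>p\,\delta_{\alpha\beta}</math>):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
N, p, samples = 32, 3, 2000
u = np.ones(N)                     # the fixed point vec(1), with |u|^2 = N

# orthonormal basis of the tangent plane at u, via QR with u as first column
Q, _ = np.linalg.qr(np.column_stack([u / np.sqrt(N),
                                     rng.normal(size=(N, N - 1))]))
B = Q[:, 1:]                       # N x (N-1) matrix, columns orthogonal to u

G = np.empty((samples, N - 1))
for t in range(samples):
    J = rng.normal(0.0, N ** (-(p - 1) / 2), size=(N, N, N))
    g = -(np.einsum('ijk,j,k->i', J, u, u)
          + np.einsum('jik,j,k->i', J, u, u)
          + np.einsum('jki,j,k->i', J, u, u))
    G[t] = B.T @ g                 # tangent components of the gradient

C = G.T @ G / samples              # empirical covariance matrix
print("mean diagonal (should be close to p =", p, "):", C.diagonal().mean())
print("mean |off-diagonal| (should be small):",
      np.abs(C - np.diag(C.diagonal())).mean())
</syntaxhighlight>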
 
=== Problem 6: the Hessian at the stationary points, and random matrix theory ===

This is a continuation of Problem 5.1. To get the complexity of the spherical <math>p</math>-spin, it remains to compute the expectation value of the determinant of the Hessian matrix: this is the goal of this problem. We will do this by exploiting results from random matrix theory discussed in the Tutorial and in Exercise 4.




<ol>
<li> <em> Gaussian Random matrices. </em> Show that the matrix <math> M </math>, defined in Problem 5.1, is a GOE matrix, i.e. a matrix taken from the Gaussian Orthogonal Ensemble, meaning that it is a symmetric matrix with distribution
<center>
<math>
P_N(M)= Z_N^{-1}\,\text{exp}\left(-\frac{N}{4 \sigma^2} \text{Tr}\, M^2\right)
</math>
</center>
where <math> Z_N </math> is a normalization. What is the value of <math> \sigma^2 </math>? (A sampling sketch is given below.)
</li>
</ol>
<br>
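A GOE matrix with the density above can be sampled as <math>M=(A+A^T)/\sqrt 2</math>, with <math>A</math> an i.i.d. Gaussian matrix of entry variance <math>\sigma^2/N</math>; one can then check, for instance, that <math>\frac 1N \text{Tr}\, M^2 \to \sigma^2</math> and that the spectrum ends at <math>\pm 2\sigma</math>. A sketch with <math>\sigma^2=p(p-1)</math> for <math>p=3</math> (size illustrative):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
N, p = 1000, 3
sigma2 = p * (p - 1)                 # candidate value sigma^2 = p(p-1)

# M = (A + A^T)/sqrt(2): off-diagonal variance sigma2/N, diagonal 2 sigma2/N,
# i.e. E[M_ab M_cd] = (sigma2/N)(d_ac d_bd + d_ad d_bc) as for M in Problem 5.1
A = rng.normal(0.0, np.sqrt(sigma2 / N), size=(N, N))
M = (A + A.T) / np.sqrt(2)

ev = np.linalg.eigvalsh(M)
print("Tr M^2 / N (should be ~ sigma^2):", np.mean(ev**2), sigma2)
print("largest eigenvalue (should be ~ 2 sigma):", ev.max(), 2 * np.sqrt(sigma2))
</syntaxhighlight>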




<ol start="2">
<ol start="2">
<li><em> Eigenvalue density and concentration. </em> Let <math> \lambda_\alpha </math> be the eigenvalues of the matrix <math> M </math>. Show that the following identity holds:
<li><em> Eigenvalue density and concentration. </em> Let <math> \lambda_\alpha </math> be the eigenvalues of the matrix <math> M </math>. Show that the following identity holds:
<center>
<math display="block">
<math>
\mathbb{E}[|\text{det}  \left(M - p \epsilon \mathbb{I} \right)|]=  \mathbb{E}\left[\text{exp} \left((N-1) \int d \lambda \, \rho_{N-1}(\lambda) \, \log |\lambda - p \epsilon|\right) \right], \quad \quad \rho_{N-1}(\lambda)= \frac{1}{N-1} \sum_{\alpha=1}^{N-1} \delta (\lambda- \lambda_\alpha)
\overline{|\text{det}  \left(M - p \epsilon \mathbb{I} \right)|}=  \overline{\text{exp} \left[(N-1) \left( \int d \lambda \, \rho_N(\lambda) \, \log |\lambda - p \epsilon|\right) \right]}, \quad \quad \rho_{N}(\lambda)= \frac{1}{N-1} \sum_{\alpha=1}^{N-1} \delta (\lambda- \lambda_\alpha)
</math>
</math>
</center>
where <math>\rho_{N-1}(\lambda)</math> is the empirical eigenvalue distribution. It can be shown that if <math> M </math> is a GOE matrix, the distribution of the empirical distribution has a large deviation form with speed <math> N^2 </math>, meaning that <math> P_N[\rho] = e^{-N^2 \, g[\rho]} </math> where now <math> g[\cdot] </math> is a functional. Using a saddle point argument, show that this implies  
where <math>\rho_{N}(\lambda)</math> is the empirical eigenvalue density. It can be shown that if <math> M </math> is a GOE matrix, the distribution of the empirical density has a large deviation form (recall TD1) with speed <math> N^2 </math>, meaning that <math> P_N[\rho] = e^{-N^2 \, g[\rho]} </math> where now <math> g[\cdot] </math> is a functional (a function of a function). Using a saddle point argument, show that this implies  
<math display="block">
<center>
\mathbb{E}\left[\text{exp} \left((N-1) \int d \lambda \, \rho_{N-1}(\lambda) \, \log |\lambda - p \epsilon|\right) \right]=\text{exp} \left[N \int d \lambda \,  \rho_\infty(\lambda+p \epsilon) \, \log |\lambda|+ o(N) \right]
<math>
\overline{\text{exp} \left[(N-1) \left( \int d \lambda \, \rho_N(\lambda) \, \log |\lambda - p \epsilon|\right) \right]}=\text{exp} \left[N \left( \int d \lambda \,  \rho_{\text{typ}}(\lambda+p \epsilon) \, \log |\lambda|\right)+ o(N) \right]
</math>
</math>
</center>
where <math> \rho_\infty(\lambda) </math> is the typical value of the eigenvalue density, which satisfies  <math> g[\rho_\infty]=0 </math>.
where <math> \rho_{\text{typ}}(\lambda) </math> is the typical value of the eigenvalue density, which satisfies  <math> g[\rho_{\text{typ}}]=0 </math>.
</li>
</li>
</ol>
</ol>
<br>
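The concentration can be tested numerically: for <math>\epsilon</math> below the threshold (so that <math>p\epsilon</math> lies outside the semicircle support and the logarithm is not singular), the sample average of <math>\frac{1}{N-1}\sum_\alpha \log|\lambda_\alpha - p\epsilon|</math> should agree with the semicircle integral of the next point. A sketch (matrix size, <math>\epsilon=-1.8</math> and the number of samples are illustrative):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n, p, eps = 400, 3, -1.8                  # eps below the threshold -2 sqrt(2/3)
sigma = np.sqrt(p * (p - 1))

# empirical average of (1/n) log|det(M - p eps I)| over a few GOE samples
vals = []
for _ in range(20):
    A = rng.normal(0.0, sigma / np.sqrt(n), size=(n, n))
    M = (A + A.T) / np.sqrt(2)
    vals.append(np.mean(np.log(np.abs(np.linalg.eigvalsh(M) - p * eps))))

# semicircle prediction: integral of rho_infty(l) log|l - p eps|
l = np.linspace(-2 * sigma, 2 * sigma, 20001)
rho = np.sqrt(np.clip(4 * sigma**2 - l**2, 0.0, None)) / (2 * np.pi * sigma**2)
pred = np.sum(rho * np.log(np.abs(l - p * eps))) * (l[1] - l[0])

print("empirical :", np.mean(vals))
print("semicircle:", pred)
</syntaxhighlight>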




<ol start="3">
<ol start="3">
<li><em> The semicircle and the complexity.</em> The eigenvalue density of GOE matrices is self-averaging, and it equals to  
<li><em> The semicircle and the complexity.</em> The eigenvalue density of GOE matrices is self-averaging, and it equals to  
<center>
<math display="block">
<math>
\lim_{N \to \infty}\rho_N (\lambda)=\lim_{N \to \infty} \mathbb{E}[\rho_N(\lambda)]= \rho_\infty(\lambda)= \frac{1}{2 \pi \sigma^2}\sqrt{4 \sigma^2-\lambda^2 }
\lim_{N \to \infty}\rho_N (\lambda)=\lim_{N \to \infty} \overline{\rho_N}(\lambda)= \rho_{\text{typ}}(\lambda)= \frac{1}{2 \pi \sigma^2}\sqrt{4 \sigma^2-\lambda^2 }
</math>
</math>
</center>
<ul>
<ul>
<li>Check this numerically: generate matrices for various values of <math> N </math>, plot their empirical eigenvalue density and compare with the asymptotic curve. Is the convergence faster in the bulk, or in the edges of the eigenvalue density, where it vanishes?  </li>
<!--<li>Check this numerically: generate matrices for various values of <math> N </math>, plot their empirical eigenvalue density and compare with the asymptotic curve. Is the convergence faster in the bulk, or in the edges of the eigenvalue density, where it vanishes?  </li>-->
 
 


 
Combining all the results, show that the annealed complexity is
<li> Combining all the results, show that the annealed complexity is
<math display="block">
<center> <math>
\Sigma_{\text{a}}(\epsilon)= \frac{1}{2}\log [4 e (p-1)]- \frac{\epsilon^2}{2}+ I_p(\epsilon), \quad \quad  I_p(\epsilon)= \frac{2}{\pi}\int d x \sqrt{1-\left(x- \frac{\epsilon}{ \epsilon_{\text{th}}}\right)^2}\, \log |x| , \quad \quad  \epsilon_{\text{th}}= -2\sqrt{\frac{p-1}{p}}.
\Sigma_{\text{a}}(\epsilon)= \frac{1}{2}\log [4 e (p-1)]- \frac{\epsilon^2}{2}+ I_p(\epsilon), \quad \quad  I_p(\epsilon)= \frac{2}{\pi}\int d x \sqrt{1-\left(x- \frac{\epsilon}{ \epsilon_{\text{th}}}\right)^2}\, \log |x| , \quad \quad  \epsilon_{\text{th}}= -2\sqrt{\frac{p-1}{p}}.
</math> </center>
</math>  
The integral <math>  I_p(\epsilon)</math> can be computed explicitly, and one finds:
The integral <math>  I_p(\epsilon)</math> can be computed explicitly, and one finds:
<center> <math>
<math display="block">
  I_p(\epsilon)=  
  I_p(\epsilon)=  
\begin{cases}
\begin{cases}
Line 168: Line 54:
&\frac{\epsilon^2}{\epsilon_{\text{th}}^2}-\frac{1}{2}-\log 2 \quad \text{if} \quad \epsilon > \epsilon_{\text{th}}
&\frac{\epsilon^2}{\epsilon_{\text{th}}^2}-\frac{1}{2}-\log 2 \quad \text{if} \quad \epsilon > \epsilon_{\text{th}}
\end{cases}
\end{cases}
</math> </center>
</math>  
Plot the annealed complexity, and determine numerically where it vanishes: why is this a lower bound or the ground state energy density?
Plot the annealed complexity, and determine numerically where it vanishes: why is this a lower bound or the ground state energy density?
</li>
</ul>
</ul>
</ol>
</ol>
<br>
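A minimal sketch for the numerical check of the first point (the choice <math>\sigma^2=1</math> and the sizes are illustrative; averaging over several samples per <math>N</math> smooths the histograms):
<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
sigma = 1.0                                 # work in units where sigma^2 = 1

for N in (50, 200, 2000):
    A = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))
    ev = np.linalg.eigvalsh((A + A.T) / np.sqrt(2))   # GOE sample
    plt.hist(ev, bins=50, density=True, histtype='step', label=f"N = {N}")

# asymptotic semicircle on [-2 sigma, 2 sigma]
lam = np.linspace(-2 * sigma, 2 * sigma, 500)
plt.plot(lam, np.sqrt(4 * sigma**2 - lam**2) / (2 * np.pi * sigma**2),
         'k--', label="semicircle")
plt.xlabel("lambda"); plt.ylabel("rho(lambda)"); plt.legend(); plt.show()
</syntaxhighlight>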
 

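For the last point, the annealed complexity can be plotted directly from the explicit expression of <math>I_p(\epsilon)</math>; a sketch for <math>p=3</math> (the plotting range and the bracketing interval for the root are assumptions based on the shape of the curve):
<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq

p = 3
eth = -2 * np.sqrt((p - 1) / p)

def I_p(eps):
    u = eps / eth
    if eps > eth:                      # p*eps inside the semicircle support
        return u**2 - 0.5 - np.log(2)
    s = np.sqrt(u**2 - 1)              # below the threshold
    return u**2 - 0.5 - u * s + np.log(u + s) - np.log(2)

def Sigma(eps):
    # annealed complexity Sigma_a(eps)
    return 0.5 * np.log(4 * np.e * (p - 1)) - eps**2 / 2 + I_p(eps)

eps = np.linspace(-1.75, -1.2, 400)
plt.plot(eps, [Sigma(e) for e in eps])
plt.axhline(0.0, color='k', lw=0.5)
plt.xlabel("epsilon"); plt.ylabel("Sigma_a(epsilon)"); plt.show()

# zero of the complexity below the threshold: lower bound for eps_gs
e0 = brentq(Sigma, -1.75, eth)
print("Sigma_a vanishes at eps =", e0)
</syntaxhighlight>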

<ol start="4">
<ol start="4">
<li><em> The threshold and the stability.</em>
<li><em> The threshold and the stability.</em>
  Sketch <math> \rho_{\text{typ}}(\lambda+p \epsilon) </math> for different values of <math> \epsilon </math>; recalling that the Hessian encodes for the stability of the stationary points, show that there is a transition in the stability of the stationary points at the critical value of the energy density  
  Sketch <math> \rho_\infty(\lambda+p \epsilon) </math> for different values of <math> \epsilon </math>; recalling that the Hessian encodes for the stability of the stationary points, show that there is a transition in the stability of the stationary points at the critical value of the energy density  
<math>
<math>
\epsilon_{\text{th}}= -2\sqrt{(p-1)/p}.
\epsilon_{\text{th}}= -2\sqrt{(p-1)/p}.
Line 184: Line 69:
</li>
</li>
</ol>
</ol>
<br>
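A quick way to visualize this transition is to plot the shifted semicircle <math>\rho_\infty(\lambda+p\epsilon)</math> for <math>\epsilon</math> above, at and below <math>\epsilon_{\rm th}</math>: for <math>\epsilon<\epsilon_{\rm th}</math> the support is entirely positive (minima), while for <math>\epsilon>\epsilon_{\rm th}</math> it crosses zero (saddles). A sketch for <math>p=3</math>:
<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt

p = 3
sigma = np.sqrt(p * (p - 1))               # sigma^2 = p(p-1)
eps_th = -2 * np.sqrt((p - 1) / p)

lam = np.linspace(-5, 12, 1000)
for eps in (0.9 * eps_th, eps_th, 1.1 * eps_th):   # above, at, below threshold
    x = lam + p * eps                      # Hessian spectrum = semicircle shifted by -p eps
    rho = np.sqrt(np.clip(4 * sigma**2 - x**2, 0.0, None)) / (2 * np.pi * sigma**2)
    plt.plot(lam, rho, label=f"eps = {eps:.3f}")

plt.axvline(0.0, color='k', lw=0.5)        # negative eigenvalues = unstable directions
plt.xlabel("Hessian eigenvalue"); plt.ylabel("density"); plt.legend(); plt.show()
</syntaxhighlight>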
== Back to dynamics: quenches, and dynamical transitions ==
Through Problems 5.1 and 6, we have shown that the energy landscape of the spherical <math>p</math>-spin model has exponentially many stationary points, and that there is a transition at the energy density  <math>\epsilon_{\rm th}</math>: for <math>\epsilon>\epsilon_{\rm th}</math> the stationary points are saddles, for <math>\epsilon\leq \epsilon_{\rm th}</math> they are local minima. Let us try to deduce something about the system's dynamics from this.
<li> '''Gradient descent dynamics.''' The local minima are dynamically stable: if gradient descent gets stuck in a local minimum and we apply a small perturbation to the configuration, gradient descent brings us back to the local minimum. These configurations are <em>trapping</em>. If we try to optimize the landscape, i.e. to reach the ground state, with gradient descent dynamics, we expect that we will not be able to reach it easily, as we will be trapped by local minima. In fact, for the spherical <math>p</math>-spin model it can be shown that starting from random initial conditions and evolving the configuration with gradient descent (possibly with infinitesimal noise, to be sent to zero with a protocol),
<math display="block">
\lim_{t \to \infty} \lim_{N \to \infty} \frac{ E(\vec{\sigma}(t))}{N} = \epsilon_{\rm th} \neq \epsilon_{\rm gs}.
</math>
The system gets stuck at the energy density level where local minima start to appear, and does not reach the deeper local minima.
</li>
<br>
<li> '''Quenches in temperature and equilibration.''' We can generalize this protocol to higher <math>T</math>: we draw the initial condition of the dynamics at random, and then we evolve the configuration with Langevin dynamics (gradient descent + noise):
<math display="block">
\frac{d \vec{\sigma}(t)}{dt}=- {\nabla}_\perp E(\vec{\sigma})+ {\vec{\eta}}_\perp(t), \quad \quad \langle \eta_i(t) \eta_j(t')\rangle= 2 T \delta_{ij} \delta(t-t')
</math>
In Langevin dynamics, <math>{\vec{\eta}}_\perp(t)</math> is a Gaussian vector at each time <math> t </math>, uncorrelated from the vectors at other times <math> t' \neq t </math>,  with zero average and constant variance proportional to the temperature. It represents the action of a thermal bath on the system.
This dynamical protocol is called a  <ins>quench</ins> (a minimal numerical sketch is given below). The question we can ask is: does the system equilibrate with the bath under this dynamics? If yes, we should see that
<math display="block">
\lim_{t \to \infty} \lim_{N \to \infty} \frac{E(\vec{\sigma}(t))}{N} = \epsilon_{\rm eq}(T),
</math>
where  <math>\epsilon_{\rm eq}(T)</math> is the equilibrium energy density at the temperature <math> T </math>, the same one controlling the strength of the noise. Equilibrating with the bath would indeed imply that at large times the system uniformly visits the equilibrium energy shell.
</li>
<br>
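A minimal Euler-Maruyama discretization of this quench protocol, reusing the <math>p=3</math> setup of the gradient-descent sketch above (time step, size and temperature are illustrative; the spherical constraint is enforced by normalizing after each step, a crude but standard retraction):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)
N, p, T, dt = 64, 3, 0.5, 0.02
J = rng.normal(0.0, N ** (-(p - 1) / 2), size=(N,) * p)

def grad_perp(s):
    # gradient of E(s) = - sum J_ijk s_i s_j s_k, projected on the sphere
    g = -(np.einsum('ijk,j,k->i', J, s, s)
          + np.einsum('jik,j,k->i', J, s, s)
          + np.einsum('jki,j,k->i', J, s, s))
    return g - (g @ s / N) * s

s = rng.normal(size=N)
s *= np.sqrt(N) / np.linalg.norm(s)       # random initial condition (T = infinity)
for step in range(5001):
    noise = rng.normal(0.0, np.sqrt(2 * T * dt), size=N)
    s = s - dt * grad_perp(s) + noise     # Euler-Maruyama step of the Langevin equation
    s *= np.sqrt(N) / np.linalg.norm(s)   # enforce the spherical constraint
    if step % 1000 == 0:
        print(step, -np.einsum('ijk,i,j,k->', J, s, s, s) / N)
</syntaxhighlight>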
<li> '''Dynamical transition.''' Now, in the spherical <math>p</math>-spin we know that if <math>\epsilon_{\rm eq}(T)>\epsilon_{\rm th}</math>, the energy shell has many stationary points, but they are all unstable saddles and do not trap the dynamics. We expect that this energy shell is relatively easy to explore dynamically, and that equilibration takes place. On the other hand, if <math>\epsilon_{\rm eq}(T)<\epsilon_{\rm th}</math>, in the equilibrium energy shell and at higher energy, there are exponentially many local minima that trap the dynamics, and we expect that reaching equilibrium configurations will be difficult. This tells us that there exists a critical <math>T_d</math>, defined by
<math display="block">
\epsilon_{\rm eq}(T_d)=\epsilon_{\rm th},
</math>
such that for <math>T<T_d</math>
<math display="block">
\lim_{t \to \infty} \lim_{N \to \infty} \frac{E(\vec{\sigma}(t))}{N} \neq \epsilon_{\rm eq}(T).
</math>
The statement above for gradient descent corresponds to the special case <math>T=0</math>. <math>T_d</math> is called the <ins>dynamical transition temperature</ins>.
</li>
<br>
[[File:Activated Jump.png|thumb|right|x160px|Fig. 6 - Activated jump across an energy barrier.]]
<li> '''Equilibration timescales.''' Does it mean that when <math>T<T_d</math>, the system  <em>never</em> equilibrates? This is true only in the limit <math>N \to \infty</math>. When <math>N </math> is finite, there is a timescale <math>\tau_{\rm eq}(T, N)</math> beyond which the system equilibrates.  However, this equilibration timescale
in the spherical <math>p</math>-spin scales as
<math display="block">
\tau_{\rm eq}(T< T_d, N) \sim e^{N}.
</math>
This is again due to the presence of many local minima/metastable states, which are separated by <ins>extensive</ins> energy barriers. So, when we take <math>N \to \infty</math> before taking the large time limit, we are unable to see equilibration and we have a sharp transition, which becomes a crossover for finite <math>N</math>.
</li>
<br>
<li> '''Activation and Arrhenius law.''' Why exponential timescales?  When the noise in the Langevin dynamics is weak (temperature is small), the dynamics gets stuck in local minima for very long times. This time depends crucially on the <em>energy barrier</em> that separates the minimum from the other configurations (see Fig. 6). The <ins>Arrhenius law</ins> states that the typical timescale <math> \tau</math> required to escape from a local minimum through a barrier of height <math> \Delta E </math> with thermal dynamics at inverse temperature <math> \beta </math> scales as <math>\tau \sim \tau_0 e^{\beta \, \Delta E} </math>. Since in the spherical <math>p</math>-spin we have <math> \Delta E \sim N \;  \Delta \epsilon </math>, then <math> \tau_{\rm eq}(T< T_d, N) \sim \tau_0 e^{\beta \, \Delta E}\sim e^{N} </math>. A dynamics made of jumps from minimum to minimum through the crossing of energy barriers is called <ins> activated dynamics </ins> (a minimal one-dimensional illustration is sketched below).
</li>
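The Arrhenius scaling can be illustrated in the simplest setting: overdamped Langevin dynamics in a one-dimensional double well <math>V(x)=(x^2-1)^2</math>, with barrier <math>\Delta E = 1</math>. The mean escape time should grow as <math>e^{\Delta E/T}</math>, up to a prefactor given by Kramers' theory. A sketch (parameters are illustrative, and the loop becomes slow at low <math>T</math>):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
dV = lambda x: 4.0 * x * (x * x - 1.0)   # V(x) = (x^2 - 1)^2, barrier dE = 1
dt = 1e-3

for T in (0.5, 0.4, 0.3, 0.25):
    times = []
    for _ in range(30):
        x, t = -1.0, 0.0
        while x < 0.0:                   # escape from the left well over x = 0
            x += -dV(x) * dt + rng.normal(0.0, np.sqrt(2 * T * dt))
            t += dt
        times.append(t)
    # mean escape time vs the Arrhenius factor exp(dE/T)
    print(f"T = {T}: mean escape time = {np.mean(times):8.1f}, "
          f"exp(dE/T) = {np.exp(1.0 / T):8.1f}")
</syntaxhighlight>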
<br>
<br>


== Check out: key concepts ==

Gradient descent, rugged landscapes, metastable states, Hessian matrices, random matrix theory, landscape’s complexity.
