T-5

<strong>Goal: </strong>
In this set of problems, we characterize the energy landscape of the spherical <math>p</math>-spin model by studying its metastable states (local minima) and by counting its stationary points.
<br>
<strong>Techniques: </strong> Kac-Rice formula, conditional probabilities.
<br>
<br>




<strong>Key concepts: </strong>  gradient descent, out-of-equilibrium dynamics, metastable states, Hessian matrices, random matrix theory, Langevin dynamics.
== Dynamics, optimization, trapping local minima ==
[[File:Landscapes-GDD.png|thumb|right|x200px|Non-rugged vs rugged energy landscapes.]]


So far we have discussed the equilibrium properties of disordered systems, which are encoded in their partition function and free energy. When a system (following Langevin or Monte Carlo dynamics) equilibrates at sufficiently large times, its long-time properties are captured by these equilibrium calculations. In glassy systems, however, the equilibration timescales are extremely large: over very long timescales the system does not visit equilibrium configurations, but rather metastable states.


<ul>
<li> '''Rugged landscapes.''' Consider the spherical <math>p</math>-spin model: <math>E(\vec{\sigma})</math> is an <ins> energy landscape </ins>. It is a random function on configuration space (the surface <math> \mathcal{S}_N </math> of the sphere). This landscape has its global minimum (or minima) at the ground-state configuration(s): the energy density of the ground state(s) can be obtained by studying the partition function <math> Z </math> in the limit <math> \beta \to \infty </math>. Besides the ground state(s), the energy landscape can have other local minima; fully-connected models of glasses are characterized by the fact that there are plenty of these local minima: the energy landscape is <ins> rugged</ins>, see the sketch.
</li>
<br>




<li> '''Optimization by gradient descent.''' Suppose that we are interested in finding the configurations of minimal energy, starting from an arbitrary configuration <math>\vec{\sigma}_0</math>: we can implement a dynamics in which we progressively update the configuration, moving towards lower and lower values of the energy, hoping to eventually converge to the ground state(s). The simplest dynamics of this sort is <ins>gradient descent</ins>,
<center> <math>
\frac{d \vec{\sigma}(t)}{dt}=- \nabla_{\perp} E(\vec{\sigma}),
</math> </center>
where <math>\nabla_{\perp} E(\vec{\sigma})</math> is the gradient of the landscape restricted to the sphere. The dynamics stops when it reaches a <ins>stationary point</ins>, i.e. a configuration where <math> \nabla_\perp E(\vec{\sigma})=0</math>. If the landscape has a simple, convex structure, this will be the ground state; if the energy landscape is very non-convex, as in glasses, the end point of this algorithm will be a local minimum at energies much higher than the ground state (see sketch). A minimal numerical sketch of this dynamics is given at the end of this section.
</li>
<br>


<li> '''Stationary points and complexity.''' To guess where gradient descent (or <ins> Langevin dynamics </ins>) is expected to converge, it is useful to understand the distribution of the stationary points, i.e. the number <math> \mathcal{N}(\epsilon)</math> of such configurations having a given energy density <math> \epsilon </math>. In fully-connected models, this quantity has an exponential scaling, <math> \mathcal{N}(\epsilon) \sim \text{exp}\left(N \Sigma(\epsilon) \right)</math>, where <math> \Sigma(\epsilon)</math> is the landscape’s <ins>complexity</ins>.<sup>[[#Notes|[1] ]]</sup> Stationary points can be stable (local minima) or unstable (saddles or local maxima): their stability is encoded in the spectrum of the <ins> Hessian matrix </ins> <math>\nabla_{\perp}^2 E(\vec{\sigma})</math>: when all the eigenvalues of the Hessian are positive, the point is a local minimum (and a saddle otherwise).
</li>
</ul>
<br>




<br>


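To make the gradient-descent dynamics concrete, here is a minimal numerical sketch (an illustration added for convenience, not part of the problems): projected gradient descent for a <math>p=3</math> landscape, with an unsymmetrized coupling tensor normalized so that <math> \overline{E(\vec{\sigma})^2}=N </math> on the sphere, consistent with the factors appearing in the problems below. The system size, step size and number of steps are arbitrary choices.
<syntaxhighlight lang="python">
# Sketch: gradient descent on the sphere for a p=3 spin glass.
# Assumptions (not fixed by the text): unsymmetrized couplings of variance 1/N^2,
# so that E(sigma) has variance N on the sphere; N, eta and the number of steps
# are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, eta, steps = 64, 5e-3, 2000
J = rng.normal(0.0, 1.0 / N, size=(N, N, N))     # std 1/N  ->  variance 1/N^2

def energy(s):
    """E(sigma) = - sum_{ijk} J_ijk s_i s_j s_k."""
    return -np.einsum('ijk,i,j,k->', J, s, s, s)

def gradient(s):
    """Ambient gradient dE/ds_l (sum over the three tensor slots)."""
    g = (np.einsum('ljk,j,k->l', J, s, s)
         + np.einsum('ilk,i,k->l', J, s, s)
         + np.einsum('ijl,i,j->l', J, s, s))
    return -g

def project(v, s):
    """Component of v tangent to the sphere |s|^2 = N at the point s."""
    return v - (v @ s / N) * s

s = rng.normal(size=N)
s *= np.sqrt(N) / np.linalg.norm(s)              # random starting point on the sphere
for _ in range(steps):
    s -= eta * project(gradient(s), s)           # move against the projected gradient
    s *= np.sqrt(N) / np.linalg.norm(s)          # re-impose the spherical constraint

print("final energy density:", energy(s) / N)    # gets stuck above the ground-state density
</syntaxhighlight>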
== Problems ==
In this and in the following problem, we discuss the computation of the annealed complexity of the spherical <math>p</math>-spin model, which is defined by
<center> <math>
\Sigma_{\text{a}}(\epsilon)= \lim_{N \to \infty}\frac{1}{N}\log \overline{\mathcal{N}(\epsilon)} , \quad \quad \mathcal{N}(\epsilon)= \left\{ \text{number of stationary points of energy density }  \epsilon\right\}
</math> </center>
<br>

=== Problem 5.1: the Kac-Rice formula and the complexity ===




<ol>
<li><em> The Kac-Rice formula.</em> Consider first a random function <math> f(x) </math> of one variable, defined on an interval <math> [a,b] </math>, and let <math> \mathcal{N} </math> be the number of points <math> x^* </math> such that <math> f(x^*)=0 </math>. Justify why
<center>
<math>
\overline{\mathcal{N}}= \int_{a}^{b} d x \,p(x)  , \quad \quad p(x)=\overline{|f'(x)|\, \delta(f(x))}
</math>
</center>
where <math> p(x) </math> is the probability density that <math> x </math> is a zero of the function. In particular, why is the derivative of the function appearing in this formula? (A numerical check on a simple random function is sketched below, after this item.) Consider now the number <math> \mathcal{N}(\epsilon) </math> of stationary points of the <math>p</math>-spin energy landscape, which satisfy <math> \nabla_\perp E(\vec{\sigma})=0 </math>. Justify why the generalization of the formula above gives
<center>
<math>
\overline{\mathcal{N}(\epsilon)}= \int_{\mathcal{S}_N} d \vec{\sigma} \,p_{\epsilon}(\vec{\sigma})  , \quad \quad p_{\epsilon}(\vec{\sigma})=\overline{|\text{det} \nabla_\perp^2 E (\vec{\sigma})|\,\, \delta\left(\nabla_\perp E(\vec{\sigma})\right) \, \,\delta(E(\vec{\sigma})- N \epsilon)}
</math>
</center>
where <math> p_{\epsilon}(\vec{\sigma})</math> is the probability density that <math> \vec \sigma</math> is a stationary point of energy density <math> \epsilon </math>, and <math> \nabla_\perp^2 E (\vec{\sigma}) </math> is the Hessian matrix of the function <math> E (\vec{\sigma}) </math> restricted to the sphere.<sup>[[#Notes|[2] ]]</sup>
</li>
</ol>
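As a quick sanity check of the one-dimensional Kac-Rice formula (an illustration, not part of the exercise), one can count numerically the zeros of a random trigonometric polynomial <math> f(x)=\sum_{k=1}^{K}\left(a_k \cos kx + b_k \sin kx\right) </math> with <math> a_k, b_k </math> independent standard Gaussians: for this function the formula predicts <math> \overline{\mathcal{N}}=2\sqrt{\textstyle\sum_{k} k^2/K} </math> zeros in <math> [0, 2\pi] </math>.
<syntaxhighlight lang="python">
# Sketch: compare the average number of zeros of a random trigonometric polynomial
# on [0, 2 pi] with the Kac-Rice prediction 2 sqrt(sum_k k^2 / K).
import numpy as np

rng = np.random.default_rng(1)
K, samples, grid = 10, 2000, 20000
x = np.linspace(0.0, 2 * np.pi, grid, endpoint=False)
k = np.arange(1, K + 1)
C, S = np.cos(np.outer(k, x)), np.sin(np.outer(k, x))   # basis functions on the grid

counts = []
for _ in range(samples):
    a, b = rng.normal(size=K), rng.normal(size=K)
    f = a @ C + b @ S                                    # one random function
    counts.append(np.count_nonzero(f[:-1] * f[1:] < 0))  # zeros = sign changes

prediction = 2 * np.sqrt(np.sum(k**2) / K)
print("empirical:", np.mean(counts), "  Kac-Rice:", prediction)
</syntaxhighlight>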


<ol start="2">
<ol start="2">
<li><em> Statistical rotational invariance.</em> Recall the expression of the correlations of the energy landscape of the <math>p</math>-spin computed in Problem 3.1: in which sense is the correlation function rotationally invariant? Justify why rotational invariance implies that
<center>
<math>
\overline{\mathcal{N}(\epsilon)}= \int_{\mathcal{S}_N} d \vec{\sigma} \,p_{\epsilon}(\vec{\sigma})= V_N \; p_{\epsilon}(\vec{1}), \quad \quad V_N= \int_{\mathcal{S}_N} d \vec{\sigma} \sim (2 \pi e)^{\frac{N}{2}}
</math>
</center>
where <math> \vec{1}=(1,1, \cdots, 1) </math> is one fixed vector belonging to the surface of the sphere, and <math> V_N </math> is the surface area of the sphere <math> \mathcal{S}_N </math>. Where does the prefactor <math> (2 \pi e)^{\frac{N}{2}} </math> arise from?
</li>
</ol>
<br>


<ol start="3">
<ol start="3">
<li><em> Gaussianity and correlations.</em>
<ul>
<li> Determine the distribution of the quantity <math> E (\vec{1})</math>. </li>
<li> The entries of <math>\nabla_\perp E (\vec{1})</math> and <math> \nabla^2_\perp E (\vec{1})</math> are Gaussian variables. One can show that the <math> N-1 </math> components of <math> \nabla_\perp E (\vec{1})</math> are uncorrelated with <math> E (\vec{1})</math> and <math> \nabla^2_\perp E (\vec{1})</math>; they have zero mean and covariances
<center>
<math>
\overline{(\nabla_\perp E)_\alpha \, (\nabla_\perp E)_\beta}=  p \, \delta_{\alpha \beta}+O\left(\frac{1}{N} \right).
</math>
</center>
Compute the probability density that <math> \nabla_\perp E (\vec{1})=0</math> (a quick numerical check of this covariance is sketched after this problem). </li>
<li> The <math>(N-1)\times (N-1) </math> matrix <math> \nabla_\perp^2 E (\vec{1}) </math>, conditioned on the fact that <math> E(\vec 1)=N \epsilon </math>, can be written as
<center>
<math>
[\nabla_\perp^2 E(\vec{1})]_{\alpha \beta}=  M_{\alpha \beta}- p  \epsilon\, \delta_{\alpha \beta},
</math>
</center>
where the matrix <math> M </math> has random entries with zero average and correlations
<center>
<math>
\overline{{M}_{\alpha \beta} \, {M}_{\gamma \delta}}= \frac{p (p-1)}{ N} \left( \delta_{\alpha \gamma} \delta_{\beta \delta}+ \delta_{\alpha \delta} \delta_{\beta \gamma}\right)
</math>
</center>
Combining everything, show that this implies
<center>
<math>
\overline{\mathcal{N}(\epsilon)}= (2 \pi e)^{\frac{N}{2}} \,\frac{1}{(2 \pi \, p)^{\frac{N-1}{2}}}\; \sqrt{\frac{N}{2 \pi}} e^{-\frac{N \epsilon^2}{2}}\;\overline{|\text{det} \left( M- p  \epsilon \mathbb{I} \right)|}
</math>
</center>
</ul>
</li>
</ol>
<br>
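The covariance of the projected gradient quoted above can also be checked numerically. The sketch below (an illustration, using the same unsymmetrized-tensor normalization as the gradient-descent sketch above, so that <math>\overline{E^2}=N</math>) samples many realizations of a <math>p=3</math> landscape and measures the variance of the tangential components of the gradient at a fixed point of the sphere, which should be close to <math>p</math>.
<syntaxhighlight lang="python">
# Sketch: empirical variance of the components of the projected gradient at a
# fixed point of the sphere, to be compared with p (up to 1/N corrections).
import numpy as np

rng = np.random.default_rng(4)
N, p, samples = 40, 3, 400
s = np.ones(N)                                    # the fixed configuration "vec 1"

comps = []
for _ in range(samples):
    J = rng.normal(0.0, 1.0 / N, size=(N, N, N))  # couplings of variance 1/N^2
    g = -(np.einsum('ljk,j,k->l', J, s, s)
          + np.einsum('ilk,i,k->l', J, s, s)
          + np.einsum('ijl,i,j->l', J, s, s))     # gradient of E = -sum J s s s
    comps.extend(g - (g @ s / N) * s)             # tangential components
print("variance of (grad_perp E)_alpha:", np.var(comps), "  expected ~ p =", p)
</syntaxhighlight>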


=== Problem 5.2: the Hessian and random matrix theory ===
To get the complexity, it remains to compute the expectation value of the determinant of the Hessian matrix: this is the goal of this problem. We will do this by exploiting results from random matrix theory.
<ol>
<li> <em> Gaussian random matrices. </em> Show that the matrix <math> M </math> is a GOE matrix, i.e. a matrix taken from the Gaussian Orthogonal Ensemble, meaning that it is a symmetric matrix with distribution
<center>
<math>
P(M)= Z_N^{-1}e^{-\frac{N}{4 \sigma^2} \text{Tr}\, M^2}.
</math>
</center>
What is the value of <math> \sigma^2 </math>?
</li>
</ol>
<br>
<br>


<ol start="2">
<li><em> Eigenvalue density and concentration. </em> Let <math> \lambda_\alpha </math> be the eigenvalues of the matrix <math> M </math>. Show that the following identity holds:
<center>
<math>
\overline{|\text{det}  \left(M - p \epsilon \mathbb{I} \right)|}=  \overline{\text{exp} \left[(N-1) \left( \int d \lambda \, \rho_N(\lambda) \, \log |\lambda - p \epsilon|\right) \right]}, \quad \quad \rho_{N}(\lambda)= \frac{1}{N-1} \sum_{\alpha=1}^{N-1} \delta (\lambda- \lambda_\alpha)
</math>
</center>
where <math>\rho_{N}(\lambda)</math> is the empirical eigenvalue density. It can be shown that if <math> M </math> is a GOE matrix, the distribution of the empirical density has a large deviation form (recall TD1) with speed <math> N^2 </math>, meaning that <math> P_N[\rho] = e^{-N^2 \, g[\rho]} </math> where now <math> g[\cdot] </math> is a functional (a function of a function). Using a saddle point argument, show that this implies
<center>
<math>
\overline{\text{exp} \left[(N-1) \left( \int d \lambda \, \rho_N(\lambda) \, \log |\lambda - p \epsilon|\right) \right]}=\text{exp} \left[N \left( \int d \lambda \, \rho_{\text{ty}}(\lambda+p \epsilon) \, \log |\lambda|\right)+ o(N) \right]
</math>
</center>
where <math> \rho_{\text{ty}}(\lambda) </math> is the typical value of the eigenvalue density, which satisfies  <math> g[\rho_{\text{ty}}]=0 </math> (a numerical illustration of this concentration is given below).
</li>
</ol>
<br>
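The concentration invoked in the saddle-point argument can be illustrated numerically: for GOE matrices of increasing size, <math> \frac{1}{N}\log|\text{det}(M-z\mathbb{I})| </math> approaches <math> \int d\lambda\, \rho_{\text{ty}}(\lambda)\log|\lambda-z| </math>. In the sketch below the values <math>\sigma^2=1</math> and <math>z=2.5</math> (outside the support of the semicircle) are arbitrary illustrative choices.
<syntaxhighlight lang="python">
# Sketch: concentration of (1/N) log|det(M - z I)| for GOE matrices M
# (off-diagonal variance sigma^2/N) around the integral of log|lambda - z|
# against the semicircle density. Here sigma^2 = 1 and z = 2.5 are illustrative.
import numpy as np

rng = np.random.default_rng(3)
sigma2, z = 1.0, 2.5                               # z lies outside the support [-2, 2]

def goe(N):
    A = rng.normal(0.0, np.sqrt(sigma2 / N), size=(N, N))
    return (A + A.T) / np.sqrt(2)                  # off-diagonal variance sigma2/N

lam = np.linspace(-2 * np.sqrt(sigma2), 2 * np.sqrt(sigma2), 20001)
rho = np.sqrt(np.maximum(4 * sigma2 - lam**2, 0.0)) / (2 * np.pi * sigma2)
typical = np.trapz(rho * np.log(np.abs(lam - z)), lam)     # quadrature of the typical value

for N in (100, 400, 1600):
    val = np.mean(np.log(np.abs(np.linalg.eigvalsh(goe(N)) - z)))
    print(f"N={N}:  (1/N) log|det| = {val:.4f}   typical value = {typical:.4f}")
</syntaxhighlight>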
 


<ol start="3">
<li><em> The semicircle, the threshold and the ground state.</em> The eigenvalue density of GOE matrices is self-averaging, and it equals
<center>
<math>
\lim_{N \to \infty}\rho_N (\lambda)=\lim_{N \to \infty} \overline{\rho_N}(\lambda)= \rho_{\text{ty}}(\lambda)= \frac{1}{2 \pi \sigma^2}\sqrt{4 \sigma^2-\lambda^2 }
</math>
</center>
<ul>
<li>Check this numerically: generate GOE matrices for various values of <math> N </math>, plot their empirical eigenvalue density and compare it with the asymptotic curve (a minimal sketch is given after this problem). Is the convergence faster in the bulk, or at the edges of the eigenvalue density, where it vanishes?  </li>


<li> Sketch <math> \rho_{\text{ty}}(\lambda+p \epsilon) </math> for different values of <math> \epsilon </math>; recalling that the Hessian encodes the stability of the stationary points, show that there is a transition in the stability of the stationary points at a critical value of the energy density
<center>
<math>
\epsilon_{\text{th}}= -2\sqrt{\frac{p-1}{p}}.
</math>
</center>
When are the stationary points stable local minima? When are they saddles? Why are the stationary points at <math> \epsilon= \epsilon_{\text{th}}</math> called  <em> marginally stable </em>?
</li>
</li>




<li> Combining all the results, show that the annealed complexity is
<center> <math>
\Sigma_{\text{a}}(\epsilon)= \frac{1}{2}\log [4 e (p-1)]- \frac{\epsilon^2}{2}+ I_p(\epsilon), \quad \quad  I_p(\epsilon)= \frac{2}{\pi}\int d x \sqrt{1-\left(x- \frac{\epsilon}{ \epsilon_{\text{th}}}\right)^2}\, \log |x| , \quad \quad  \epsilon_{\text{th}}= -2\sqrt{\frac{p-1}{p}}.
</math> </center>
The integral <math>  I_p(\epsilon)</math> can be computed explicitly, and one finds:
<center> <math>
I_p(\epsilon)=
\begin{cases}
&\frac{\epsilon^2}{\epsilon_{\text{th}}^2}-\frac{1}{2} - \frac{\epsilon}{\epsilon_{\text{th}}}\sqrt{\frac{\epsilon^2}{\epsilon_{\text{th}}^2}-1}+ \log \left( \frac{\epsilon}{\epsilon_{\text{th}}}+ \sqrt{\frac{\epsilon^2}{\epsilon_{\text{th}}^2}-1} \right)- \log 2 \quad \text{if} \quad \epsilon \leq \epsilon_{\text{th}}\\
&\frac{\epsilon^2}{\epsilon_{\text{th}}^2}-\frac{1}{2}-\log 2 \quad \text{if} \quad \epsilon > \epsilon_{\text{th}}
\end{cases}
</math> </center>
Plot the annealed complexity and determine numerically where it vanishes (a sketch is given below): why is this a lower bound for the ground state energy density?
</li>
</li>
</ul>
</ol>
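A minimal sketch for the numerical check requested in the first bullet point above (matplotlib is assumed to be available; <math>\sigma^2=1</math> is an illustrative choice):
<syntaxhighlight lang="python">
# Sketch: empirical eigenvalue density of GOE matrices of increasing size,
# compared with the semicircle law on [-2 sigma, 2 sigma].
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
sigma2 = 1.0                                   # plays the role of sigma^2 in the GOE law

def goe(N):
    A = rng.normal(0.0, np.sqrt(sigma2 / N), size=(N, N))
    return (A + A.T) / np.sqrt(2)              # symmetric; off-diagonal variance sigma2/N

for N in (100, 400, 1600):
    eigs = np.linalg.eigvalsh(goe(N))
    plt.hist(eigs, bins=60, density=True, histtype='step', label=f"N={N}")

lam = np.linspace(-2 * np.sqrt(sigma2), 2 * np.sqrt(sigma2), 400)
plt.plot(lam, np.sqrt(4 * sigma2 - lam**2) / (2 * np.pi * sigma2), 'k--', label="semicircle")
plt.xlabel(r"$\lambda$"); plt.ylabel(r"$\rho(\lambda)$"); plt.legend(); plt.show()
</syntaxhighlight>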
<br>
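A minimal sketch for the last bullet point, plotting the annealed complexity quoted above for <math>p=3</math> and locating its zero numerically (scipy is assumed to be available for the root finding):
<syntaxhighlight lang="python">
# Sketch: plot Sigma_a(eps) for p=3 using the explicit expressions above and
# locate its zero, which gives a lower bound for the ground-state energy density.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq

p = 3
eth = -2 * np.sqrt((p - 1) / p)              # threshold energy density

def I_p(eps):
    u = eps / eth                            # u >= 1  iff  eps <= eth (both negative)
    if u >= 1:
        r = np.sqrt(u**2 - 1)
        return u**2 - 0.5 - u * r + np.log(u + r) - np.log(2)
    return u**2 - 0.5 - np.log(2)

def sigma_a(eps):
    return 0.5 * np.log(4 * np.e * (p - 1)) - eps**2 / 2 + I_p(eps)

eps = np.linspace(-1.8, 0.0, 400)
plt.plot(eps, [sigma_a(e) for e in eps])
plt.axhline(0, color='k', lw=0.5)
plt.xlabel(r"$\epsilon$"); plt.ylabel(r"$\Sigma_a(\epsilon)$"); plt.show()

e0 = brentq(sigma_a, -1.8, eth)              # zero of the annealed complexity
print("Sigma_a vanishes at epsilon_0 =", e0, "  (epsilon_th =", eth, ")")
</syntaxhighlight>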


== Check out: key concepts ==

Gradient descent, rugged landscapes, metastable states, landscape’s complexity.
<br>
<br>

==Notes==
<div style="font-size:89%">
: <small>[1]</small> - This quantity looks similar to the entropy <math> S(\epsilon) </math> we computed for the REM in Problem 1. However, while the entropy counts all configurations at a given energy density, the complexity <math> \Sigma(\epsilon) </math> accounts only for the stationary points.
</div>
<div style="font-size:89%">
: <small>[2]</small> - A few notions of geometry: we define with <math> \hat \Pi(\vec{\sigma}) </math> the projector on the tangent plane to the sphere at <math> \vec{\sigma}</math>: this is the plane orthogonal to the vector <math> \vec{\sigma}</math>. The gradient <math> \nabla_\perp E(\vec{\sigma}) </math> is an <math> (N-1)</math>-dimensional vector obtained by projecting the gradient <math> [\nabla E(\vec{\sigma})]_i=\partial E/\partial \sigma_i </math> on the tangent plane, <math> \nabla_\perp E(\vec{\sigma})=\hat \Pi \nabla E(\vec{\sigma})</math>. The Hessian <math> \nabla^2_\perp E(\vec{\sigma}) </math> is an <math> (N-1) \times (N-1)</math> matrix obtained from the Hessian <math> [\nabla^2 E(\vec{\sigma})]_{ij}=\partial^2 E/\partial \sigma_i \partial \sigma_j </math> as <math> \nabla^2_\perp E(\vec{\sigma})= \hat \Pi(\vec{\sigma}) \, \nabla^2 E(\vec{\sigma}) \, \hat\Pi(\vec{\sigma}) - N^{-1}\left[\nabla E(\vec{\sigma}) \cdot \vec{\sigma}\right] \mathbb{I}</math>, where <math> \mathbb{I}</math> is the identity matrix.
</div>
