T-II-3

From ESPCI Wiki


Extreme Statistics

Finding the distribution of the maximum of a set of random variables is in general a non-trivial problem. It appears in many contexts, ranging from the maximal height of water in a river to fluctuations in stock markets. We consider $N$ independent random variables $(x_1,\dots,x_N)$ drawn from the same distribution $p(x)$. We denote

$$y_N = \max(x_1,\dots,x_N)$$

It is useful to use the following notations for the cumulative distributions

$$P_<(x) = \int_{-\infty}^{x} dx'\, p(x'), \qquad P_>(x) = \int_{x}^{+\infty} dx'\, p(x')$$

Let us denote by $q_N(y)$ the distribution of $y_N$ and by $Q_N(y) = \mathrm{Prob}(y_N < y)$ its cumulative distribution.

  • Write $Q_N(y)$ in terms of $P_<(y)$. (Hint: start by writing this relation for $N=2,3,\dots$)

This is the fundamental relation of extreme statistics. We analyze its consequences in the large $N$ limit where, analogously to the central limit theorem, extreme statistics display universal features.

  • In particular, show that in the large $N$ limit we can write
$$Q_N(y) \approx \exp(-N P_>(y))$$
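
The large $N$ estimate can be compared with a direct simulation. The following is a minimal sketch, assuming the unit exponential distribution studied later in this exercise (so that $P_>(y) = e^{-y}$); the function names, the seed and the number of trials are illustrative choices.

```python
import math
import random

def empirical_cdf_of_max(n, y, trials=5000, seed=0):
    """Fraction of trials in which the max of n unit exponentials is below y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y_n = max(rng.expovariate(1.0) for _ in range(n))
        if y_n < y:
            hits += 1
    return hits / trials

def large_n_estimate(n, y):
    """exp(-N P_>(y)) with P_>(y) = exp(-y) (assumption: unit exponential)."""
    return math.exp(-n * math.exp(-y))

n = 100
for y in (4.0, 5.0, 6.0):
    print(y, empirical_cdf_of_max(n, y), large_n_estimate(n, y))
```

The two columns should agree up to statistical noise and a small finite $N$ correction.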

In the present exercise, we first study the case of the exponential distribution. In a second step we generalize our results to a larger class of distributions.


Exponential distribution

The exponential distribution is one of the fundamental continuous distributions, and for this reason alone worthy of study. Among many other places, it appears in the Poisson process. Its density reads:

$$p(x) = \lambda \exp(-\lambda x)$$

where both $\lambda$ and $x$ are positive numbers.

Preliminaries: the central limit

  • Compute the mean value and the variance of this distribution.
  • Consider $X_N$, the sum of $N$ independent, exponentially distributed random variables. How is $X_N$ distributed?


We write $X_N$ in a more convenient way

$$X_N = a_N + b_N z$$

where $a_N$ is the location and $b_N$ the width of the distribution of $X_N$. Both numbers depend on $N$. Finally, $z$ is a random variable whose distribution $\pi(z)$ becomes independent of $N$ in the large $N$ limit. In other words, the distribution of $X_N$ is significantly different from zero only when $X_N$ is around $a_N$, in a region of size $b_N$.


  • From the central limit theorem, what is the natural choice for $a_N$ and $b_N$? Write the distribution $\pi(z)$.
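
The rescaling can be tested numerically. The sketch below assumes the choice suggested by the central limit theorem, namely $a_N = N/\lambda$ (the mean of the sum) and $b_N = \sqrt{N}/\lambda$ (its standard deviation); the function name and parameters are illustrative.

```python
import math
import random
import statistics

def rescaled_sums(n, lam=1.0, trials=5000, seed=1):
    """Samples of z = (X_N - a_N) / b_N for sums of n exponentials of rate lam."""
    rng = random.Random(seed)
    a_n = n / lam              # assumed location: mean of the sum
    b_n = math.sqrt(n) / lam   # assumed width: standard deviation of the sum
    zs = []
    for _ in range(trials):
        x_n = sum(rng.expovariate(lam) for _ in range(n))
        zs.append((x_n - a_n) / b_n)
    return zs

zs = rescaled_sums(200)
print("mean of z:", statistics.mean(zs))
print("std of z: ", statistics.pstdev(zs))
```

With this choice the sample mean of $z$ should be close to zero and its standard deviation close to one.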

The Maxima

Consider now the case $\lambda=1$.

  • Write $P_>(x)$ and $P_<(x)$. (Remember that $x$ is a positive number.)
  • Write $Q_N(y)$ and $q_N(y)$.
  • Plot $q_N(y)$ for different values of $N$.
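
As a starting point for the plot, the density can be tabulated on a grid. This sketch assumes $Q_N(y) = (1 - e^{-y})^N$ for $\lambda = 1$, so that $q_N(y) = N e^{-y} (1 - e^{-y})^{N-1}$; the helper names and the grid are illustrative. It prints where the density peaks, next to $\log N$ for comparison.

```python
import math

def q_n(y, n):
    """Density of the max of n unit exponentials: d/dy of (1 - exp(-y))**n."""
    return n * math.exp(-y) * (1.0 - math.exp(-y)) ** (n - 1)

def mode_on_grid(n, y_max=20.0, steps=20000):
    """Grid point in (0, y_max] where q_n is largest."""
    grid = [y_max * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda y: q_n(y, n))

for n in (10, 100, 1000):
    print(n, mode_on_grid(n), math.log(n))
```

The values of `q_n` on the grid can be passed to any plotting library to draw the curves.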


We now want to give a natural definition of the numbers $a_N$ and $b_N$.

Consider $\tilde y$ such that $P_>(\tilde y) = \frac{1}{2}$. If you draw $N$ independent exponential variables, how many of them (on average) will be greater than $\tilde y$? Repeat the same exercise with $\tilde{\tilde y}$ such that $P_>(\tilde{\tilde y}) = \frac{2}{3}$.

  • Justify that $a_N$ can be estimated from
$$P_>(a_N) = \frac{1}{N}$$
  • Compute $a_N$ for the exponential distribution and justify the rescaling $y = a_N + z$, i.e. study
$$Q_N(y = a_N + z)$$
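
The counting argument behind the estimate of $a_N$ can be checked by simulation. The sketch below draws $N$ unit exponentials and counts how many exceed the threshold $y$ defined by $P_>(y) = p$; the tail probabilities used here, the function name and the seed are illustrative choices.

```python
import math
import random

def mean_exceedances(n, p_tail, trials=2000, seed=2):
    """Average number of draws, out of n, above the y solving exp(-y) = p_tail."""
    rng = random.Random(seed)
    threshold = -math.log(p_tail)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if rng.expovariate(1.0) > threshold)
    return total / trials

n = 200
for p_tail in (0.5, 0.1, 1.0 / n):
    print(p_tail, mean_exceedances(n, p_tail), n * p_tail)
```

The simulated average should be close to $N p$, and in particular close to one when $p = 1/N$: on average a single variable lies above that threshold.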

In the large $N$ limit, the distribution $\pi(z)$ becomes independent of $N$.

  • Show that in this limit its cumulative distribution takes the form
$$\Pi(z) = e^{-e^{-z}}$$

This is the cumulative distribution of the famous Gumbel distribution.
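
The convergence to the Gumbel form can be observed numerically. This is a sketch under the assumption $a_N = \log N$ and $b_N = 1$ for the unit exponential, which is what $P_>(a_N) = 1/N$ gives when $P_>(x) = e^{-x}$; the names, the seed and the sample sizes are illustrative.

```python
import math
import random

def shifted_maxima(n, trials=4000, seed=3):
    """Samples of z = y_N - log(n) for the max y_N of n unit exponentials."""
    rng = random.Random(seed)
    a_n = math.log(n)  # assumed location
    return [max(rng.expovariate(1.0) for _ in range(n)) - a_n
            for _ in range(trials)]

def gumbel_cdf(z):
    """Cumulative exp(-exp(-z))."""
    return math.exp(-math.exp(-z))

samples = shifted_maxima(500)
for z in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(1 for s in samples if s < z) / len(samples)
    print(z, empirical, gumbel_cdf(z))
```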

Let us remark that the precise definition of $a_N$ and $b_N$ fixes the mean and the variance of the rescaled distribution $\pi(z)$. Unlike in the central limit case, the mean will be different from zero and the variance different from one.

  • Compute the mean, the variance and the asymptotic behavior of the Gumbel distribution. Draw the distribution. Explain why $z=0$ is a special point.
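
An analytical result for the mean and the variance can be cross-checked by sampling. The sketch below draws Gumbel variables by inverting the cumulative $e^{-e^{-z}}$ and compares the sample moments with the known values for this distribution, the Euler-Mascheroni constant for the mean and $\pi^2/6$ for the variance. The function name, seed and sample size are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def sample_gumbel(rng):
    """Draw z with cumulative exp(-exp(-z)) by inversion: z = -log(-log(u))."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); skip 0 to avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

rng = random.Random(4)
zs = [sample_gumbel(rng) for _ in range(200000)]
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
print("mean    ", mean_z, "expected", EULER_GAMMA)
print("variance", var_z, "expected", math.pi ** 2 / 6)
```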

Generic case: Universality of the Gumbel distribution

The Gumbel distribution is the limit distribution of the maxima for a large class of distributions. We can say that the Gumbel distribution plays, for extreme statistics, the same role as the Gaussian distribution for the central limit theorem.

By contrast, the behavior of $a_N$ and $b_N$ as a function of $N$ strongly depends on the particular distribution $p(x)$. We discuss here a family of distributions characterized by a fast decay for large $x$

$$p(x) \sim c\, e^{-x^\alpha}$$

where $\alpha>0$. The key point is to be able to determine $A(x)$ such that

$$P_>(x) = \exp(-A(x))$$
  • For $p(x)=e^{-x}$, show that $A(x)=x$.

Otherwise $A(x)$ should be determined asymptotically for large $x$.

  • Show that $A(x) = x^\alpha + (\alpha-1)\log x + \dots$
  • Show that in general $A(a_N) = \log N + \dots$ and compute $a_N$ as a function of $\alpha$ for large $N$.
  • Show that the distribution of the maximum takes the form
$$\lim_{N\to\infty} Q_N(y) = \Pi(z), \qquad y = a_N + \frac{z}{A'(a_N)}$$

with $z$ Gumbel distributed.

  • Identify $b_N$ and discuss its behavior as a function of $\alpha$.
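
To get a feeling for the dependence on $\alpha$, the width can be evaluated at leading order. This is only a rough sketch: it assumes $b_N = 1/A'(a_N)$ and keeps only the dominant term $A(x) \approx x^\alpha$, dropping the logarithmic correction, so that $a_N \approx (\log N)^{1/\alpha}$. The function names are illustrative.

```python
import math

def a_n_leading(n, alpha):
    """Leading-order location: solves a**alpha = log(n)."""
    return math.log(n) ** (1.0 / alpha)

def b_n_leading(n, alpha):
    """Leading-order width 1 / A'(a_N), with A'(x) ~ alpha * x**(alpha - 1)."""
    a_n = a_n_leading(n, alpha)
    return 1.0 / (alpha * a_n ** (alpha - 1.0))

for alpha in (0.5, 1.0, 2.0):
    row = [round(b_n_leading(10 ** k, alpha), 3) for k in (2, 4, 8)]
    print(alpha, row)
```

In this approximation the width grows with $N$ for $\alpha < 1$, stays constant for $\alpha = 1$ and shrinks for $\alpha > 1$.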

If the distribution $p(x)$ is defined on the entire real axis and is characterized by the same fast decay, it is easy to generalize this result to the distribution of the minima.

  • Write the Gumbel distribution for the minima

Minimum of exponential random numbers

The Gumbel distribution is not the only limit distribution for extremes. Consider the simple case of the minimum of exponential random variables.

  • Show analytically that the minimum $x = \min(x_1,\dots,x_N)$ of $N$ exponential random numbers with parameters $\lambda_1,\dots,\lambda_N$ is again an exponential random number, with parameter $\lambda_1+\dots+\lambda_N$:

$$\pi(x) = (\lambda_1+\dots+\lambda_N)\exp(-(\lambda_1+\dots+\lambda_N)x)$$
Program this in Python, produce a histogram and compare with the exact result.
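
One possible way to set up this program is sketched below, using only the standard library. The rates, the seed, the number of samples and the binning are illustrative values, and the histogram is printed as a table next to the exact density rather than drawn.

```python
import math
import random

def sample_minima(rates, trials=50000, seed=5):
    """Minimum of one exponential draw per rate, repeated `trials` times."""
    rng = random.Random(seed)
    return [min(rng.expovariate(lam) for lam in rates) for _ in range(trials)]

def histogram_density(samples, x_max, bins):
    """Normalized histogram on [0, x_max); returns bin centers and densities."""
    width = x_max / bins
    counts = [0] * bins
    for s in samples:
        k = int(s / width)
        if k < bins:
            counts[k] += 1
    centers = [(k + 0.5) * width for k in range(bins)]
    density = [c / (len(samples) * width) for c in counts]
    return centers, density

rates = [0.5, 1.0, 2.5]  # illustrative values of lambda_1, ..., lambda_N
total_rate = sum(rates)
samples = sample_minima(rates)
centers, density = histogram_density(samples, x_max=1.5, bins=15)
for x, d in zip(centers, density):
    exact = total_rate * math.exp(-total_rate * x)
    print(f"{x:.2f}  histogram {d:.3f}  exact {exact:.3f}")
```

The `centers` and `density` lists can be handed to a plotting library to draw the histogram on top of the exact curve.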

  • Look up the possible limit distributions for the extremes of independent and identically distributed variables.