4 Convergence in distribution with cdfs

This chapter provides a standard characterization of convergence in distribution (weak convergence of probability measures) on the real line in terms of cumulative distribution functions.

4.1 Convergence in distribution

Convergence in distribution for random variables can be defined when the random variables take values in a topological space, and it amounts to the weak convergence of the probability measures that are the laws of those random variables. In the special case of real-valued random variables, or probability measures on the real line, the definition reads:

Definition 4.1 Weak convergence of probability measures

A sequence $(\mu _n)_{n \in \mathbb {N}}$ of Borel probability measures on $\mathbb {R}$ converges weakly to a Borel probability measure $\mu $ on $\mathbb {R}$ if for all bounded continuous functions $f \colon \mathbb {R}\to [0,+\infty )$ we have

\begin{align*} \lim _{n \to \infty } \int _{\mathbb {R}} f(x) \, \mathrm{d}\mu _n(x) = \int _{\mathbb {R}} f(x) \, \mathrm{d}\mu (x) . \end{align*}
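As a numerical illustration of Definition 4.1 (not part of the formal development), one may take $\mu _n$ to be the uniform measure on the points $\frac{1}{n}, \frac{2}{n}, \ldots , \frac{n}{n}$ and $\mu $ the uniform measure on $[0,1]$. The sketch below assumes the test function $f(x) = 1/(1+x^2)$, whose integral against $\mu $ is $\arctan (1)$; the names `f`, `integral_against_mu_n` and `limit_value` are chosen here for illustration only.

```python
import math


def f(x: float) -> float:
    """A bounded continuous test function with values in [0, +inf)."""
    return 1.0 / (1.0 + x * x)


def integral_against_mu_n(n: int) -> float:
    """Integral of f against the uniform measure on {1/n, 2/n, ..., n/n}."""
    return sum(f(j / n) for j in range(1, n + 1)) / n


# Integral of f against the uniform measure on [0, 1].
limit_value = math.atan(1.0)

for n in (10, 100, 1000):
    # The printed gap should shrink as n grows.
    print(n, abs(integral_against_mu_n(n) - limit_value))
```

A single test function of course proves nothing about weak convergence; the definition quantifies over all bounded continuous $f$.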

4.2 Auxiliary results

Lemma 4.2 Monotone real functions have only countably many points of discontinuity

A monotone function $f \colon \mathbb {R}\to \mathbb {R}$ has at most countably many points of discontinuity. In particular, the set $D \subset \mathbb {R}$ of continuity points of $f$ is dense in $\mathbb {R}$.

Proof.

(The proof should already be in Mathlib.)

Lemma 4.3 Tightness of a cumulative distribution function

Let $F$ be a cumulative distribution function. Then for any $\varepsilon {\gt} 0$ there exist points $a,b \in \mathbb {R}$ with $a {\lt} b$ such that $F(b) - F(a) {\gt} 1 - \varepsilon $ and $F$ is continuous at the points $a$ and $b$.

Proof.

Cumulative distribution functions satisfy $F(x) \downarrow 0$ as $x \downarrow - \infty $ and $F(x) \uparrow 1$ as $x \uparrow + \infty $. The required large difference $F(b) - F(a)$ is obtained by choosing $a$ small enough so that $F(a) {\lt} \frac{\varepsilon }{2}$ and $b$ large enough so that $F(b) {\gt} 1 - \frac{\varepsilon }{2}$. In order to guarantee that $a {\lt} b$ and that $a$ and $b$ are continuity points of $F$, we recall that continuity points of the monotone function $F$ are dense by Lemma 4.2, so we may decrease $a$ and increase $b$ as appropriate.
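The choice of $a$ and $b$ in this proof can be sketched numerically. The example below assumes the standard logistic c.d.f., which is continuous everywhere, so the continuity-point requirement is automatic; for a c.d.f. with jumps one would additionally have to move $a$ and $b$ to nearby continuity points. The helper name `tight_interval` and the doubling search are illustrative choices, not taken from the text.

```python
import math


def logistic_cdf(x: float) -> float:
    """C.d.f. of the standard logistic distribution; continuous everywhere."""
    return 1.0 / (1.0 + math.exp(-x))


def tight_interval(F, eps: float):
    """Return (a, b) with a < b, F(a) < eps/2 and F(b) > 1 - eps/2.

    Assumes F is a continuous c.d.f., so every point is a continuity point.
    """
    a, b = -1.0, 1.0
    while F(a) >= eps / 2:
        a *= 2.0  # decrease a until the left tail is small
    while F(b) <= 1.0 - eps / 2:
        b *= 2.0  # increase b until the right tail is small
    return a, b


a, b = tight_interval(logistic_cdf, 0.01)
print(a, b, logistic_cdf(b) - logistic_cdf(a))
```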

Lemma 4.4 Subdivision with small mesh and within dense set

Let $D \subset \mathbb {R}$ be a dense set and $a,b \in D$ with $a {\lt} b$. Then for any $\delta {\gt} 0$ there exist $k \in \mathbb {N}$ and points $a = c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b$ in $D$ such that $c_j - c_{j-1} {\lt} \delta $ for all $j=1,\ldots ,k$.

Proof.

…

Lemma 4.5 Subdivision for continuous function approximation

Let $D \subset \mathbb {R}$ be a dense set, let $f \colon \mathbb {R}\to \mathbb {R}$ be continuous, let $a, b \in D$ with $a {\lt} b$, and let $\varepsilon {\gt} 0$. Then there exist $k \in \mathbb {N}$ and points $a=c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b$ such that for each $j = 1, \ldots , k$ we have $c_j \in D$ and

\begin{align*} \big| f(x) - f(c_j) \big| {\lt} \varepsilon \qquad \text{ for } \qquad x \in [c_{j-1} , c_j] . \end{align*}

Proof.

On the compact interval $[a,b] \subset \mathbb {R}$, the continuous function $f$ is uniformly continuous, so for some $\delta {\gt} 0 $ we have $|f(x)-f(y)|{\lt}\varepsilon $ whenever $|x-y|{\lt}\delta $ and $x,y \in [a,b]$. Apply Lemma 4.4 to choose $k$ and points $a=c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b$ such that $c_j - c_{j-1} {\lt} \delta $ and $c_j \in D$ for all $j = 1 , \ldots , k$. Then for any $j = 1 , \ldots , k$ and any $x \in [c_{j-1} , c_j]$ we have $x, c_j \in [a,b]$ and $|x - c_j| {\lt} \delta $, so we get

\begin{align*} \big| f(x) - f(c_j) \big| {\lt} \varepsilon \end{align*}

as desired.
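The construction in Lemmas 4.4 and 4.5 can be tried out on a concrete case. The sketch below assumes $D = \mathbb {Q}$ (represented by `Fraction`), the function $f = \sin $, which is $1$-Lipschitz so that $\delta = \varepsilon $ is an admissible choice, and an equally spaced subdivision; the names `subdivision` and `worst_error` are hypothetical. The error is only sampled on a finite grid, so this is a sanity check rather than a proof.

```python
import math
from fractions import Fraction


def subdivision(a: Fraction, b: Fraction, delta: Fraction) -> list:
    """Rational points a = c_0 < c_1 < ... < c_k = b with c_j - c_{j-1} < delta."""
    k = int((b - a) / delta) + 1  # floor + 1, so (b - a) / k < delta
    return [a + (b - a) * Fraction(j, k) for j in range(k + 1)]


def worst_error(f, points, samples_per_cell: int = 50) -> float:
    """Largest sampled value of |f(x) - f(c_j)| over x in [c_{j-1}, c_j]."""
    worst = 0.0
    for left, right in zip(points, points[1:]):
        lo, hi = float(left), float(right)
        for i in range(samples_per_cell + 1):
            x = lo + (hi - lo) * i / samples_per_cell
            worst = max(worst, abs(f(x) - f(hi)))
    return worst


# sin is 1-Lipschitz, so delta = eps works as a modulus of continuity.
eps = Fraction(1, 10)
points = subdivision(Fraction(-3), Fraction(2), eps)
print(len(points) - 1, worst_error(math.sin, points))
```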

Lemma 4.6 Simple function integral as linear combination of cdf differences

Let $a = c_0 {\lt} c_1 {\lt} \cdots {\lt} c_k = b$ and consider the linear combination of indicator functions

\begin{align*} h(x) = \sum _{j=1}^k \alpha _j \; \mathbb {I}_{{(c_{j-1},c_j]}}(x) . \end{align*}

Then the integral of $h$ with respect to a Borel probability measure $\mu $ on $\mathbb {R}$ can be written as

\begin{align*} \int _{\mathbb {R}} h(x) \, \mathrm{d}\mu (x) = \sum _{j=1}^k \alpha _j \, \big( F(c_j) - F(c_{j-1}) \big) , \end{align*}

where $F$ is the c.d.f. of $\mu $.

Proof.

\begin{align*} \int _{\mathbb {R}} h \, \mathrm{d}\mu = \; & \int _{\mathbb {R}} \Big( \sum _{j=1}^k \alpha _j \, \mathbb {I}_{{(c_{j-1},c_j]}}(x) \Big) \, \mathrm{d}\mu (x) \\ = \; & \sum _{j=1}^k \alpha _j \; \int _{\mathbb {R}} \mathbb {I}_{{(c_{j-1},c_j]}}(x) \, \mathrm{d}\mu (x) \\ = \; & \sum _{j=1}^k \alpha _j \; \mu \big[ (c_{j-1},c_j] \big] \\ = \; & \sum _{j=1}^k \alpha _j \, \big( F(c_j) - F(c_{j-1}) \big) . \end{align*}
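The identity of Lemma 4.6 can be checked exactly on a small example. The sketch below assumes a hypothetical discrete probability measure with three atoms and uses exact rational arithmetic; the atoms, weights, breakpoints and coefficients are made up for illustration. Note that the atom at $c_0$ contributes nothing, in agreement with the half-open intervals $(c_{j-1}, c_j]$.

```python
from fractions import Fraction

# A hypothetical discrete probability measure, given as atom -> mass.
atoms = {
    Fraction(0): Fraction(1, 4),
    Fraction(1): Fraction(1, 2),
    Fraction(5, 2): Fraction(1, 4),
}


def cdf(x: Fraction) -> Fraction:
    """F(x) = mu[(-inf, x]]."""
    return sum((w for t, w in atoms.items() if t <= x), Fraction(0))


c = [Fraction(0), Fraction(1), Fraction(2), Fraction(3)]  # c_0 < c_1 < c_2 < c_3
alpha = [Fraction(2), Fraction(-1), Fraction(5)]  # alpha_1, alpha_2, alpha_3


def h(x: Fraction) -> Fraction:
    """Step function taking the value alpha_j on (c_{j-1}, c_j], zero elsewhere."""
    for j in range(1, len(c)):
        if c[j - 1] < x <= c[j]:
            return alpha[j - 1]
    return Fraction(0)


# Integral of h computed directly from the atoms ...
direct = sum(w * h(t) for t, w in atoms.items())
# ... and via the c.d.f. differences of Lemma 4.6.
via_cdf = sum(alpha[j - 1] * (cdf(c[j]) - cdf(c[j - 1])) for j in range(1, len(c)))
print(direct, via_cdf)
```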

Lemma 4.7 One of the portmanteau implications

Weak convergence of probability measures implies that if the boundary of a Borel set carries no probability mass under the limit measure, then the limit of the measures of the set equals the measure of the set under the limit probability measure.

In other words, if $\lim _{n \to \infty } \mu _n = \mu $ in the sense of weak convergence of measures, Definition 4.1, and if $A \subset \mathbb {R}$ is a Borel set such that $\mu [\partial A] = 0$, then

\begin{align*} \lim _{n \to \infty } \mu _n [ A ] = \mu [ A ] . \end{align*}

Proof.

(The proof is in Mathlib.)
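For a concrete instance of Lemma 4.7, one may take $\mu _n$ to be the uniform measure on $\frac{1}{n}, \ldots , \frac{n}{n}$, which converges weakly to the uniform measure $\mu $ on $[0,1]$, and $A = [0, \frac{1}{2}]$, whose boundary $\{ 0, \frac{1}{2} \} $ has $\mu $-measure zero. The sketch below, with the illustrative helper name `mu_n`, computes $\mu _n[A]$ exactly; the values should approach $\mu [A] = \frac{1}{2}$.

```python
from fractions import Fraction


def mu_n(n: int, lower: Fraction, upper: Fraction) -> Fraction:
    """Mass that the uniform measure on {1/n, ..., n/n} gives to [lower, upper]."""
    hits = sum(1 for j in range(1, n + 1) if lower <= Fraction(j, n) <= upper)
    return Fraction(hits, n)


half = Fraction(1, 2)
for n in (5, 51, 501):
    print(n, mu_n(n, Fraction(0), half))
```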

4.3 Convergence in distribution from pointwise convergence of cdfs

Theorem 4.8 Sufficient condition for convergence in distribution with cdfs

Let $F$ and $F_n$, $n \in \mathbb {N}$, be cumulative distribution functions of probability measures $\mu $ and $\mu _n$, $n \in \mathbb {N}$, respectively, i.e.,

\begin{align*} F(x) = \; & \mu \big[(-\infty ,x]\big] & & \text{for $x \in \mathbb {R}$} \\ F_n(x) = \; & \mu _n \big[(-\infty ,x]\big] & & \text{for $x \in \mathbb {R}$ and $n \in \mathbb {N}$.} \end{align*}

If $\lim _{n \to \infty } F_n(x) = F(x)$ for all continuity points $x$ of $F$, then $\lim _{n \to \infty } \mu _n = \mu $ in the sense of weak convergence of measures, Definition 4.1.

Proof.

Let $D \subset \mathbb {R}$ denote the set of continuity points of $F$. By Lemma 4.2, $D$ is dense in $\mathbb {R}$. Assume that $\lim _{n \to \infty } F_n(x) = F(x)$ for all $x \in D$.

Let $\varepsilon {\gt} 0$. Choose, by Lemma 4.3, points $a,b \in D$, $a{\lt}b$, such that $F(b) - F(a) {\gt} 1 - \varepsilon $.

Observe also that since $\lim _{n \to \infty } F_n(a) = F(a)$ and $\lim _{n \to \infty } F_n(b) = F(b)$, there exists some $N_1$ such that we have

\begin{align*} F_n(b) - F_n(a) {\gt} 1 - 2\varepsilon \qquad \text{ for all } n \geq N_1 . \end{align*}

Let $f \colon \mathbb {R}\to \mathbb {R}$ be bounded and continuous. By Lemma 4.5 we can choose points $a=c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b$ such that for all $j = 1 , \ldots , k$ we have $c_j \in D$ and

\begin{align*} \big| f(x) - f(c_j) \big| {\lt} \varepsilon \qquad \text{ for } \qquad x \in [c_{j-1} , c_j] . \end{align*}

Define the simple function $h \colon \mathbb {R}\to \mathbb {R}$ by

\begin{align*} h(x) = \sum _{j=1}^k f(c_j) \; \mathbb {I}_{{(c_{j-1},c_j]}}(x) \end{align*}

The above estimate shows that $|f(x) - h(x)| {\lt} \varepsilon $ for all $x \in (a,b]$. By boundedness of $f$, there exists a constant $K{\gt}0$ such that $|f(x)| \leq K$ for all $x \in \mathbb {R}$. Since $h$ vanishes outside $(a,b]$, the triangle inequality for integrals with respect to $\mu _n$ gives

\begin{align*} \Big| \int _{\mathbb {R}} f \, \mathrm{d}\mu _n - \int _{\mathbb {R}} h \, \mathrm{d}\mu _n \Big| \, \leq \; & \, \underbrace{\int _{(a,b]} |f-h| \, \mathrm{d}\mu _n}_{\leq \varepsilon } + \underbrace{\int _{\mathbb {R}\setminus (a,b]} |f| \, \mathrm{d}\mu _n}_{\leq K \, \mu _n\big[ \mathbb {R}\setminus (a,b] \big]} . \end{align*}

When $n \geq N_1$, we have $\mu _n\big[ \mathbb {R}\setminus (a,b] \big] = 1 - \mu _n\big[ (a,b] \big] = 1 - (F_n(b) - F_n(a)) {\lt} 2 \varepsilon $, and thus the triangle inequality implies

\begin{align*} \Big| \int _{\mathbb {R}} f \, \mathrm{d}\mu _n - \int _{\mathbb {R}} h \, \mathrm{d}\mu _n \Big| \, \leq \; & \varepsilon + K \, 2 \varepsilon = (1 + 2K) \, \varepsilon . \end{align*}

Similarly, integrating with respect to $\mu $ instead and using $\mu \big[ \mathbb {R}\setminus (a,b] \big] = 1 - (F(b) - F(a)) {\lt} \varepsilon $, one shows that

\begin{align*} \Big| \int _{\mathbb {R}} f \, \mathrm{d}\mu - \int _{\mathbb {R}} h \, \mathrm{d}\mu \Big| \, \leq \; & (1 + K) \, \varepsilon . \end{align*}

It remains to consider the integrals of the function $h$ with respect to both $\mu _n$ and $\mu $. By Lemma 4.6, these integrals are expressible in terms of the cumulative distribution functions,

\begin{align*} \int _{\mathbb {R}} h \, \mathrm{d}\mu _n = \; & \sum _{j=1}^k f(c_j) \, \big( F_n(c_j) - F_n(c_{j-1}) \big) \end{align*}

and

\begin{align*} \int _{\mathbb {R}} h \, \mathrm{d}\mu = \; & \sum _{j=1}^k f(c_j) \, \big( F(c_j) - F(c_{j-1}) \big) . \end{align*}

The difference of the integrals of $h$ with respect to these two measures can therefore be estimated as

\begin{align*} \Big| \int _{\mathbb {R}} h \, \mathrm{d}\mu - \int _{\mathbb {R}} h \, \mathrm{d}\mu _n \Big| \, = \; & \, \Big| \sum _{j=1}^k f(c_j) \, \big( F(c_j) - F_n(c_j) - F(c_{j-1}) + F_n (c_{j-1}) \big) \Big| \\ \leq \; & \, \sum _{j=1}^k |f(c_j)| \; \Big( \big| F(c_j) - F_n(c_j) \big| + \big| F(c_{j-1}) - F_n (c_{j-1}) \big| \Big) \\ \leq \; & \, 2 k K \max _{j = 0 , \ldots , k} \big| F(c_j) - F_n(c_j) \big| . \end{align*}

By our assumption, since each $c_j$ lies in $D$, we have $\lim _{n \to \infty } F_n(c_j) = F(c_j)$ for each $j = 0 , \ldots , k$, so there exists $N_2$ such that for $n \geq N_2$ we have $\max _{j = 0 , \ldots , k} | F(c_j) - F_n(c_j) | {\lt} \frac{\varepsilon }{k}$, and thus

\begin{align*} \Big| \int _{\mathbb {R}} h \, \mathrm{d}\mu - \int _{\mathbb {R}} h \, \mathrm{d}\mu _n \Big| \leq \; & \, 2 K \varepsilon . \end{align*}

Combining the estimates we have obtained, for $n \geq \max (N_1 , N_2)$, we have

\begin{align*} & \Big| \int _{\mathbb {R}} f \, \mathrm{d}\mu - \int _{\mathbb {R}} f \, \mathrm{d}\mu _n \Big| \\ \leq \; \, & \underbrace{ \Big| \int _{\mathbb {R}} f \, \mathrm{d}\mu - \int _{\mathbb {R}} h \, \mathrm{d}\mu \Big|}_{ \leq (1+K) \varepsilon } + \underbrace{ \Big| \int _{\mathbb {R}} h \, \mathrm{d}\mu - \int _{\mathbb {R}} h \, \mathrm{d}\mu _n \Big|}_{ \leq 2 K \varepsilon } + \underbrace{ \Big| \int _{\mathbb {R}} h \, \mathrm{d}\mu _n - \int _{\mathbb {R}} f \, \mathrm{d}\mu _n \Big|}_{ \leq (1+2K) \varepsilon } \\ \leq \; \, & (2 + 5 K) \varepsilon . \end{align*}

Since $\varepsilon {\gt} 0$ was arbitrary, this shows that $\int f \, \mathrm{d}\mu _n \to \int f \, \mathrm{d}\mu $ as $n \to \infty $, so we have established the weak convergence $\mu _n \to \mu $ according to Definition 4.1.
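The mechanism of this proof, replacing $\int f \, \mathrm{d}\mu _n$ by a sum of c.d.f. differences, can be observed numerically. The sketch below makes several assumptions that are not in the text: $\mu $ is the standard logistic law, $\mu _n$ is the logistic law with scale $1 + \frac{1}{n}$ (so that $F_n \to F$ pointwise), $f = \cos $, the interval is $[-40, 40]$ with an equally spaced subdivision, and the reference value $\pi / \sinh (\pi )$ is the logistic characteristic function evaluated at $1$. The name `step_integral` is hypothetical.

```python
import math


def logistic_cdf(x: float, scale: float = 1.0) -> float:
    """C.d.f. of the logistic distribution with the given scale."""
    return 1.0 / (1.0 + math.exp(-x / scale))


def step_integral(f, F, a: float, b: float, k: int) -> float:
    """Sum of f(c_j) * (F(c_j) - F(c_{j-1})) over an equal subdivision of [a, b]."""
    c = [a + (b - a) * j / k for j in range(k + 1)]
    return sum(f(c[j]) * (F(c[j]) - F(c[j - 1])) for j in range(1, k + 1))


# Integral of cos against the standard logistic law (assumed closed form).
exact = math.pi / math.sinh(math.pi)

for n in (1, 10, 100):
    scale = 1.0 + 1.0 / n
    approx = step_integral(
        math.cos, lambda x: logistic_cdf(x, scale), -40.0, 40.0, 8000
    )
    # The gap to the limiting integral should shrink as n grows.
    print(n, abs(approx - exact))
```

Here the mesh $0.01$ plays the role of $\delta $, and the interval $[-40, 40]$ plays the role of $[a,b]$ from Lemma 4.3.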

Lemma 4.9 Necessary condition for convergence in distribution with cdfs

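Let $\mu $ and $\mu _n$, $n \in \mathbb {N}$, be Borel probability measures on $\mathbb {R}$, and let $F$ and $F_n$, $n \in \mathbb {N}$, be their cumulative distribution functions, respectively, i.e.,

\begin{align*} F(x) = \; & \mu \big[(-\infty ,x]\big] & & \text{for $x \in \mathbb {R}$} \\ F_n(x) = \; & \mu _n \big[(-\infty ,x]\big] & & \text{for $x \in \mathbb {R}$ and $n \in \mathbb {N}$.} \end{align*}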

If $\lim _{n \to \infty } \mu _n = \mu $ in the sense of weak convergence of measures, Definition 4.1, then for all continuity points $x$ of $F$ we have $\lim _{n \to \infty } F_n(x) = F(x)$.

Proof.

Let $x \in \mathbb {R}$ be a continuity point of $F$. Then we have $\mu [\left\{ x \right\} ] = 0$, by Lemma 1.15. Note that the boundary of the Borel set $(-\infty ,x] \subset \mathbb {R}$ is the singleton $\partial (-\infty ,x] = \left\{ x \right\} $. Therefore the assumption $\lim _{n \to \infty } \mu _n = \mu $ implies that

\begin{align*} \mu _n \big[ (-\infty ,x] \big] \to \mu \big[ (-\infty ,x] \big] , \end{align*}

by a general fact (Lemma 4.7) about weakly converging sequences of measures, which applies to Borel sets whose boundary carries no mass under the limit measure. In terms of the c.d.f.s, the above reads

\begin{align*} F_n(x) \to F(x) \end{align*}

as asserted.
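The restriction to continuity points of $F$ cannot be dropped. A standard example, sketched below under the assumption that $\mu _n$ is the point mass at $\frac{1}{n}$ and $\mu $ is the point mass at $0$, has $F_n(x) \to F(x)$ for every $x \neq 0$ but $F_n(0) = 0$ for all $n$ while $F(0) = 1$. The function names are illustrative.

```python
def F_n(n: int, x: float) -> float:
    """C.d.f. of the point mass at 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0


def F(x: float) -> float:
    """C.d.f. of the point mass at 0; its only discontinuity is at x = 0."""
    return 1.0 if x >= 0.0 else 0.0


for x in (-0.5, 0.0, 0.5):
    # Compare a few values of F_n(x) with the limiting value F(x).
    print(x, [F_n(n, x) for n in (1, 10, 100)], F(x))
```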