4 Convergence in distribution with cdfs
This chapter provides a standard characterization of convergence in distribution (weak convergence of probability measures) on the real line in terms of cumulative distribution functions.
4.1 Convergence in distribution
Convergence in distribution for random variables can be defined when the random variables take values in a topological space, and it amounts to the weak convergence of the probability measures that are the laws of those random variables. In the special case of real-valued random variables, or probability measures on the real line, the definition reads:
A sequence \((\mu _n)_{n \in \mathbb {N}}\) of Borel probability measures on \(\mathbb {R}\) converges weakly to a Borel probability measure \(\mu \) on \(\mathbb {R}\) if for all bounded continuous functions \(f \colon \mathbb {R}\to [0,+\infty )\) we have
\[ \lim _{n \to \infty } \int _{\mathbb {R}} f \, \mathrm{d}\mu _n = \int _{\mathbb {R}} f \, \mathrm{d}\mu . \]
4.2 Auxiliary results
A monotone function \(f \colon \mathbb {R}\to \mathbb {R}\) can have at most countably many points of discontinuity. In particular the set \(D \subset \mathbb {R}\) of continuity points of \(f\) is dense in \(\mathbb {R}\).
(The proof should already be in Mathlib.)
Let \(F\) be a cumulative distribution function. Then for any \(\varepsilon {\gt} 0\) there exist points \(a,b \in \mathbb {R}\) with \(a {\lt} b\) such that \(F(b) - F(a) {\gt} 1 - \varepsilon \) and \(F\) is continuous at the points \(a\) and \(b\).
Cumulative distribution functions satisfy \(F(x) \downarrow 0\) as \(x \downarrow - \infty \) and \(F(x) \uparrow 1\) as \(x \uparrow + \infty \). The required large difference \(F(b) - F(a)\) is obtained by choosing \(a\) small enough so that \(F(a) {\lt} \frac{\varepsilon }{2}\) and \(b\) large enough so that \(F(b) {\gt} 1 - \frac{\varepsilon }{2}\). In order to guarantee that \(a {\lt} b\) and that \(a\) and \(b\) are continuity points of \(F\), we recall that continuity points of the monotone function \(F\) are dense by Lemma 4.2, so we may decrease \(a\) and increase \(b\) as appropriate.
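The choice of \(a\) and \(b\) in this proof can be illustrated numerically. The following is a minimal sketch under stated assumptions: it uses the standard logistic c.d.f. as an example (it is continuous everywhere, so the adjustment to continuity points is not needed here), and the helper names `logistic_cdf` and `tight_interval` are hypothetical.

```python
import math

def logistic_cdf(x):
    """Cdf of the standard logistic distribution (an assumed example; continuous on R)."""
    return 1.0 / (1.0 + math.exp(-x))

def tight_interval(F, eps):
    """Return (a, b) with a < b, F(a) < eps/2 and F(b) > 1 - eps/2.

    The radius is doubled until both tail conditions hold, which mirrors
    'choose a small enough and b large enough' in the proof.
    """
    t = 1.0
    while not (F(-t) < eps / 2 and F(t) > 1 - eps / 2):
        t *= 2.0
    return -t, t

eps = 1e-3
a, b = tight_interval(logistic_cdf, eps)
mass = logistic_cdf(b) - logistic_cdf(a)  # exceeds 1 - eps by construction
```

For a c.d.f. with jumps, one would additionally move \(a\) down and \(b\) up to nearby continuity points, as the proof describes.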
Let \(D \subset \mathbb {R}\) be a dense set and \(a,b \in D\) with \(a {\lt} b\). Then for any \(\delta {\gt} 0\) there exist a \(k \in \mathbb {N}\) and points \(c_0, c_1, \ldots , c_k \in D\) with \(a = c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b\) such that \(c_j - c_{j-1} {\lt} \delta \) for all \(j=1,\ldots ,k\).
…
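Independently of the proof, which is not reproduced here, one possible construction of such points can be sketched in code. This is only an illustration under assumptions: the dense set \(D\) is taken to be the dyadic rationals, and the helpers `pick_dyadic` and `fine_partition` are hypothetical names. Each step moves forward by more than \(\delta /2\) and less than \(\delta \), so the loop terminates.

```python
from fractions import Fraction
from math import floor

def pick_dyadic(lo, hi):
    """Return a dyadic rational strictly between lo and hi (requires lo < hi)."""
    m = 0
    while Fraction(1, 2**m) >= hi - lo:
        m += 1
    step = Fraction(1, 2**m)
    # Smallest multiple of step strictly greater than lo; it is < hi since step < hi - lo.
    return (floor(lo / step) + 1) * step

def fine_partition(a, b, delta, pick=pick_dyadic):
    """Return a = c_0 < c_1 < ... < c_k = b with consecutive gaps < delta.

    Intermediate points come from `pick`, which is assumed to return
    an element of the dense set D inside a given open interval.
    """
    pts = [a]
    while b - pts[-1] >= delta:
        c = pts[-1]
        # The new point lies in (c + delta/2, c + delta), hence strictly below b.
        pts.append(pick(c + delta / 2, c + delta))
    pts.append(b)
    return pts

a, b, delta = Fraction(0), Fraction(1), Fraction(3, 10)
pts = fine_partition(a, b, delta)
```

Exact rational arithmetic via `Fraction` avoids rounding issues when checking the strict inequalities.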
Let \(D \subset \mathbb {R}\) be a dense set, let \(f \colon \mathbb {R}\to \mathbb {R}\) be continuous, let \(a, b \in D\) with \(a {\lt} b\), and let \(\varepsilon {\gt} 0\). Then there exist a \(k \in \mathbb {N}\) and points \(a=c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b\) such that for each \(j = 1, \ldots , k\) we have \(c_j \in D\) and
\[ |f(x) - f(c_j)| {\lt} \varepsilon \qquad \text{for all } x \in [c_{j-1}, c_j] . \]
On the compact interval \([a,b] \subset \mathbb {R}\), the continuous function \(f\) is uniformly continuous, so for some \(\delta {\gt} 0 \) we have \(|f(x)-f(y)|{\lt}\varepsilon \) whenever \(|x-y|{\lt}\delta \) and \(x,y \in [a,b]\). Now apply Lemma 4.4 to choose \(k\) and points \(a=c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b\) such that \(c_j - c_{j-1} {\lt} \delta \) and \(c_j \in D\) for all \(j = 1 , \ldots , k\). Then for any \(j = 1 , \ldots , k\) and any \(x \in [c_{j-1} , c_j]\) we have \(|x - c_j| {\lt} \delta \), so we get
\[ |f(x) - f(c_j)| {\lt} \varepsilon , \]
as desired.
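The approximation property can be checked numerically in a simple case. The sketch below is an illustration only and makes several assumptions: \(f = \cos \) on \([0,1]\), which is 1-Lipschitz so that \(\delta = \varepsilon \) is an admissible modulus; the dense set is taken to be all of \(\mathbb {R}\), so an evenly spaced partition is allowed; and the helper names are hypothetical.

```python
import math

def uniform_partition(a, b, delta):
    """Evenly spaced points a = c_0 < ... < c_k = b with mesh (b - a) / k < delta."""
    k = math.floor((b - a) / delta) + 1
    return [a + (b - a) * j / k for j in range(k + 1)]

def max_step_error(f, pts, samples_per_cell=50):
    """Largest sampled value of |f(x) - f(c_j)| over x in [c_{j-1}, c_j], over all j."""
    worst = 0.0
    for left, right in zip(pts, pts[1:]):
        for i in range(samples_per_cell + 1):
            x = left + (right - left) * i / samples_per_cell
            worst = max(worst, abs(f(x) - f(right)))
    return worst

eps = 0.05
pts = uniform_partition(0.0, 1.0, eps)  # delta = eps, valid because cos is 1-Lipschitz
err = max_step_error(math.cos, pts)     # sampled check only, not a proof
```

The sampled error stays below `eps`, in line with the statement of the lemma for this particular \(f\).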
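Let \(a = c_0 {\lt} c_1 {\lt} \cdots {\lt} c_k = b\) and consider the linear combination of indicator functions
\[ h = \sum _{j=1}^{k} \alpha _j \, \mathbf{1}_{(c_{j-1}, c_j]} \]
with coefficients \(\alpha _1, \ldots , \alpha _k \in \mathbb {R}\). Then the integral of \(h\) with respect to a Borel probability measure \(\mu \) on \(\mathbb {R}\) can be written as
\[ \int _{\mathbb {R}} h \, \mathrm{d}\mu = \sum _{j=1}^{k} \alpha _j \big( F(c_j) - F(c_{j-1}) \big) , \]
where \(F\) is the c.d.f. of \(\mu \).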
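As a sanity check of this identity, one can compare both sides for a small discrete measure. The sketch below assumes that \(h\) takes a constant value on each half-open interval \((c_{j-1}, c_j]\) and vanishes elsewhere; the atoms, weights, breakpoints and the coefficient list `alpha` are made-up illustrative data, not taken from the text.

```python
from fractions import Fraction as Fr

# A discrete probability measure: atom -> mass (illustrative data, masses sum to 1).
atoms = {Fr(0): Fr(1, 4), Fr(1, 2): Fr(1, 4), Fr(1): Fr(1, 2)}

def cdf(x):
    """F(x) = mu((-inf, x]) for the discrete measure above."""
    return sum(w for p, w in atoms.items() if p <= x)

c = [Fr(-1), Fr(0), Fr(3, 4), Fr(2)]   # breakpoints c_0 < c_1 < c_2 < c_3
alpha = [Fr(2), Fr(-1), Fr(5)]         # alpha[j-1] is the value of h on (c[j-1], c[j]]

def h(x):
    """Step function: alpha[j-1] on (c[j-1], c[j]], zero outside (c[0], c[-1]]."""
    for j in range(1, len(c)):
        if c[j - 1] < x <= c[j]:
            return alpha[j - 1]
    return Fr(0)

# Left-hand side: integrate h directly against the atoms.
integral_direct = sum(h(p) * w for p, w in atoms.items())
# Right-hand side: weighted sum of cdf increments.
integral_via_cdf = sum(
    alpha[j - 1] * (cdf(c[j]) - cdf(c[j - 1])) for j in range(1, len(c))
)
```

Note that the atom at \(0\) sits exactly on a breakpoint; it is counted in \((-1, 0]\) on both sides, which is why the intervals are taken open on the left and closed on the right.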
Weak convergence of probability measures implies that if the boundary of a Borel set carries no probability mass under the limit measure, then the limit of the measures of the set equals the measure of the set under the limit probability measure.
In other words, if \(\lim _{n \to \infty } \mu _n = \mu \) in the sense of weak convergence of measures, Definition 4.1, and if \(A \subset \mathbb {R}\) is a Borel set such that \(\mu [\partial A] = 0\), then
\[ \lim _{n \to \infty } \mu _n[A] = \mu [A] . \]
(The proof is in Mathlib.)
4.3 Convergence in distribution from pointwise convergence of cdfs
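Let \(F\) and \(F_n\), \(n \in \mathbb {N}\), be cumulative distribution functions of probability measures \(\mu \) and \(\mu _n\), \(n \in \mathbb {N}\), respectively, i.e.,
\[ F(x) = \mu \big[ (-\infty , x] \big] \qquad \text{and} \qquad F_n(x) = \mu _n \big[ (-\infty , x] \big] \qquad \text{for all } x \in \mathbb {R} . \]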
If \(\lim _{n \to \infty } F_n(x) = F(x)\) for all continuity points \(x\) of \(F\), then \(\lim _{n \to \infty } \mu _n = \mu \) in the sense of weak convergence of measures, Definition 4.1.
Let \(D \subset \mathbb {R}\) denote the set of continuity points of \(F\). By Lemma 4.2, \(D\) is dense in \(\mathbb {R}\). Assume that \(\lim _{n \to \infty } F_n(x) = F(x)\) for all \(x \in D\).
Let \(\varepsilon {\gt} 0\). Choose, by Lemma 4.3, points \(a,b \in D\), \(a{\lt}b\), such that \(F(b) - F(a) {\gt} 1 - \varepsilon \).
Observe also that since \(\lim _{n \to \infty } F_n(a) = F(a)\) and \(\lim _{n \to \infty } F_n(b) = F(b)\), there exists some \(N_1\) such that for all \(n \geq N_1\) we have
\[ F_n(b) - F_n(a) {\gt} 1 - 2 \varepsilon . \]
Let \(f \colon \mathbb {R}\to \mathbb {R}\) be bounded and continuous. By Lemma 4.5 we can choose points \(a=c_0 {\lt} c_1 {\lt} \cdots {\lt} c_{k-1} {\lt} c_k = b\) such that for all \(j = 1 , \ldots , k\) we have \(c_j \in D\) and
\[ |f(x) - f(c_j)| {\lt} \varepsilon \qquad \text{for all } x \in [c_{j-1}, c_j] . \]
Define the simple function \(h \colon \mathbb {R}\to \mathbb {R}\) by
\[ h(x) = \sum _{j=1}^{k} f(c_j) \, \mathbf{1}_{(c_{j-1}, c_j]}(x) . \]
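The above estimate shows that \(|f(x) - h(x)| {\lt} \varepsilon \) for all \(x \in (a,b]\). By boundedness of \(f\), there exists a constant \(K{\gt}0\) such that \(|f(x)| \leq K\) for all \(x \in \mathbb {R}\). Since \(h\) vanishes outside \((a,b]\), the triangle inequality for integrals with respect to \(\mu _n\) gives
\[ \Big| \int f \, \mathrm{d}\mu _n - \int h \, \mathrm{d}\mu _n \Big| \leq \int _{(a,b]} |f - h| \, \mathrm{d}\mu _n + \int _{\mathbb {R}\setminus (a,b]} |f| \, \mathrm{d}\mu _n \leq \varepsilon + K \, \mu _n\big[ \mathbb {R}\setminus (a,b] \big] . \]
When \(n \geq N_1\), we have \(\mu _n\big[ \mathbb {R}\setminus (a,b] \big] = 1 - \mu _n\big[ (a,b] \big] = 1 - (F_n(b) - F_n(a)) {\lt} 2 \varepsilon \), and thus the previous estimate implies
\[ \Big| \int f \, \mathrm{d}\mu _n - \int h \, \mathrm{d}\mu _n \Big| {\lt} (1 + 2K) \, \varepsilon . \]
Similarly, integrating now with respect to \(\mu \) instead and using \(\mu \big[ \mathbb {R}\setminus (a,b] \big] = 1 - (F(b) - F(a)) {\lt} \varepsilon \), one shows that
\[ \Big| \int f \, \mathrm{d}\mu - \int h \, \mathrm{d}\mu \Big| {\lt} (1 + K) \, \varepsilon . \]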
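It remains to consider the integrals of the function \(h\) with respect to both \(\mu _n\) and \(\mu \). By Lemma 4.6, these integrals are expressible in terms of the cumulative distribution functions,
\[ \int h \, \mathrm{d}\mu _n = \sum _{j=1}^{k} f(c_j) \big( F_n(c_j) - F_n(c_{j-1}) \big) \]
and
\[ \int h \, \mathrm{d}\mu = \sum _{j=1}^{k} f(c_j) \big( F(c_j) - F(c_{j-1}) \big) . \]
The difference of these two integrals can therefore be estimated as
\[ \Big| \int h \, \mathrm{d}\mu _n - \int h \, \mathrm{d}\mu \Big| \leq \sum _{j=1}^{k} |f(c_j)| \Big( |F_n(c_j) - F(c_j)| + |F_n(c_{j-1}) - F(c_{j-1})| \Big) \leq 2 k K \max _{j = 0 , \ldots , k} | F(c_j) - F_n(c_j) | . \]
Since every \(c_j\) lies in \(D\), our assumption gives \(\lim _{n \to \infty } F_n(c_j) = F(c_j)\) for each \(j = 0 , \ldots , k\), so there exists \(N_2\) such that for \(n \geq N_2\) we have \(\max _{j = 0 , \ldots , k} | F(c_j) - F_n(c_j) | {\lt} \frac{\varepsilon }{k}\), and thus
\[ \Big| \int h \, \mathrm{d}\mu _n - \int h \, \mathrm{d}\mu \Big| {\lt} 2 K \varepsilon . \]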
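Combining the estimates we have obtained, for \(n \geq \max (N_1 , N_2)\), we have
\[ \Big| \int f \, \mathrm{d}\mu _n - \int f \, \mathrm{d}\mu \Big| \leq \Big| \int f \, \mathrm{d}\mu _n - \int h \, \mathrm{d}\mu _n \Big| + \Big| \int h \, \mathrm{d}\mu _n - \int h \, \mathrm{d}\mu \Big| + \Big| \int h \, \mathrm{d}\mu - \int f \, \mathrm{d}\mu \Big| {\lt} (1 + 2K) \varepsilon + 2K \varepsilon + (1 + K) \varepsilon = (2 + 5K) \, \varepsilon . \]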
Since \(\varepsilon {\gt} 0\) was arbitrary, this shows that \(\int f \, \mathrm{d}\mu _n \to \int f \, \mathrm{d}\mu \) as \(n \to \infty \), so we have established the weak convergence \(\mu _n \to \mu \) according to Definition 4.1.
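Let \(\mu \) and \(\mu _n\), \(n \in \mathbb {N}\), be Borel probability measures on \(\mathbb {R}\), and let \(F\) and \(F_n\), \(n \in \mathbb {N}\), be their cumulative distribution functions, respectively, i.e.,
\[ F(x) = \mu \big[ (-\infty , x] \big] \qquad \text{and} \qquad F_n(x) = \mu _n \big[ (-\infty , x] \big] \qquad \text{for all } x \in \mathbb {R} . \]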
If \(\lim _{n \to \infty } \mu _n = \mu \) in the sense of weak convergence of measures, Definition 4.1, then for all continuity points \(x\) of \(F\) we have \(\lim _{n \to \infty } F_n(x) = F(x)\).
Let \(x \in \mathbb {R}\) be a continuity point of \(F\). Then we have \(\mu [\left\{ x \right\} ] = 0\), by Lemma 1.15. Note that the boundary of the Borel set \((-\infty ,x] \subset \mathbb {R}\) is the singleton \(\partial (-\infty ,x] = \left\{ x \right\} \). Therefore the assumption \(\lim _{n \to \infty } \mu _n = \mu \) implies that
\[ \lim _{n \to \infty } \mu _n \big[ (-\infty , x] \big] = \mu \big[ (-\infty , x] \big] \]
by a general fact (Lemma 4.7) about weakly converging sequences of measures, which applies to Borel sets whose boundary carries no mass under the limit measure. In terms of the c.d.f.s, the above reads
\[ \lim _{n \to \infty } F_n(x) = F(x) , \]
as asserted.
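The restriction to continuity points of \(F\) cannot be dropped, as a standard example shows. The sketch below takes \(\mu _n = \delta _{1/n}\) and \(\mu = \delta _0\) (point masses, chosen here purely as an illustration); the function names are hypothetical. Integrals of a bounded continuous test function converge, and the c.d.f.s converge at every \(x \neq 0\), but not at the jump point \(x = 0\).

```python
import math

def F_n(n, x):
    """Cdf of the point mass at 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0

def F(x):
    """Cdf of the point mass at 0; its only discontinuity is at x = 0."""
    return 1.0 if x >= 0.0 else 0.0

def integral_n(f, n):
    """Integral of f against the point mass at 1/n."""
    return f(1.0 / n)

def integral_limit(f):
    """Integral of f against the point mass at 0."""
    return f(0.0)

n = 10**6
# At a few continuity points of F the cdfs already agree for this large n.
gap_at_continuity_points = max(abs(F_n(n, x) - F(x)) for x in (-1.0, 0.01, 2.0))
# At the jump x = 0 we have F_n(0) = 0 for every n, while F(0) = 1.
gap_at_jump = abs(F_n(n, 0.0) - F(0.0))
# For the bounded continuous function atan the integrals are close.
gap_integral = abs(integral_n(math.atan, n) - integral_limit(math.atan))
```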