\magnification=\magstep1
\hsize=16truecm
\input amstex
\parindent=20pt
\parskip=3pt plus 1pt
\TagsOnRight
\define\A{{\bold A}}
\define\BB{{\bold B}}
\define\DD{{\bold D}}
\define\T{{\bold T}}
\define\U{{\bold U}}
\define\Var{\text{\rm Var}\,}
\define\({\left(}
\define\){\right)}
\define\[{\left[}
\define\]{\right]}
\define\e{\varepsilon}
\define\oo{\omega}
\define\const{\text{\rm const.}\,}
\define\supp {\sup\limits}
\define\inff{\inf\limits}
\define\summ{\sum\limits}
\define\prodd{\prod\limits}
\define\limm{\lim\limits}
\define\limsupp{\limsup\limits}
\define\liminff{\liminf\limits}
\define\bigcapp{\bigcap\limits}
\define\bigcupp{\bigcup\limits}
\centerline{\bf The relation between the closeness of random variables}
\centerline {\bf and their distributions}
\smallskip
\centerline{\it by P\'eter Major}
\centerline{Mathematical Institute of the Hungarian Academy of Sciences}
\smallskip
\noindent
{\narrower
{\it Abstract:}\/ If two random variables are close to each other,
then the same relation holds for their distributions. This statement
cannot be reversed. Nevertheless, some non-trivial results can be proved
in the opposite direction if the question is formulated in the
right way. Given two probability measures on a general metric space, let
us try to construct two random variables with these distributions
which are close to each other. In this problem the distributions of the
random variables are prescribed, but we have the freedom to ``couple
them", to define their joint distribution to our taste.
We show that the two questions, how close two probability measures are
to each other and how close to each other random variables with such
distributions can be constructed, are intimately related. Such problems
will be discussed here.
In particular, we introduce the notions of the Prochorov metric and the
quantile transform, and discuss their most important properties. It is
worth mentioning that these results are strongly related to a classical
result in combinatorics, the K\"onig--Hall theorem.\par }
\beginsection Introduction
If two random variables are close to each other, then the same
relation holds for their distributions. The converse statement
naturally does not hold; for instance, the two random variables may be
independent. But an interesting and non-trivial answer can be given
to the following question. Let two (close) probability measures
$\mu$ and $\nu$ be given on the Borel $\sigma$-algebra of a metric
space $(X,\rho)$. Is it possible to construct a probability space
$(\Omega,\Cal A, P)$ and two random variables $\xi$ and $\eta$
on it in such a way that the distribution of $\xi$ is $\mu$, the
distribution of $\eta$ is $\nu$, and the random variables $\xi$
and $\eta$ are close to each other? Naturally, in a detailed
discussion we must tell explicitly what we mean by closeness
of probability measures and random variables.
In the study of this problem the following approach is natural:
Put $(\Omega, \Cal A, P)=(X\times X,\Cal A\times \Cal A,P)$,
where $\times$ denotes direct product, $\Cal A$ is the $\sigma$-algebra
induced by the metric $\rho$, or more precisely by the topology
generated by it, and $P$ is an appropriate probability measure on the
space $(X\times X,\Cal A\times\Cal A)$. Furthermore, let us define the random variables
$\xi(x_1,x_2)=x_1$ and $\eta(x_1,x_2)=x_2$, $(x_1,x_2)\in X\times X$,
on this space. After the introduction of these objects the problem of
constructing random variables with prescribed distributions close to
each other can be reformulated in the following way: Let us construct
a probability measure $P$ on the space $(X\times X,\Cal A\times
\Cal A)$ whose marginal distributions are the measures $\mu$ and $\nu$,
and which is essentially concentrated close to the diagonal $\{(x,x),\;
x\in X\}$. In the investigation of this problem a classical
combinatorial result, the K\"onig--Hall theorem (sometimes called
the marriage problem), or more precisely a continuous version of this
result is very useful. We formulate these results, and also write
down their proofs in an Appendix. \vfill\eject
\headline{\ifodd\pageno \hfill {\it Coupling methods in Probability
Theory} \hfill \else \hfill {\it P\'eter Major} \hfill \fi}
\noindent
{\bf K\"onig--Hall theorem, (marriage problem).} {\it Let us consider
$n$ boys and $n$ girls such that some boys and girls know each other.
(All acquaintances are mutual.) We want to put these boys and girls
into pairs (make married couples) in such a way that all persons put
into a pair know each other. This is possible if and only if an
arbitrary group of the girls knows at least as many boys as the number
of girls in this group.
In a more formal way: Let us consider a bipartite graph consisting of
two sets $Y=\{y_1,\dots,y_n\}$ and $Z=\{z_1,\dots,z_n\}$ and a map
$d\:Y\times Z\to\{0,1\}$. We can interpret the relation $d(y,z)=1$, $y\in
Y$, $z\in Z$ so that the points $y$ and $z$ are connected, while
$d(y,z)=0$ means that they are not connected. For all sets $A\subset
Y$ let us define the set $B(A)\subset Z$ which contains the points which
are connected to one of the points of $A$, i.e. let
$$
B(A)=\{z\: z\in Z,\text{ and there exists a }y\in A \text{ for
which }d(y,z)=1\}.
$$
There exists a factorization of this bipartite graph, i.e. we can
divide the points of the sets $Y$ and $Z$ into pairs $(y_j,
z_{\pi(j)})$, $y_j\in Y$, $z_{\pi(j)}\in Z$, $j=1,2,\dots,n$, in such
a way that $d(y_j,z_{\pi(j)})=1$ for all $j=1,2,\dots,n$, and $\pi(j)$,
$j=1,\dots,n$, is an appropriate permutation of the set
$\{1,\dots,n\}$ if and only if $|B(A)|\ge |A|$ for all sets
$|A|\subset Y$, where $|C|$ denotes the cardinality of
a set $C$.} \medskip\noindent
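For a small finite graph both the condition of the theorem and a factorization can be computed directly. The following Python sketch is an illustrative addition (the helper names `hall_condition` and `perfect_matching` are of course not part of the theorem); it checks $|B(A)|\ge|A|$ by brute force over all sets $A\subset Y$ and finds the permutation $\pi$ by the standard augmenting-path method.

```python
from itertools import combinations

def hall_condition(d):
    """d[j][k] = 1 iff y_j and z_k are connected; check |B(A)| >= |A| for all A."""
    n = len(d)
    for size in range(1, n + 1):
        for A in combinations(range(n), size):
            B = {k for j in A for k in range(n) if d[j][k] == 1}
            if len(B) < len(A):
                return False
    return True

def perfect_matching(d):
    """Find a permutation pi with d[j][pi[j]] = 1 via augmenting paths, or None."""
    n = len(d)
    match = [-1] * n          # match[k] = index j currently paired with z_k

    def augment(j, seen):
        for k in range(n):
            if d[j][k] == 1 and k not in seen:
                seen.add(k)
                if match[k] == -1 or augment(match[k], seen):
                    match[k] = j
                    return True
        return False

    if all(augment(j, set()) for j in range(n)):
        pi = [-1] * n
        for k, j in enumerate(match):
            pi[j] = k
        return pi
    return None
```

The brute-force check is exponential in $n$ and is meant only to mirror the statement of the theorem; the augmenting-path search, by contrast, runs in polynomial time.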
{\bf The continuous version of the K\"onig--Hall theorem.} {\it Let
$r$ depots with stocks $u_1,u_2,\dots,u_r$ and $s$ plants with claims
$v_1,\dots,v_s$ be given such that $\summ_{j=1}^r u_j=\summ_{k=1}^s v_k$.
Let certain depots and plants be connected by a route. We can satisfy
all claims of the plants by transporting the stocks of the depots on
these routes if and only if the joint demand of an arbitrary group of
plants is not greater than the joint stock of the depots which are
connected by a route with one of these plants.
In a more formal way: Let us consider a bipartite graph consisting
of two sets $Y=\{y_1,\dots,y_r\}$ and $Z=\{z_1,\dots,z_s\}$ and
a map $d\:Y\times Z\to\{0,1\}$. We connect two points $y$ and $z$, $y\in
Y$, $z\in Z$, if $d(y,z)=1$, and we do not connect them if $d(y,z)=0$.
Furthermore, let two weight functions $u(y)$, $u(y)\ge0$, $y\in Y$, and
$v(z)$, $v(z)\ge0$, $z\in Z$, be given such that $\summ_{y\in
Y}u(y)=\summ_{z\in Z}v(z)$. For all sets $A\subset Y$ let
us define the set $B(A)\subset Z$ by the formula
$$
B(A)=\{z\:z\in Z,\text{ and there exists a }y\in A \text{ for
which } d(y,z)=1\}.
$$
There exists a ``transport function" $w(y,z)\ge0$ with the properties
\medskip
\item {i.)} $\summ_{z\: d(y,z)=1} w(y,z)=u(y)$ for all $y\in
Y$,\hfill\break and $\summ_{y\: d(y,z)=1} w(y,z)=v(z)$ for all $z\in
Z$.
\item{ii.)} The inequality $w(y,z)>0$ holds only if
$d(y,z)=1$, \medskip\noindent
if and only if the relation $\summ_{z\in B(A)} v(z)\ge \summ_{y\in
A}u(y)$ holds for all sets $A\subset Y$.}
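In small examples the condition of the continuous version can again be checked by brute force over all sets $A\subset Y$, exactly mirroring the statement of the theorem. A minimal Python sketch (`transport_possible` is a hypothetical helper; it assumes $\summ u(y)=\summ v(z)$ and the encoding $d[j][k]\in\{0,1\}$, and does not attempt to construct the transport function $w$ itself):

```python
from itertools import combinations

def transport_possible(u, v, d, tol=1e-12):
    """Continuous Konig--Hall condition: for every set A of depots,
    the plants reachable from A must jointly claim at least sum(u over A).
    u, v: lists of stocks/claims with sum(u) == sum(v); d[j][k] in {0, 1}."""
    r, s = len(u), len(v)
    for size in range(1, r + 1):
        for A in combinations(range(r), size):
            BA = {k for j in A for k in range(s) if d[j][k] == 1}
            if sum(v[k] for k in BA) < sum(u[j] for j in A) - tol:
                return False
    return True
```

A small tolerance guards against floating-point rounding when the two sums are meant to be equal.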
\beginsection Problems
\item{0.)} Let us show that the conditions in the K\"onig--Hall
theorem and in its continuous version are symmetric for the sets
$Y$ and $Z$, i.e.\ these conditions remain valid if we interchange in them
(and in the definition of the set $B(A)$) the sets $Y$ and $Z$.
\item{1.)} Let two probability measures $\mu$ and $\nu$ be given on the
$\sigma$-algebra determined by the topology generated by the metric of
a separable metric space $(X,\rho)$. Let $B^\alpha=\{x\:
\rho(x,B)<\alpha\}$ denote the open neighborhood of the set $B\subset X$
of radius $\alpha$. Let us assume that the measures $\mu$ and $\nu$
satisfy the condition $\mu(B)\le \nu(B^\alpha)+\beta$ for all closed
sets $B\subset X$ with some numbers $\alpha>0$ and $\beta>0$. Then for
all $\e>0$ a probability space $(\Omega,\Cal A,P)$ can be constructed,
together with two random variables $\xi$ and $\eta$ on this probability
space with values in the space $(X,\rho)$ whose distributions are
$\mu$ and $\nu$ respectively, and which satisfy the relation
$$
P(\rho(\xi,\eta)>\alpha+\e)\le \beta+\e. \tag a
$$
Conversely, if two random variables $\xi$ and $\eta$ with distributions
$\mu$ and $\nu$ respectively satisfy relation~(a), then
$\mu(B)\le \nu(B^{\alpha+\e})+\beta+\e$ for all closed sets~$B$.
\item{} If $X$ is not only a separable, but also a complete separable
metric space, then under the above conditions relation~(a) also holds
with~$\e=0$. (In the proof of this statement we may apply the result
by which a uniformly tight sequence of probability measures on a
complete separable metric space has a subsequence which is convergent
with respect to the weak convergence of probability measures.)
\item{2.)} Let a separable metric space $(X,\rho)$ be given together
with the Borel $\sigma$-algebra $\Cal A$ induced by the natural topology
of this space. Let $\Cal M$ denote the space of probability measures on
the space $(X,\Cal A)$, and let us introduce the following function
$d(\cdot,\cdot)$ on the pairs of probability measures on the space
$(X,\rho)$: If $\mu\in \Cal M$ and $\nu\in \Cal M$, then
$$
d(\mu,\nu)=\inf\{\alpha\:\mu(B)\le \nu(B^\alpha)+\alpha\;\text{for all
closed sets } B\subset X \},
$$
where $B^\alpha$ has the same meaning as in the previous problem.
Let us show that $d(\cdot,\cdot)$ is a metric on the space $\Cal M$
which metrizes the weak convergence in this space, i.e.\ the relation
$\mu_n\Rightarrow \mu$, as $n\to\infty$, holds for a sequence of
measures $\mu_n\in \Cal M$, $n=1,2,\dots$, and $\mu\in \Cal M$, where
$\Rightarrow$ denotes weak convergence, if and only if
$d(\mu_n,\mu)\to0$. The space $(\Cal M,d)$ is a separable metric
space, and if $(X,\rho)$ is a complete separable metric space, then the
same property holds for $(\Cal M,d)$. \medskip
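For measures concentrated on finitely many points every set is closed, so the infimum in the definition of $d(\mu,\nu)$ can be approximated by bisection over $\alpha$, checking the defining condition over all subsets of the joint support. A brute-force Python sketch (an illustration usable only for tiny supports; `prokhorov` is a hypothetical name, and the measures are given as dictionaries point $\to$ mass):

```python
from itertools import combinations

def prokhorov(points_mu, points_nu, rho, iters=50):
    """d(mu, nu) = inf{a : mu(B) <= nu(B^a) + a for all closed B},
    computed by bisection for finite discrete measures with metric rho."""
    support = list(points_mu) + [x for x in points_nu if x not in points_mu]

    def condition(a):
        idx = range(len(support))
        for size in range(1, len(support) + 1):
            for B in combinations(idx, size):
                mu_B = sum(points_mu.get(support[i], 0.0) for i in B)
                # open a-neighborhood of B, intersected with the joint support
                nbhd = [x for x in support
                        if min(rho(x, support[i]) for i in B) < a]
                nu_Ba = sum(points_nu.get(x, 0.0) for x in nbhd)
                if mu_B > nu_Ba + a + 1e-12:
                    return False
        return True

    lo = 0.0
    hi = 1.0 + max(rho(x, y) for x in support for y in support)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if condition(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

The bisection is valid because the defining condition is monotone in $\alpha$: enlarging $\alpha$ enlarges $B^\alpha$ and the additive term simultaneously.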
\item{}{\it Remark:}\/ Whether a metric generating a
topology is complete or non-complete is not a topological invariant,
i.e.\ it is possible that a complete and a non-complete metric generate
the same topology on a space. There is a classical
result by which the space of probability measures on a complete metric
space can be endowed with a metric which induces weak convergence, and
with which the space of probability measures is a complete metric
space. By the above remark this result does not imply automatically
that also the metric introduced in problem~2 has this property.
\medskip We shall also prove the following {\it Statement~A:}
\medskip\noindent
{\bf Statement A:} {\it Let us apply the notations of problem~2.
If $\mu_n\in\Cal M$, $n=1,2,\dots$, is a sequence of probability
measures on a metric space $(X,\rho)$, and this sequence of measures
together with a probability measure $\mu\in\Cal M$ satisfy the relation
$\mu_n\Rightarrow\mu$, then there exists a probability space
$(\Omega,\Cal A,P)$ and a sequence of random variables $\xi_n$,
$n=1,2,\dots$, together with a random variable $\xi$ on this space in
such a way that the distribution of $\xi_n$ is $\mu_n$, $n=1,2,\dots$,
the distribution of $\xi$ is $\mu$, and $\xi_n\to\xi$ with probability
one.}\medskip
The proof of {\it Statement A}\/ is based on the observation that in an
appropriate construction it can be achieved that the sets depending on
the index~$n$ where the distance between the random variables
$\xi_n$ and~$\xi$ is relatively large almost overlap each other.
This can be achieved if the joint distribution of the random variables
$\xi_n$ is appropriately chosen. As a consequence, the almost sure
convergence in {\it Statement~A}\/ is less useful than it may seem at
first sight. The deep results of probability theory
containing ``with probability one\dots" statements depend on the joint
distribution of the random variables. On the other hand, {\it Statement
A}\/ allows
us to change the joint distribution of the random variables.
The proof of {\it Statement A}\/ is simpler in complete separable
spaces, and in this case the probability space where the convergent
sequence of random variables is constructed can be chosen in a very
special way. To prove {\it Statement A}\/ in this special case it is
useful first to solve the following problem. \medskip
\item{3.)} Let $(X,\rho)$ be a complete separable metric space, $\mu$ a
probability measure on the Borel $\sigma$-algebra of this
space. Let us consider the special probability space $(\Omega,\Cal
A,P)$ for which $\Omega=[0,1]$, $\Cal A$ is the Borel $\sigma$-algebra
on the interval $[0,1]$, and $P$ is the Lebesgue measure on the Borel
$\sigma$-algebra of the interval $[0,1]$. On this probability space
a random variable can be constructed with values on the space
$(X,\rho)$ whose distribution is~$\mu$.
\item{4.)} Let us prove {\it Statement A}\/ in the case when
$(X,\rho)$ is a complete separable metric space. Show that in this case
the probability space $(\Omega,\Cal A,P)$ where the convergent sequence
of random variables is constructed can be chosen in the following
special way: $\Omega=[0,1]$, $\Cal A$ is the Borel $\sigma$-algebra on
the interval $[0,1]$, and $P$ is the Lebesgue measure on the Borel
$\sigma$-algebra of the interval $[0,1]$.
\item{5.)} {\it Statement A}\/ also holds for convergent sequences of
probability measures on an arbitrary separable (not necessarily
complete) metric space. (In this case the convergent sequence of random
variables has to be constructed on an appropriate (generally very
large) probability space $(\Omega,\Cal A,P)$.)
\item{6.)} Let $\xi_n$, $n=1,2,\dots$, and $\xi$ be $(X,\rho)$ valued
random variables, where $(X,\rho)$ is a separable metric space.
Let us assume that $\xi_n\Rightarrow\xi$, where $\Rightarrow$ denotes
stochastic convergence. The distributions $\mu_n$ of the random
variables $\xi_n$ and the distribution $\mu$ of
the random variable $\xi$ satisfy the relation $\mu_n\Rightarrow\mu$,
where $\Rightarrow$ denotes weak convergence of probability measures.
\medskip
We have investigated the question of how, given two probability
measures $\mu$ and $\nu$ on a metric space $(X,\rho)$, a $\mu$
distributed random variable $\xi$ and a $\nu$ distributed random
variable $\eta$ can be constructed which are close to each other.
We have seen that, informally speaking, this question leads to the
following ``transport problem": How can a system of points with
mass distribution $\mu$ be transported with relatively few
movements to a system of points with mass distribution $\nu$?
If the metric space $(X,\rho)$ where we are working is the real line
with the usual metric, then because of the simple structure of this
space the ``transport problem" we have to handle becomes considerably
simpler. In this case, if some ``natural evaluation of the transport
cost" is considered (see the subsequent problem~9, where such a problem
is formulated in an explicit form), it is useful to exclude the
following possibility: There exist such pairs of numbers $x_1<x_2$ and
$y_1<y_2$ that some mass is transported from the point $x_1$ to the
point $y_2$ and some mass from the point $x_2$ to the point $y_1$.
The quantile transform, discussed in the subsequent problems, supplies
a transport free of such ``crossings". \medskip
\item{7.)} Let $\xi$ be a random variable with distribution function
$F(x)=P(\xi<x)$. Let us define the inverse function
$F^{-1}(t)=\inf\{x\: F(x+0)\ge t\}$, $0<t<1$, and the function
$\tilde F(x,\e)=F(x)+\e\(F(x+0)-F(x)\)$, $0\le\e\le1$. If $\zeta$ is
a random variable with uniform distribution on the interval $[0,1]$,
then the random variable $F^{-1}(\zeta)$ has distribution function~$F$.
Furthermore, if $\e$ is a random variable with uniform distribution
on the interval $[0,1]$, independent of $\xi$, then the random
variable $\tilde F(\xi,\e)$, where $F(x+0)=\lim_{h>0,h\to0} F(x+h)$,
is uniformly distributed in the interval~$[0,1]$.
\item{8.)} Let $F$ and $G$ be two distribution functions. If
$\zeta$ is a random variable with uniform distribution on the interval
$[0,1]$, then the random variables $\bar\xi=F^{-1}(\zeta)$ and
$\bar\eta=G^{-1}(\zeta)$, where the inverse functions $F^{-1}$ and
$G^{-1}$ are defined in the same way as in the previous problems,
have distributions $F$ and $G$. If $\bar{\bar\xi}$ is a random
variable with distribution $F$, and $\e$ is a random variable
uniformly distributed on the interval $[0,1]$ and independent of the
random variable $\bar{\bar\xi}$, then the random variable
$\bar{\bar\eta}=G^{-1}(\tilde F(\bar{\bar \xi},\e))$ has
distribution $G$. The distributions of the random vectors
$(\bar\xi,\bar\eta)$ and $(\bar{\bar\xi},\bar{\bar\eta})$ agree.
\medskip
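The first construction of the previous problem is easy to carry out numerically for discrete distributions: one uniform random variable $\zeta$ drives both marginals at once. A Python sketch (illustrative helper names; the two marginal laws below are arbitrary toy distributions chosen for the demonstration):

```python
import random

def inv_cdf(values, probs):
    """Quantile function F^{-1}(t) = inf{x : F(x) >= t} of a discrete law;
    `values` must be listed in increasing order."""
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)
    def Finv(t):
        for v, c in zip(values, cum):
            if c >= t:
                return v
        return values[-1]
    return Finv

# One uniform zeta couples both laws:  (F^{-1}(zeta), G^{-1}(zeta)).
Finv = inv_cdf([1, 2], [0.5, 0.5])     # mu:  P(1) = 0.5, P(2) = 0.5
Ginv = inv_cdf([1, 3], [0.2, 0.8])     # nu:  P(1) = 0.2, P(3) = 0.8

random.seed(2)
pairs = []
for _ in range(20000):
    zeta = random.random()
    pairs.append((Finv(zeta), Ginv(zeta)))
```

Since both coordinates are nondecreasing functions of the same $\zeta$, the resulting coupling is monotone: here $\bar\xi=2$ forces $\bar\eta=3$.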
In the problems of probability theory we sometimes have to construct
two random variables $\xi$ and $\eta$ which have prescribed distributions
$\mu$ and $\nu$. Actually, in the problems of probability theory
it is generally enough to give the joint distribution of the random
vector $(\xi,\eta)$. Two constructions which determine random vectors
with the same distribution can be considered equivalent.
Hence both constructions of the random vectors
$(\bar\xi,\bar\eta)$ and $(\bar{\bar\xi},\bar{\bar\eta})$
given in the previous problem are called the quantile transform in the
literature. In the next problem we formulate an optimality property of
the quantile transform. \medskip
\item{9.)} Let $\xi$ and $\eta$ be two random variables on the real
line with distribution functions $F(x)$ and $G(x)$, which satisfy the
conditions $E|\xi|<\infty$, $E|\eta|<\infty$. Beside this, let us
consider a convex function $\Phi(x)$ on the real line. Then
$$
E\Phi(\xi-\eta)\ge \int_0^1
\Phi\(F^{-1}(x)-G^{-1}(x)\)\,dx>-\infty,
$$
where $F^{-1}(x)$ and $G^{-1}(x)$ are the inverse functions defined
in problem~7. If the random vector $(\xi,\eta)$ is defined by means of
the quantile transform, then the two sides of the above inequality are equal.
\item{9a.)} Let $\xi$ and $\eta$ be two random variables on the real
line with distribution functions $F(x)$ and $G(x)$, which satisfy the
conditions $E|\xi|<\infty$, $E|\eta|<\infty$. Then $E|\xi-\eta|\ge
\int_{-\infty}^\infty|F(x)-G(x)|\,dx$, and the inequality becomes
an identity if the pair $(\xi,\eta)$ is constructed by
means of the quantile transform. \medskip
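The identity of problem 9a can be checked numerically for discrete distributions by approximating both sides with Riemann sums. A sketch (the two laws below are toy examples chosen for the illustration; for them both sides equal $1$):

```python
def inv(values, probs, t):
    """F^{-1}(t) = inf{x : F(x) >= t} for a discrete law (values increasing)."""
    c = 0.0
    for v, p in zip(values, probs):
        c += p
        if c >= t:
            return v
    return values[-1]

def cdf(values, probs, x):
    """Right-continuous distribution function of the discrete law."""
    return sum(p for v, p in zip(values, probs) if v <= x)

# mu: mass 1/2 at 0 and 1/2 at 2;   nu: unit mass at 1.
v1, p1 = [0.0, 2.0], [0.5, 0.5]
v2, p2 = [1.0], [1.0]

# E|F^{-1}(U) - G^{-1}(U)|  as a midpoint Riemann sum over t in (0, 1)
N = 40000
lhs = sum(abs(inv(v1, p1, (i + 0.5) / N) - inv(v2, p2, (i + 0.5) / N))
          for i in range(N)) / N

# integral of |F - G| over an interval containing both supports
a, b, M = -1.0, 3.0, 40000
step = (b - a) / M
rhs = sum(abs(cdf(v1, p1, a + (j + 0.5) * step)
              - cdf(v2, p2, a + (j + 0.5) * step))
          for j in range(M)) * step
```

The left-hand sum is exactly the expectation under the quantile coupling, since the uniform variable $\zeta$ of problem~8 is discretized over a grid of $(0,1)$.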
Finally we formulate some problems, different from the previous ones,
which may be useful in some investigations. Their proofs apply some
non-trivial facts from measure theory, like the existence of conditional
distributions, the Banach decomposition theorem, or the
Radon--Nikodym theorem, which implies the latter result.
\medskip
\item{10.)} Let three complete separable metric spaces $(X_i,\rho_i)$,
$i=1,2,3$ be given, and let $\Cal A_i$, $i=1,2,3$, denote the
$\sigma$-algebra induced by the topology of these spaces.
Let $\mu$ be a probability measure on the space $(X_1\times
X_2,\Cal A_1\times\Cal A_2)$ and $\nu$ a probability measure on the
space $(X_2\times X_3,\Cal A_2\times\Cal A_3)$. Let us assume that the
projections of the measures $\mu$ and $\nu$ to the space $X_2$ agree.
Then there exists a probability measure $P$ on the space $(X_1
\times X_2\times X_3,\Cal A_1\times\Cal A_2\times\Cal A_3)$ whose
projection to the space $X_1\times X_2$ is $\mu$, and to the space
$X_2\times X_3$ is $\nu$.\medskip
Let us remark that if $(X,\rho)$ is a complete separable metric space, then
the direct product of infinitely many copies of this space
$X\times X\times\cdots$ can also be considered as a complete metric
space with an appropriate metric $\bar \rho$. Indeed, we may assume
that the metric in the space $(X,\rho)$ is such that $\rho(x,\bar x)\le
1$ for all points $x\in X$ and $\bar x\in X$, by introducing for
instance the new metric $\rho'(x,\bar x)=\min(\rho(x,\bar x),1)$ if it
is necessary. Then the metric
$$
\bar\rho\((x_1,x_2,\dots),(\bar x_1,\bar x_2,\dots)\)=\sum_{k=1}^\infty
\frac1{2^k}\rho(x_k,\bar x_k)
$$
can be defined on the product space $X\times X\times\cdots$. The
space $(X\times X\times\cdots,\bar\rho)$ is a separable complete metric
space with this metric.
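The metric $\bar\rho$ is straightforward to evaluate approximately, since the tail of the series beyond the first $m$ terms is at most $2^{-m}$. A small Python sketch (an illustration; the coordinate sequences are passed as functions of the index $k$, and $\rho$ is truncated at $1$ as in the text):

```python
def rho_bar(xs, ys, rho, terms=60):
    """Metric on the product X x X x ... : sum over k of rho'(x_k, y_k)/2^k,
    with rho' = min(rho, 1); xs(k), ys(k) give the k-th coordinates (k >= 1).
    The tail beyond `terms` summands contributes at most 2**(-terms)."""
    rho1 = lambda a, b: min(rho(a, b), 1.0)
    return sum(rho1(xs(k), ys(k)) / 2.0 ** k for k in range(1, terms + 1))
```

With the truncation $\rho'=\min(\rho,1)$ the series always converges, which is exactly why the text first replaces $\rho$ by a bounded metric generating the same topology.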
It follows from the results of problem~10 that if $\mu$ and $\nu$ are
two probability measures on a complete separable metric space
$(X,\rho)$ and we want to construct two random variables $\xi$ and
$\eta$ with distributions $\mu$ and $\nu$ respectively which
are close to each other then we can apply the following approach.
We introduce some auxiliary random variables $\zeta$ and $\bar\zeta$
with values in the space $(X,\rho)$ in such a way that the members
of the pairs $(\xi,\zeta)$ and $\(\eta,\bar \zeta\)$ are close to each
other, $\xi$ is a $\mu$, $\eta$ is a $\nu$ distributed random variable,
and the distributions of $\zeta$ and $\bar\zeta$ agree. Furthermore, by
the remark made after problem~10, this approach can also be applied
when we want to approximate a sequence of random variables
$\xi_1,\xi_2,\dots$ with prescribed distributions by another
sequence of random variables $\eta_1,\eta_2,\dots$ of possibly different
distributions.
The statement of the previous paragraph can be slightly strengthened.
Let us have a $\mu$ distributed random variable $\xi$ on a
sufficiently rich probability space $(\Omega,\Cal A,P)$ and a
probability measure $\nu$ on a product space $X\times X$,
where $(X,\rho)$ is a complete separable metric space. Let us further
assume that the projection of the probability measure $\nu$ to the
first coordinate of the product space $X\times X$ is the distribution
$\mu$ of the random variable $\xi$. Then a random variable
$\eta$ can be constructed on the probability space $(\Omega,\Cal A,P)$
for which the random vector $(\xi,\eta)$ is $\nu$ distributed.
This means that if we want to construct a coupled pair
$(\xi,\eta)$ with prescribed joint distribution on a sufficiently rich
probability space, then we may demand that the random variable $\xi$
(with the right distribution) be fixed at the start. An analogous
statement also holds if we consider two sequences of random variables
$\xi_1,\xi_2,\dots$ and $\eta_1,\eta_2,\dots$ instead of the random
variables $\xi$ and~$\eta$.
We shall prove the above statement in the next problem~11.
We remark that a probability space is ``sufficiently rich" in the
sense we need if there exists on it a random variable, uniformly
distributed in the interval $[0,1]$, which is independent of the
random variable $\xi$ or the random sequence $\xi_1,\xi_2,\dots$ which
we want to couple with an appropriate random variable or random
sequence. \medskip
\item{11.)} Let $\nu$ be a probability measure on a product space
$X\times X$, where $(X,\rho)$ is a complete separable metric space.
Let $\mu$ be the projection of the measure $\nu$ to the first coordinate
of the space $X\times X$. Let $\xi$ be a $\mu$ distributed random
variable on a probability space $(\Omega,\Cal A,P)$, and let us assume
that there exists a random variable $\chi$ on this probability space
$(\Omega,\Cal A,P)$ with uniform distribution on the interval $[0,1]$
which is independent of $\xi$. Then such a random variable $\eta$ can
be constructed for which the distribution of the random pair
$(\xi,\eta)$ is $\nu$.
\item{12.)} If $\xi$ and $\eta$ are two random variables taking their
values on a measurable space $(X,\Cal A)$, the distribution of
$\xi$ is $\mu$ and the distribution of $\eta$ is $\nu$, then
$P(\xi\neq\eta)\ge \text{Var}\,(\mu,\nu)$, where
$\text{Var}\,(\mu,\nu)=\supp_{A\in \Cal A}|\mu(A)-\nu(A)|$ is the
variational distance between the measures $\mu$ and $\nu$.
For arbitrary probability measures $\mu$ and $\nu$ there exist
$\mu$ and $\nu$ distributed random variables $\xi$ and $\eta$
for which the two sides of the above inequality are equal. \medskip
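For discrete measures a coupling attaining equality in problem 12 (often called a maximal coupling) can be built explicitly: the common mass $\min(\mu(x),\nu(x))$ is placed on the diagonal, and the residual measures, which have disjoint supports, are coupled arbitrarily. A Python sketch (hypothetical helper names; the construction is sampling based, so the checks below are statistical):

```python
import random

def variational_distance(mu, nu):
    """sup_A |mu(A) - nu(A)| = (1/2) sum_x |mu(x) - nu(x)| for discrete laws."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

def _sample(dist, rng):
    t, acc = rng.random(), 0.0
    for x in sorted(dist):
        acc += dist[x]
        if acc >= t:
            return x
    return max(dist)          # guard against floating-point rounding

def maximal_coupling(mu, nu, rng):
    """Draw (xi, eta) with marginals mu, nu and P(xi != eta) = Var(mu, nu)."""
    support = sorted(set(mu) | set(nu))
    common = {x: min(mu.get(x, 0.0), nu.get(x, 0.0)) for x in support}
    c = sum(common.values())
    if rng.random() < c:      # diagonal part: xi = eta
        x = _sample({k: v / c for k, v in common.items()}, rng)
        return x, x
    # residuals have disjoint supports, so here xi != eta automatically
    res_mu = {x: (mu.get(x, 0.0) - common[x]) / (1 - c) for x in support}
    res_nu = {x: (nu.get(x, 0.0) - common[x]) / (1 - c) for x in support}
    return _sample(res_mu, rng), _sample(res_nu, rng)
```

The branch probabilities make each marginal exact: $P(\xi=x)=c\cdot\min(\mu(x),\nu(x))/c+(1-c)\cdot(\mu(x)-\min(\mu(x),\nu(x)))/(1-c)=\mu(x)$.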
Finally, we give a concise proof of two statements with the help of
the above results. The first of them is the functional central limit
theorem which is also called the invariance principle in the
literature. The second statement enables us to deduce some useful
consequences of the functional central limit theorem. Before
its formulation let us recall the following form of the central limit
theorem.\medskip\noindent
{\bf Central limit theorem.} {\it Let $\xi_{k,j}$, $k=1,2,\dots$, $1\le
j\le n_k$, with some positive integers $n_k$, be a triangular array,
i.e. assume that the random variables $\xi_{k,j}$, $1\le j\le n_k$, are
independent for a fixed number $k$. Assume that $E\xi_{k,j}=0$,
$E\xi_{k,j}^2=\sigma_{k,j}^2<\infty$, $\limm_{k\to\infty}
\summ_{j=1}^{n_k}\sigma^2_{k,j}=1$. If beside this the triangular array
$\xi_{k,j}$, $k=1,2,\dots$, $1\le j\le n_k$,
satisfies the Lindeberg condition, i.e.
$$
\lim_{k\to\infty}\sum_{j=1}^{n_k}E\xi_{k,j}^2I(|\xi_{k,j}|>\e)=0
\quad\text{for all numbers } \e>0,
$$
where $I(A)$ denotes the indicator function of a set $A$, then the
sums $S_k=\summ_{j=1}^{n_k}\xi_{k,j}$, $k=1,2,\dots$, converge
in distribution to the standard normal distribution as $k\to\infty$.}
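A concrete triangular array satisfying all of these conditions is $\xi_{k,j}=\pm1/\sqrt{n_k}$ with probability $1/2$ each: then $E\xi_{k,j}=0$, the row variances sum exactly to one, and the Lindeberg condition holds trivially because $|\xi_{k,j}|\to0$ uniformly. A simulation sketch (an added illustration, not part of the theorem; `triangular_row` and the choice $n_k=100k$ are arbitrary):

```python
import random

def triangular_row(k, rng):
    """Row k of an illustrative triangular array: n_k = 100*k i.i.d. variables
    xi_{k,j} = +-1/sqrt(n_k), so E xi_{k,j} = 0 and the row variances sum to 1."""
    n_k = 100 * k
    scale = n_k ** -0.5
    return [rng.choice((-scale, scale)) for _ in range(n_k)]

def row_sum(k, rng):
    """The sum S_k of row k, approximately standard normal for large k."""
    return sum(triangular_row(k, rng))
```

Repeating `row_sum` many times and comparing the empirical mean and variance of the samples with $0$ and $1$ gives a crude numerical check of the theorem.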
\medskip
The functional central limit theorem, formulated below, says that under
the conditions of the central limit theorem also the following sharper
statement holds. \medskip\noindent
{\bf Functional central limit theorem.} {\it Let $\xi_{k,j}$,
$k=1,2,\dots$, $1\le j\le n_k$, be a triangular array
such that $E\xi_{k,j}=0$, $E\xi_{k,j}^2=\sigma_{k,j}^2<\infty$,
$\limm_{k\to\infty}\summ_{j=1}^{n_k}\sigma^2_{k,j}=1$. Let us also
assume that the triangular array $\xi_{k,j}$, $k=1,2,\dots$, $1\le j\le
n_k$, satisfies the Lindeberg condition. Introduce the partial sums
$S_{k,l}=\summ_{j=1}^{l}\xi_{k,j}$, $k=1,2,\dots$, $1\le l\le n_k$,
and define the random broken line $S_k(t)$, $0\le t\le 1$,
$k=1,2,\dots$, in the following way. Put
$u_{k,l}=\summ_{j=1}^{l}\sigma_{k,j}^2$, $\bar u_{k,l}=\dfrac
{u_{k,l}}{u_{k,n_k}}$, let $S_k(\bar u_{k,l})=S_{k,l}$, $S_k(0)=0$,
$k=1,2,\dots$, $1\le l\le n_k$, and let the function $S_k(t)$ be linear
in the intervals $[0,\bar u_{k,1}]$ and $[\bar u_{k,l-1},\bar
u_{k,l}]$, $k=1,2,\dots$, $2\le l\le n_k$. Then the random broken lines
$S_k(t)$, $0\le t\le 1$, can be considered as random variables taking
values in the Banach space $C([0,1])$ of continuous functions endowed
with the supremum norm. The functional central limit theorem states
that the distributions of the random variables $S_k(t)$ (in the space
$C([0,1])$) converge weakly to the so-called Wiener measure,
i.e.\ to the distribution of a Wiener process $W(t)$, $0\le t\le1$, as
$k\to\infty$. }\medskip
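The random broken line itself is elementary to construct. The sketch below assumes the special case $\sigma_{k,j}^2=1/n_k$, so that the breakpoints are $\bar u_{k,l}=l/n_k$ (an illustrative simplification of the general definition above):

```python
def broken_line(xs):
    """Broken line of the functional CLT in the equal-variance case:
    vertices at (l/n, S_{k,l}) for the partial sums of xs, joined linearly.
    Returns a function S on [0, 1] with S(0) = 0."""
    n = len(xs)
    partial = [0.0]
    for x in xs:
        partial.append(partial[-1] + x)

    def S(t):
        pos = t * n
        l = min(int(pos), n - 1)     # index of the interval containing t
        frac = pos - l
        return partial[l] + frac * (partial[l + 1] - partial[l])
    return S
```

Functionals such as $\max_{0\le t\le1} S_k(t)$ can then be evaluated directly on the returned function, which is how the consequences mentioned after problem~14 are typically extracted.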
\item{13.)} Prove (with the help of the results in this work, the
central limit theorem, inequalities for the maximum of sums of
independent random variables and some basic facts about Wiener
processes) the functional central limit theorem. \medskip
The coupling results of this work can be useful also in the proof of
the following statement which may help for instance to understand why
the functional central limit theorem is a useful result of probability
theory.\medskip
\item{14.)} Let $\mu_n$, $n=1,2,\dots$, be a sequence of probability
measures on a separable (not necessarily complete) metric space
$(X,\rho)$ which converges weakly to a probability measure $\mu_0$ on
this space. Let $\Cal F$ be a measurable map from this separable
metric space $(X,\rho)$ to some other separable metric space
$(Y,\rho_1)$ such that the map $\Cal F$ is continuous with probability
one with respect to the limit measure $\mu_0$. Let us consider the
probability measures $\Cal F\mu_n$, defined as $\Cal F\mu_n(B)=\mu_n\{x\:
\Cal F(x)\in B\}$ if $B\subset Y$ is a Borel-measurable set,
$n=0,1,\dots$, on the space $(Y,\rho_1)$. Prove with the help of the
result of Problem~5 that the probability measures $\Cal F \mu_n$
converge weakly to the probability measure $\Cal F\mu_0$ as $n\to\infty$.
\beginsection Remarks
The result of problem~1 was originally proved by V.~Strassen. I
learned about it and its close relation to the ``transport problem"
and the K\"onig--Hall theorem, an important method of combinatorics,
from the works of R.~M. Dudley. Let us remark that this result gives
a non-trivial answer to the question how closely two random variables
or two stochastic processes with prescribed distributions can be put
to each other. Nevertheless, this result does not give real help in
most coupling problems. The reason for this deficiency is that
generally it is not simpler to check formula~(a) for {\it all}\/
closed sets than to construct a coupling in an explicit way.
The result of problem~2 was proven by Yu.~V. Prochorov, and the metric
introduced there is called the Prochorov metric in the literature.
The result of problem~4 (and the result of problem~3 which serves as
basis for its solution) belongs to A.~V. Skorochod. The construction
supplying the solution of problem~5 belongs to R.~M. Dudley. The
generalization of problem~4 given in problem~5 is non-trivial. One
could try to reduce the result of problem~5 to problem~4 by embedding the
separable metric space into a complete separable metric space. Such an
embedding is always possible, but it may happen that the embedded space
is a non-measurable subset of the larger space. Hence one cannot get a
solution for problem~5 in such a simple way.
The quantile transform is a well-known method in probability theory,
and it is frequently used in certain investigations. It is hard to
relate its introduction to a definite person. It is worth mentioning
that the proof of the optimality property of the quantile transform
formulated in problem~9 is based on its reformulation for a
``transport problem". It is the simple structure of the real line
which enables us to prove such an explicit result in this case.
The results of problems 10, 11 and 12 are more or less well-known for
mathematicians working in this subject. However, I had the impression
that the answer to the question of how much freedom we have to construct
random variables or random sequences with prescribed distributions is
not sufficiently well understood. Hence I thought it may be useful to
discuss some results which may help to study such questions.
The functional central limit theorem is a classical result of
probability theory. Its usual proof is based on the investigation of
the convergence of probability measures. The functional central limit
theorem together with the result formulated in problem~14 explain why
all ``natural" functions of partial sums of independent random
variables have a limit distribution which is independent of the
distribution of the summands and can be expressed as the distribution
of a functional of a Wiener process.
\vfill\eject
\beginsection Solutions
\item{0.)} Let us define for all sets $B\subset Z$ the set
$$
A(B)=\{y\: y\in Y, \;d(y,z)=1 \text{ for some }z \in B\}.
$$
We have to show that if the conditions of the K\"onig--Hall theorem
are satisfied, then $|A(B)|\ge |B|$. This statement is equivalent to
the relation $|Y\setminus A(B)|\le |Z\setminus B|$. It follows from
the conditions of the problem that $|B(Y\setminus A(B))|\ge |Y\setminus
A(B)|$. Hence, it is enough to show that $B(Y\setminus A(B))\subset
Z\setminus B$. This relation holds, since if $z\in B(Y\setminus A(B))$,
i.e.\ there exists a $y\notin A(B)$ such that $d(y,z)=1$, then by
the definition of the set $A(B)$ $z\notin B$, i.e.\ $B(Y\setminus
A(B))\subset Z\setminus B$.
\item{} The analogous statement in the case of the continuous
version of the K\"onig--Hall theorem can be proved similarly. In this
case the statement we have to prove is equivalent to the inequality
$\summ_{y\in Y\setminus A(B)} u(y)\le\summ_{z\in Z\setminus B} v(z)$
because of the identity $\summ_{y\in Y} u(y)=\summ_{z\in Z} v(z)$.
This inequality follows from the condition applied to the set
$Y\setminus A(B)$ together with the relation $B(Y\setminus A(B))\subset
Z\setminus B$.
\item{1.)} Let us define the probability space $(\Omega,\Cal A,P)$ with
the following choice: $\Omega=X\times X$, $\Cal A$ is the
$\sigma$-algebra generated by the topology of the space $X\times X$, and
$P$ is an appropriate probability measure on the measure space
$(\Omega,\Cal A)$ we still have to define. Put $\xi(x_1,x_2)=x_1$ and
$\eta(x_1,x_2)=x_2$. We solve the problem if we can construct a
probability measure $P$ on the space $(\Omega,\Cal A)$ which satisfies
the following relations: \medskip
\itemitem{a.)} $P(A\times X)=\mu(A)$, $P(X\times A)=\nu(A)$ for all
measurable sets $A\subset X$.
\itemitem{b.)} $P\(\{(x_1,x_2)\:
\rho(x_1,x_2)>\alpha+\e\}\)\le\beta+\e$. \medskip
\item{} We shall construct such a probability measure $P$ with the help
of the continuous version of the K\"onig--Hall theorem.
\item{} First we define a bipartite graph with an appropriate weight
function. To do this we introduce some notations.
Let $G(x,\alpha)$ denote the ball with center $x$ and radius
$\alpha$ in the metric space $(X,\rho)$. Let $x_1,x_2,\dots$, be
an everywhere dense sequence on the space $X$, and let us fix a number
$\e>0$. As $\bigcupp_{n=1}^\infty G\(x_n,\frac\e5\)=X$, there exists such
a number $N=N(\e)$ for which the set $W_N=\bigcupp_{n=1}^{N(\e)}
G\(x_n,\frac\e5\)$ satisfies the relations $\mu(W_N)>1-\frac\e2$ and
$\nu(W_N)>1-\frac\e2$. Let us define the sets
$V_k=G\(x_k,\frac\e5\)\setminus \bigcupp_{j=1}^{k-1}
G\(x_j,\frac\e5\)$, $k=1,\dots, N$ and $V_{N+1}=X\setminus W_N$.
Then $V_k$, $k=1,\dots,N+1$ is a partition of the space $X$,
$d(V_k)\le \frac \e5$, if $1\le k\le N$, where $d(A)$ denotes
the diameter of set $A\subset X$. Further, $\mu(V_{N+1})<\frac\e2$ and
$\nu(V_{N+1})<\frac\e2$. We shall call the point $x_k$ the center of
the set $V_k$, $1\le k\le N$.
\item{} We define the following bipartite graph
$(Y,Z,d(\cdot,\cdot))$: $Y=\{y_1,\dots,y_{N+1}\}=\{V_1,\dots,V_{N+1}\}$,
$Z=\{z_1,\dots,z_{N+1}\}=\{V_1,\dots,V_{N+1}\}$, $d(y_j, z_k)=1$, if
$\rho(x_j,x_k)\le \alpha+\frac\e2$, $1\le j,k\le N$, and also
$d(y_{N+1},z_k)=1$ and $d(y_j,z_{N+1})=1$ for all indices $1\le j,k\le
N+1$. In all other cases we define $d(y,z)=0$. (The relation $d(y,z)=1$
means that the points $y$ and $z$ are connected.) We also introduce the
following weight functions $u(\cdot)$ and $v(\cdot)$ on the sets $Y$
and $Z$: $u(y_j)=\mu(V_j)$, $v(z_j)=\nu(V_j)$, $j=1,\dots,N$,
$u(y_{N+1})=\mu(V_{N+1})+\beta$, $v(z_{N+1})=\nu(V_{N+1})+\beta$. We
claim that this bipartite graph and weight function satisfy the
conditions of the continuous version of the K\"onig--Hall theorem.
\item{} The desired inequality obviously holds for such sets
$A\subset Y$, for which $y_{N+1}\in A$. Indeed, in this case
$B(A)=Z$, since $y_{N+1}$ is connected to all points of the set $Z$.
If $y_{N+1}\notin A$ then put $D_1=\bigcupp_{V_j\in A}
V_j$ and $D_2=\bigcupp_{V_j\in B(A)} V_j$. In this case $\summ_{y\in
A}u(y)=\mu(D_1)$, and $\summ_{z\in B(A)}v(z)=\nu(D_2)+\beta$, since
$y_{N+1}\notin A$ and $z_{N+1}\in B(A)$. Hence, it is enough to show
that $\bar D_1^\alpha\subset D_2$, where $\bar D_1$ denotes the closure
of the set $D_1$. But if $x\in \bar D_1$, then there exists such a
set $y_j=V_j\in A$ whose center $x_j$ satisfies the inequality $\rho(x,
x_j)\le\frac\e5$, thus $G(x,\alpha)\subset G\(x_j,\alpha+\frac\e5\)$.
We claim that $G(x,\alpha)\subset G\(x_j,\alpha+\frac\e5\)\subset
D_2$, with this point $x$, hence $\bar D_1^\alpha\subset D_2$ as we
claimed. Indeed, if this statement did not hold, then there would
exist a point $v\in G\(x_j,\alpha+\frac\e5\)$ such that the element
$V_k$ of the partition $\{V_1,\dots,V_{N+1}\}$ for which $v\in V_k$
is not connected to $V_j$ in the bipartite graph we have defined,
hence $1\le k\le N$, and $\rho(x_j,x_k)>\alpha+\frac\e2$.
But this is not possible, since $d(V_k)\le\frac \e5$ and thus
$\rho(x_j,x_k)\le\rho(x_j,v)+\frac\e5\le \alpha +\frac{2\e}5$. We
have shown that the continuous version of the K\"onig--Hall theorem can
be applied to this system.
\item{} Let $\bar w(y,z)$, $y\in Y$, $z\in Z$ be a ``transport
function" satisfying the continuous version of the K\"onig--Hall
theorem in the above system, and let us define the following function
$w_1(y_j,z_k)$ with its help:
$$
\align
&w_1(y_j,z_k)=\bar w(y_j,z_k)\quad\text{if } 1\le j,k\le N,\\
&\qquad \text{ and } w_1(y_j,z_k)=0, \quad\text{if }j=N+1 \text{ or }
k=N+1.
\endalign
$$
This function $w_1(y,z)$ satisfies the following properties: \medskip
\itemitem{i.)} $w_1(y_j,z_k)\ge 0$, and $w_1(y_j,z_k)>0$ only if
$1\le j,k\le N$ and $\rho(x_j,x_k)\le\alpha+\frac\e2$.
\itemitem{ii.)} $\summ_{z\in Z}w_1(y_j,z)\le u(y_j)=\mu(V_j)$,
$\summ_{y\in Y}w_1(y,z_k)\le v(z_k)=\nu(V_k)$.
\itemitem{iii.)} $\summ_{y\in Y, z\in Z} w_1(y,z)\ge 1-\beta-\e$,
because $\summ_{y\in Y, z\in Z} w_1(y,z)\ge \summ_{y\in Y, z\in Z}
\bar w(y,z)-u(y_{N+1})-v(z_{N+1})\ge 1+\beta-2\(\beta+\frac\e2\)=1-\beta-\e$.
\medskip
\item{} By the properties of the function $w_1(y,z)$ there is a
function $w_2(y_j,z_k)\ge0$, $1\le j,k\le N+1$, such that
the function $w(y,z)=w_1(y,z)+w_2(y,z)$ satisfies the properties
\medskip
\itemitem{} $\summ_{z\in Z}w(y_j,z)= u(y_j)=\mu(V_j)$, \
$\summ_{y\in Y}w(y,z_k)= v(z_k)=\nu(V_k)$ \medskip
\item{} We shall define a probability measure $P$ which satisfies
properties a.) and b.) with the help of the function $w(y,z)$.
Put
$$
P(C\times D)=\frac{\mu(C)}{\mu(V_j)}
\frac{\nu(D)}{\nu(V_k)}w(y_j,z_k)\quad\text{if }
C\subset V_j,\; D\subset V_k
$$
with some indices $1\le j,k\le N+1$. In the general case let us define
$$
P(C\times D)=\sum_{j=1}^{N+1}\sum_{k=1}^{N+1} P\((C\cap V_j)\times
(D\cap V_k)\).
$$
After this definition the measure $P$ can be extended from the above
rectangular sets to the whole $\sigma$ algebra $\Cal A$ in a unique
way. We claim that this measure $P$ satisfies both properties
a.) and b.). Indeed, for all sets
$A\subset V_j$, $1\le j\le N+1$
$$
P(A\times X)=\summ_{k=1}^{N+1}P(A\times V_k)=\dfrac{\mu(A)}{\mu(V_j)}
\summ_{k=1}^{N+1}w(y_j,z_k)=\mu(A),
$$
and this implies that $P(A\times X)=\mu(A)$ for all measurable sets
$A\subset X$. The identity $P(X\times A)=\nu(A)$ can be proved similarly.
On the other hand,
$$
\align
P((x_1,x_2)\: \rho(x_1,x_2)\le \alpha+\e)&\ge
\summ_{(j,k)\:\rho(x_j,x_k)\le \alpha+\frac\e2}P(V_j\times V_k)\\
&=\summ_{(j,k)\:\rho(x_j,x_k)\le \alpha+\frac\e2}w(y_j,z_k)\\
&\ge \summ_{y_j\in Y,\,z_k\in Z}w_1(y_j,z_k)
\ge 1-\beta-\e
\endalign
$$
by property iii.), and this statement is equivalent to
condition~b.).
\item{} Conversely, if property (a) holds, then for an arbitrary closed set
$B$ we have $\{\xi\in B,\eta\notin
B^{\alpha+\e}\}\subset \{(\xi,\eta)\: \rho(\xi,\eta)>\alpha+\e\}$,
hence $P(\xi\in B,\eta\notin B^{\alpha+\e})\le \beta+\e$. As
$\{\xi\in B\}\subset \{\xi \in B, \eta\notin B^{\alpha+\e}\}\cup
\{\eta\in B^{\alpha+\e}\}$, this implies that $\mu(B)\le
\beta+\e+\nu(B^{\alpha+\e})$, and this is what we have to prove.
\medskip
\item{} To prove the last statement of problem~1 let us make the
following observation. If $X$ is a complete metric space, and $P_n$,
$n=1,2,\dots$, is a sequence of probability measures on the space
$X\times X$ whose marginal distributions are two measures $\mu$ and
$\nu$ which do not depend on the index $n$, then this sequence of
probability measures has a weakly convergent subsequence.
\item{} To prove this statement let us observe that for all numbers
$\e>0$ there exists such a compact set $K\subset X$ for which
$\mu(K)\ge 1-\frac\e2$ and $\nu(K)\ge 1-\frac\e2$. Hence the compact
set $K\times K\subset X\times X$ and the probability measures
$P_n$ on the space $X\times X$, whose marginal distributions are $\mu$
and $\nu$, satisfy the inequality $P_n(K\times K)\ge 1-\e$ for all
indices $n$. By some classical results in probability theory this
implies that the sequence of probability measures $P_n$ is tight,
hence it has a weakly convergent subsequence.
\item{} Let $P_n$, $n=1,2,\dots$, be a sequence of probability measures
on the space $X\times X$ with marginal distributions $\mu$ and $\nu$
which satisfy property~(a) with numbers $\e_n=\frac1n$. Let
$P_{n_k}$, $k=1,2,\dots$, be a convergent subsequence of this sequence,
and let $P$ denote its limit. The marginal distributions of the
measure $P$ are $\mu$ and $\nu$. We claim that a random vector with
distribution $P$ satisfies property~(a) also with the number~$\e=0$.
Indeed, the sets of the form $\{(x_1,x_2)\: (x_1,x_2)\in X\times
X,\rho(x_1,x_2)>\alpha+\e\}$ are open. Hence
$$
\beta\ge \limsup_{k\to\infty}P_{n_k}\(\{(x_1,x_2)\:
\rho(x_1,x_2)>\alpha+\e\}\)\ge P\(\{(x_1,x_2)\:
\rho(x_1,x_2)>\alpha+\e\}\)
$$
for all numbers $\e>0$. This implies that
$$
P\(\{(x_1,x_2)\:\rho(x_1,x_2)>\alpha\}\)=\lim_{\e\to0}
P\(\{(x_1,x_2)\:\rho(x_1,x_2)>\alpha+\e\}\)\le\beta.
$$
Problem~1 is proved.
\item{2.)} Let us first show that the function $d(\cdot,\cdot)$ is a
metric. i.) $d(\mu,\mu)=0$. On the other hand, we show that
$d(\mu,\nu)=0$ implies $\mu=\nu$. Indeed, if $d(\mu,\nu)=0$, then $\mu(F)\le
\nu(F)$ for all closed sets $F\subset X$, since $F=\bigcapp_{\e\to0}
F^\e$, and $\mu(F)\le \liminff_{\e\to0}(\nu(F^\e)+\e)=\nu(F)$. We show
that also the identity $\mu(F)=\nu(F)$ holds for closed sets~$F$.
Indeed, $\mu(G)\ge\nu(G)$ for all open sets $G$, hence
$\mu(F)=\limm_{\e\to0}\mu(F^\e) \ge\limm_{\e\to0}\nu(F^\e)=\nu(F)$.
This implies that $\mu(A)=\nu(A)$ for all closed or open sets $A$. On
the other hand, a probability measure is uniquely determined by its
values on the open sets, hence $\mu=\nu$. ii.) $d(\mu,\nu)=d(\nu,\mu)$.
Let $d(\mu,\nu)<\alpha$ with some number $\alpha>0$. We show that in
this case $d(\nu,\mu)\le\alpha$. Let $F\subset X$ be an arbitrary
closed set, and let us define the set $F'=X\setminus F^\alpha$. We
claim that $(F')^\alpha\subset X\setminus F$. Indeed, if $y\in
(F')^\alpha$ then $\rho(y,X\setminus F^\alpha)<\alpha$, that is, there
exists a point $z\in X$ for which $\rho(z,F)\ge\alpha$ and
$\rho(z,y)<\alpha$. This implies that $y\notin F$, hence
$(F')^\alpha\subset X\setminus F$. With the help of this relation we
get that $1-\mu(F^\alpha)=\mu(F')\le\nu((F')^\alpha)+\alpha\le
\nu(X\setminus F)+\alpha=1-\nu(F)+\alpha$, i.e.\ $\nu(F)\le
\mu(F^\alpha)+\alpha$. Hence $d(\nu,\mu)\le\alpha$. In such a way we
have shown that $d(\nu,\mu)\le d(\mu,\nu)$. Because of symmetry
reasons $d(\mu,\nu)=d(\nu,\mu)$. iii.) $d(\mu_1,\mu_3)\le
d(\mu_1,\mu_2)+d(\mu_2,\mu_3)$. If $d(\mu_1,\mu_2)=\alpha$,
$d(\mu_2,\mu_3)=\beta$, then
$\mu_1(F)\le\mu_2(F^{\alpha+\e})+\alpha+\e$, $\mu_2(F^{\alpha+\e})\le
\mu_3((F^{\alpha+\e})^{\beta+\e})+\beta+\e$ for all numbers
$\e>0$ and closed sets $F$. As $(F^{\alpha+\e})
^{\beta+\e}\subset F^{\alpha+\beta+2\e}$, this implies that
$\mu_1(F)\le \mu_3(F^{\alpha+\beta+2\e})+\alpha+\beta+2\e$.
Since this relation holds for all closed sets $F$ and numbers
$\e>0$ it implies relation iii.).
\item{} We show that $d(\mu_n,\mu)\to0$ if and only if
$\mu_n\Rightarrow \mu$. If $d(\mu_n,\mu)\to0$, then
$\limsupp_{n\to\infty}\mu_n(F)\le \mu(F^\e)+\e$ for all closed sets
$F$ and numbers $\e>0$. As $F=\bigcapp_{\e\to0}F^\e$,
$\limm_{\e\to0}\(\mu(F^\e)+\e\)=\mu(F)$. This relation implies the
inequality $\limsupp_{n\to\infty}\mu_n(F)\le \mu(F)$ for all closed
sets $F$, and this is equivalent to the statement
$\mu_n\Rightarrow\mu$.
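\item{} For probability measures concentrated on finitely many points of the
real line the quantity $d(\mu,\nu)$ can be computed by brute force. The
following Python sketch (the function names are ours; it assumes the
definition $d(\mu,\nu)=\inf\{\e\:\mu(F)\le\nu(F^\e)+\e$ for all closed
$F\}$ used in this problem) checks the defining inequality on all subsets of
the supports and locates the infimum by bisection.

```python
from itertools import chain, combinations

# Brute-force evaluation of d(mu, nu) = inf{eps > 0 : mu(F) <= nu(F^eps) + eps
# for every closed F} for finite discrete measures on the line, by bisection.
# mu, nu are dicts {point: probability}; F ranges over subsets of the supports.

def prokhorov(mu, nu, tol=1e-9):
    pts = sorted(set(mu) | set(nu))
    subsets = list(chain.from_iterable(
        combinations(pts, r) for r in range(1, len(pts) + 1)))

    def ok(eps):
        for F in subsets:
            mass_mu = sum(mu.get(x, 0.0) for x in F)
            # nu(F^eps), where F^eps is the open eps-neighbourhood of F
            mass_nu = sum(p for y, p in nu.items()
                          if min(abs(y - x) for x in F) < eps)
            if mass_mu > mass_nu + eps:
                return False
        return True

    lo, hi = 0.0, 1.0               # ok(1.0) always holds, since masses are <= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

d1 = prokhorov({0.0: 1.0}, {0.3: 1.0})   # two unit point masses at distance 0.3
```

For the two point masses above the value is $0.3$; exchanging the arguments
gives the same value, in accordance with the symmetry proved in ii.).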
\item{} To prove the other direction of the statement first we show that
for all $\e>0$ there exists an integer
$N=N(\e)>0$ and a partition $V_1,\dots,V_N,V_{N+1}$ of the space $X$
which satisfies the following conditions:
$\mu(\partial V_k)=0$, $k=1,\dots,N+1$, $\bar d(V_k)<\e^2$,
$k=1,\dots,N$, and $\mu(V_{N+1})\le\frac\e2$, where $\partial A$
denotes the boundary of the set $A$ and $\bar d(A)$ its diameter.
Indeed, let $x_1,x_2,\dots$ be an everywhere dense set in the space
$X$. For all $x_k$ let us choose a ball $G(x_k,\delta_k)$ in the space
$X$ of center $x_k$ and radius $\delta_k$ with some $\delta_k$ such
that $\frac{\e^2}4<\delta_k<\frac{\e^2}2$ for which $\mu\(\partial
G(x_k,\delta_k)\)=0$. The union of these balls covers the whole space
$X$. Let us choose such a number $N=N(\e)$ for which
$\mu\(\bigcupp_{k=1}^N G(x_k,\delta_k)\)\ge1-\frac\e2$.
Let $V_{N+1}=X\setminus\(\bigcupp_{k=1}^N G(x_k,\delta_k)\)$ and
$V_k=G(x_k,\delta_k)\setminus\(\bigcupp_{j=1}^{k-1} G(x_j,\delta_j)\)$,
$k=1,\dots,N$. These sets satisfy the conditions we have imposed.
\item{} For all closed sets $F\subset X$ let us define the set
$B(F)=\bigcupp_{k\:V_k\cap F\neq\emptyset} V_k$. Let us observe
that $F\subset B(F)\subset F^\e\cup V_{N+1}$. Further,
$\limm_{n\to\infty}\supp_{F\text { closed set}}
|\mu_n(B(F))-\mu(B(F))|=0$, because
$\limm_{n\to\infty}\mu_n(V_k)=\mu(V_k)$ for all $k=1,\dots,N+1$,
there are finitely many sets of the form $B(F)$, and all of them are
the union of finitely many sets $V_k$. Hence there exists a threshold
index $n_0=n_0(\e)$ independent of the closed set $F$ such that
$$
\mu(F^\e)-\mu_n(F)\ge\mu(B(F))-\mu(V_{N+1})-\mu_n (B(F))\ge -\e
$$
for all closed sets $F$. This implies that $d(\mu_n,\mu)\le\e$, if
$n\ge n_0(\e)$. Hence $\limm_{n\to\infty} d(\mu_n,\mu)=0$ as we
claimed.
\item{} We show that the metric space $(\Cal M, d)$ is separable.
Let $x_1,x_2,\dots$, be an everywhere dense set in the space $X$,
let $\Cal M_0$ be the set of those discrete probability measures
which are concentrated on a finite subset of the set
$\{x_1,x_2,\dots\}$ and assign a rational measure to each point.
This is a countable set, hence it is enough to show that
$\Cal M_0$ is an everywhere dense subset of the set $\Cal M$.
This can be seen for instance by a slight modification of the
previous argument, which shows that for an arbitrary measure
$\mu\in \Cal M$ and number $\e>0$ we can choose a partition
$V_1,\dots,V_N$ of the set $X$ and a number $\eta>0$ for which the
following statement holds: If a probability measure $\nu\in \Cal M$
satisfies the condition $|\mu(V_k)-\nu(V_k)|\le\eta$ for all numbers
$k=1,\dots,N$, then the measure $\nu$ also satisfies the
relation $d(\mu,\nu)<\e$. Since for all $\mu\in\Cal M$ the set $\Cal M_0$
contains such a measure $\nu$, $\Cal M_0$ is an everywhere dense
subset of the set $\Cal M$.
\item{} We show that if $X$ is a complete separable space, then the
same property holds for the space $(\Cal M,d)$. It is enough to show
that if
$$
\limm_{n\to\infty}\supp_{m\: m\ge n}d(\mu_n,\mu_m)=0,
$$
then the sequence $\mu_n$, $n=1,2,\dots$, is relatively compact, i.e.
it has a weakly convergent subsequence. To prove this statement it is
enough to show that for all numbers $\e>0$ there exists a compact set
$K\subset X$, for which $\mu_n(K)\ge 1-\e$ for all indices~$n$.
\item{} The statement from which we want to deduce the completeness
of the space $(\Cal M,d)$ can be slightly weakened. It is enough to
show that for arbitrary $\e>0$ there exists a compact set $K=K(\e)$
whose neighbourhood of radius $\e$ $K^\e=\{x\:\rho(x,K)<\e\}$
satisfies the inequality $\mu_n(K^\e)\ge1-\e$ for all indices
$n=1,2,\dots$. Indeed, let us consider such sets $K\(\e2^{-m}\)$
with $\e2^{-m}$ in place of $\e$ for all $m=1,2,\dots$, and let us
define the set
$K=\bigcapp_{m=1}^\infty \(K(\e2^{-m})\)^{\e2^{-m}}$. Then
$\mu_n(K)\ge1-\summ_{m=1}^\infty\e2^{-m}=1-\e$. Hence it is enough
to prove that this set $K$ is relatively compact, (that is, its
closure is compact). To show this let us recall the result by which
a subset $A$ of a separable complete metric space is relatively
compact if and only if this set $A$ has a finite $\delta$-net for
all $\delta>0$, i.e.\ there exists a finite subset of the space $X$
such that for arbitrary $x\in A$ the distance of $x$ from one of the
points of this finite set is less than $\delta$. This property holds
for the above set $K$, since for all numbers $\delta>0$ there exists
an integer $m$ such that $\delta>\e2^{-m}$. Further, the set
$K(\e2^{-m})^{\e2^{-m}}$ has a finite $\delta$-net for this $m$, and
this is also a finite $\delta$-net for the original set~$K$.
\item{} This weakened statement can be proved in the following way:
For a fixed number $\e>0$ let us choose an index $n_0=n_0(\e)$ such
that the relation $d(\mu_{n_0},\mu_n)\le\frac\e4$ holds for all $n\ge
n_0$, and let $K_0\subset X$ be a compact set such that
$\mu_{n_0}(K_0)\ge1-\frac\e4$. Then $\mu_n(K_0^{\e/4})\ge
\mu_{n_0}(K_0)-\frac\e4\ge1-\frac\e2$ for all numbers $n\ge n_0$.
Let us choose such a compact set $K_1$, for which
$\mu_n(K_1)\ge 1-\frac\e2$ for all numbers $n\le n_0$. Then the set
$K=K_1\cup K_0$ is compact, and $\mu_n(K^\e)\ge1-\e$ for all numbers
$n=1,2,\dots$. Indeed, $\mu_n(K^\e)\ge \mu_n(K_0^{\e/4})\ge1-\e$, if
$n\ge n_0$, and $\mu_n(K^\e)\ge\mu_n(K_1)\ge1-\e$, if $n\le n_0$.
The statements of problem~2 are proved.
\item{3.)} Let us observe that the space $X$ has a partition $\Cal
X_1=\{A_1,A_2,\dots\}$ such that $\bar d(A_j)\le1$, and $\mu(\partial
A_j)=0$ for all indices $j=1,2,\dots$. Here $\bar d(A)$ denotes the
diameter of the set $A$ and $\partial A$ its boundary. (This statement
follows from the argument presented at the start of problem~2, when it
was shown that the relation $\mu_n\Rightarrow\mu$ implies that
$d(\mu_n,\mu)\to0$.) By splitting further the elements of this
partition we get a more and more refined sequence of partitions $\Cal
X_1\supset \Cal X_2\supset\cdots\supset \Cal X_k\supset\cdots$ of the space
$X$ such that the elements of these partitions satisfy the relations
$\bar d\(A_{j_1,\dots,j_k}\)\le2^{-k}$ and
$\mu\(\partial A_{j_1,\dots,j_k}\)=0$. ($A_{j_1,\dots,j_k}\in
\Cal X_k$.) Let us define a similar sequence of more and more refined
partitions $\Cal Y_1\supset \Cal Y_2\supset\cdots\supset\Cal Y_k\supset\cdots$
of the interval $(0,1]$ (endowed with the Lebesgue measure) in the
following way:
\item{} Put $\Cal Y_1=\{B_1,\dots, B_k,\dots\}$,
$B_k=(b_{k-1},b_k]$, $b_k=\summ_{j=1}^k\mu(A_j)$, $k=1,2,\dots$, and
if the sets $B_{j_1,\dots,j_k}=
(b_{j_1,\dots,j_k-1},b_{j_1,\dots,j_k}]$ of the partition $\Cal Y_k$
are already defined, and they are defined so that the identity
$b_{j_1,\dots,j_k}-b_{j_1,\dots,j_k-1}=\mu(A_{j_1,\dots,j_k})$ holds,
then split all intervals $B_{j_1,\dots,j_k}$ into subsequent
non-overlapping intervals
$$
B_{j_1,\dots,j_k,s}=(b_{j_1,\dots,j_k,s-1},b_{j_1,\dots,j_k,s}],
$$
of length $\mu\(A_{j_1,\dots,j_k,s}\)$. These smaller intervals will be
the elements of the partition $\Cal Y_{k+1}$ of the interval $(0,1]$.
After the definition of these partitions let us fix for all integers
$k\ge1$ and elements $A_{j_1,\dots,j_k}$ of the partition $\Cal X_k$
a point $x_{j_1,\dots,j_k}\in A_{j_1,\dots,j_k}$. Now for all
$k=1,2,\dots$ we define a random variable $\xi_k(y)$ on the probability
space $((0,1],\Cal B,\lambda)$ with values on the space $X$ by means of
the following formula: Put $\xi_k(y)=x_{j_1,\dots,j_k}$ if $y\in
B_{j_1,\dots,j_k}$. We claim that the limit
$\xi(y)=\limm_{k\to\infty}\xi_k(y)$ exists for all $y\in(0,1]$, and
it is a $\mu$ distributed random variable.
\item{} The above limit really exists, because for all $y\in (0,1]$
there exists a uniquely determined decreasing sequence of intervals
$B_{j_1,\dots,j_k}$, $k=1,2,\dots$, such that $y\in
B_{j_1,\dots,j_k}$. This implies that the sequence of points
$\xi_k(y)\in A_{j_1,\dots,j_k}$ is a Cauchy sequence, hence, by the
completeness of the space $X$, also a convergent one.
\item{} The union of the $\sigma$-algebras generated by the partitions $\Cal X_k$, $k=1,2,\dots$,
generates the Borel $\sigma$-algebra in the space $(X,\rho)$. Hence to
prove that the above defined random variable $\xi$ has distribution
$\mu$ it is enough to show that $P\(\xi\in
A_{j_1,\dots,j_k}\)=\mu\(A_{j_1,\dots,j_k}\)$ for all
numbers $k=1,2,\dots$ and sets $A_{j_1,\dots,j_k}$. It follows from the
construction of the random variable $\xi$ that if
$y\in B_{j_1,\dots,j_k}$ then $\xi(y)\in \bar A_{j_1,\dots,j_k}$, where
$\bar A$ denotes the closure of the set $A$. First we show that it
follows from this fact and the relation
$$
P\(\xi(y)\in \partial A_{j_1,\dots,j_k}\)=0 \quad\text{for all numbers }
k=1,2,\dots \text{ and indices }j_1,\dots, j_k,
\tag$*$
$$
to be proven later that $\xi$ is $\mu$ distributed. Indeed, these
relations imply that
$$
\align
P\(\xi(y)\in A_{j_1,\dots,j_k}\)&\ge P\(\xi(y)\in \bar
A_{j_1,\dots,j_k}\) -P\(\xi(y)\in \partial A_{j_1,\dots,j_k}\)\\
&\ge \lambda\( B_{j_1,\dots,j_k}\) = \mu\( A_{j_1,\dots,j_k}\),
\endalign
$$
and summing up these inequalities we get that
$$
1=\sum_{j_1,\dots,j_k} P\(\xi(y)\in A_{j_1,\dots,j_k}\)
\ge \sum_{j_1,\dots,j_k} \mu\(A_{j_1,\dots,j_k}\)=1.
$$
As a consequence, the above inequalities are actually identities, and
the distribution of $\xi$ is $\mu$.
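\item{} The construction of problem~3 can be illustrated on the space
$X=[0,1]$ with the measure $\mu((a,b])=b^2-a^2$ and the dyadic partitions;
this concrete choice is ours, made only for illustration. The approximations
$\xi_k(y)$ then converge to $\sqrt y$, the quantile function of $\mu$, as
the following Python sketch shows.

```python
# Sketch of the construction of problem 3 on X = [0,1] with the illustrative
# measure mu((a,b]) = b^2 - a^2 (density 2x). Level-k partition: dyadic
# intervals A_j = ((j-1)/2^k, j/2^k]; their images B_j = (((j-1)/2^k)^2,
# (j/2^k)^2] partition (0,1]. The "center" of A_j is taken to be its right
# endpoint j/2^k.

def xi_k(y, k):
    n = 2 ** k
    j = 1
    while (j / n) ** 2 < y:       # find j with ((j-1)/n)^2 < y <= (j/n)^2
        j += 1
    return j / n                  # xi_k(y) = right endpoint of A_j

# xi_k(y) converges to sqrt(y), the quantile function of mu, as k grows
approx = xi_k(0.25, 12)
```

The error of $\xi_k(y)$ is at most the mesh $2^{-k}$ of the level-$k$
partition, which is exactly the Cauchy estimate used in the proof above.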
\item{} To prove formula $(*)$ let us observe that since
$\mu(\partial A_{j_1,\dots,j_k})=0$, for all numbers $\e>0$
there exists a number $\delta=\delta(\e)>0$ such that the neighbourhood
$(\partial A_{j_1,\dots,j_k})^\delta$ of radius $\delta$ of the set
$\partial A_{j_1,\dots,j_k}$ satisfies the inequality
$\mu\((\partial A_{j_1,\dots,j_k})^\delta\)<\e$. Let us choose an
integer $s$ so large that the diameters of the elements of the partition
$\Cal X_s$ satisfy the inequality $\bar
d(A_{j'_1,\dots,j'_s})<\frac\delta2$ for all sets
$A_{j'_1,\dots,j'_s}$. Since
$\rho(\xi_s(y),\xi(y))\le\max\limits_{j'_1,\dots,j'_s}\bar
d(A_{j'_1,\dots,j'_s})<\frac\delta2$, in the case $\xi(y)\in
\partial A_{j_1,\dots,j_k}$ we have $\rho\(\xi_s(y),\partial
A_{j_1,\dots,j_k}\)<\frac\delta2$. This implies that if
$\xi(y)\in\partial A_{j_1,\dots,j_k}$, then the indices $j'_1,\dots,j'_s$
for which $y\in B_{j'_1,\dots,j'_s}$ are such that $\xi_s(y)\in
A_{j'_1,\dots,j'_s}$, $\rho\(A_{j'_1,\dots,j'_s},\partial
A_{j_1,\dots,j_k}\)<\frac\delta2$, hence $A_{j'_1,\dots,j'_s}\subset
(\partial A_{j_1,\dots,j_k})^\delta$. This implies that
$$
\align
&\left\{y\: \xi(y)\in\partial A_{j_1,\dots,j_k}\right\} \\
&\qquad \subset \left\{y\: \xi_s(y)\in
A_{j'_1,\dots,j'_s}\subset (\partial
A_{j_1,\dots,j_k})^\delta \text{ with some indices
}j'_1,\dots,j'_s\right\},
\endalign
$$
and
$$
\align
&\lambda\left\{y\: \xi(y)\in\partial A_{j_1,\dots,j_k}\right\}\le
\sum_{(j'_1,\dots,j'_s)\: A_{j'_1,\dots,j'_s}\subset (\partial
A_{j_1,\dots,j_k})^\delta} \lambda\(B_{j'_1,\dots,j'_s}\) \\
&\qquad =\sum_{(j'_1,\dots,j'_s)\: A_{j'_1,\dots,j'_s}\subset (\partial
A_{j_1,\dots,j_k})^\delta} \mu\(A_{j'_1,\dots,j'_s}\)\le\mu\((\partial
A_{j_1,\dots,j_k})^\delta\)\le\e.
\endalign
$$
Since this statement holds for all $\e>0$, this implies relation
$(*)$ and the statement of problem~3.
\item{4.)} We construct the random variables $\xi_n$ with
distribution $\mu_n$, $n=1,2,\dots$, and the random variable $\xi$
with distribution $\mu$ simultaneously by means of the construction
of problem~3. The only novelty is that we apply the same partitions
$\Cal X_1\supset\Cal X_2\supset\cdots$ in the construction of all
random variables $\xi_n$, $n=1,2,\dots$, and $\xi$, and demand that
the boundaries of the elements $A_{j_1,\dots,j_k}\in \Cal X_k$ of the
partitions satisfy the relation $\mu(\partial A_{j_1,\dots,j_k})=0$
and $\mu_n(\partial A_{j_1,\dots,j_k})=0$ for all $k=1,2,\dots$,
$A_{j_1,\dots,j_k}\in \Cal X_k$ and $n=1,2,\dots$. We remark that
it causes no extra difficulty to find partitions $\Cal X_k$ with
such elements whose boundary have the above property. Indeed, we can
for instance introduce a new probability measure $\bar\mu=\dfrac\mu2+
\summ_{n=1}^\infty 2^{-n-1}\mu_n$ and construct a sequence of more
and more refined partitions of the space $X$ in the same way as it
was done in the construction of problem~3 with the only difference
that we replace the measure $\mu$ by the measure $\bar\mu$. Then the
boundaries of the partitions satisfy the desired requirement for all
measures $\mu$ and $\mu_n$, $n=1,2,\dots$. We claim that if
$\mu_n\Rightarrow\mu$, then $\xi_n\to\xi$ with probability one for
the random variables constructed in the above way.
\item{} Let us observe that if $\mu_n\Rightarrow\mu$, then
$\limm_{n\to\infty}\mu_n(A_{j_1,\dots,j_k})=\mu(A_{j_1,\dots,j_k})$
for all $j_1,\dots,j_k$, since the boundaries of these sets have zero
$\mu$ measure. This relation and the structure of our construction
imply that $b_{j_1,\dots,j_k}(n)\to b_{j_1,\dots,j_k}$ for all
$k=1,2,\dots$ and indices $j_1,\dots,j_k$ as $n\to\infty$, where
$b_{j_1,\dots,j_k}(n)$ and $b_{j_1,\dots,j_k}$ denote the right-hand
side end-points of the intervals $B_{j_1,\dots,j_k}(n)$ and
$B_{j_1,\dots,j_k}$ which appear in the partitions $\Cal
Y_k(n)$ and $\Cal Y_k$ of the intervals $(0,1]$ introduced during the
definition of the random variables $\xi_n$ and $\xi$.
\item{} For a fixed small number $\e>0$ let us choose a number
$k=k(\e)$ for which $2^{-k}<\e$. Then the diameters of the elements
$A_{j_1,\dots,j_k}$ of the partition $\Cal X_k$ are less than
$2^{-k}<\e$. Let us choose a finite set $D_k$ of $k$-tuples
$\{j_1,\dots,j_k\}$ for which
$\summ_{(j_1,\dots,j_k)\in
D_k}\mu(A_{j_1,\dots,j_k})>1-\dfrac\e2$. There exists an index
$n_0=n_0(\e)$ such that the set
$$
B(\e)=B(k,\e,n_0)=\bigcupp_{(j_1,\dots,j_k)\in D_k} \bigcupp_{n\ge n_0}
(B_{j_1,\dots,j_k}\,\triangle\, B_{j_1,\dots,j_k}(n))
$$
satisfies the inequality $\lambda (B(\e))<\dfrac\e2$, where
$A\,\triangle\,B=(A\setminus B)\cup(B\setminus A)$ denotes the
symmetric difference of two sets $A$ and $B$. The above relations imply
that the set $B_1(\e)=\bigcupp_{(j_1,\dots,j_k)\in D_k}
(B_{j_1,\dots,j_k}\setminus B(\e))$ satisfy the relation
$\lambda(B_1(\e))>1-\e$. Define the set $B_2(\e)=B_1(\e)\cap Y_0\cap
\bigcapp_{n=1}^\infty Y_{0,n}$, where
$Y_{0,n}=(0,1)\setminus\bigcupp_{k=1}^\infty \bigcupp_{j_1,\dots,j_k}
\{b_{j_1,\dots,j_k}(n)\}$, $n=1,2,\dots$, and
$Y_{0}=(0,1)\setminus\bigcupp_{k=1}^\infty
\bigcupp_{j_1,\dots,j_k} \{b_{j_1,\dots,j_k}\}$.
The inequality $\lambda(B_2(\e))>1-\e$ also holds, since the set
$B_1(\e)\setminus B_2(\e)$ is countable. Furthermore, if $n>n_0$ and
$y\in B_2(\e)$, then $\rho(\xi_n(y),\xi(y))\le\e$, since in this case
$\xi_n(y)$ and $\xi(y)$ are in the closure of the same set
$A_{j_1,\dots,j_k}$. This implies that
$$
\lambda\(\limsup_{n\to\infty}\rho(\xi_n,\xi)\ge\e\)<\e.
$$
Since this relation holds for all $\e>0$, it implies the statement of
problem~4.
\item{5.)} Put $\bar X=X\times[0,1]$, and let us define the measurable
space $(\Omega,\Cal A)$ as $\Omega=\bar X\times
X\times\cdots\times X\times\cdots$, and $\Cal A$ is the product of the
$\sigma$-algebras on the coordinate spaces $\bar X$ and $X$.
In the point $\oo=(x,u,x_1,x_2,\dots)\in\Omega$ let us define the random
variables $\xi$ and $\xi_n$ by the formulas $\xi(\oo)=x$,
$\xi_n(\oo)=x_n$, $n=1,2,\dots$. The measure $P$ will be defined
with the help of the measure $\bar\mu=\mu\times \lambda$ introduced on
the space $\bar X$ and appropriately defined conditional distributions
$Q_n((x,u),A)$, $n=1,2,\dots$, where $(x,u)\in \bar X=X\times[0,1]$, and
$A\subset X$ is a measurable subset of the space $X$.
(The function $Q_n((x,u),A)$ is called a conditional distribution
function if $Q_n((x,u),\cdot)$ is a probability measure on the space
$(X,\Cal A)$ for all points $(x,u)\in \bar X$, and
$Q_n(\cdot,A)$ is a measurable function for all sets $A\in\Cal A$.)
The measure $P$ will be defined in the following way: The distribution
of the first coordinate $(x,u)\in \bar X$ is $\bar\mu$, the coordinates
$x_n\in X$, $n=1,2, \dots$, are conditionally independent for fixed
$(x,u)$, and the conditional distribution of the coordinate $x_n$ under
this condition is $Q_n((x,u),\cdot)$.
In a formal way, put
$$
\align
P(A\times A_1\times\cdots\times A_n)&=\int_{(x,u)\in A}
Q_1((x,u),A_1)\dots Q_n((x,u),A_n)\,d\mu(x)\,du\\
&\qquad\text
{for all measurable sets }A\subset \bar X, \;A_j\subset X,\;j=1,\dots,n.
\endalign
$$
(By some non-trivial results of measure theory the above formula
and its extension really define a probability measure
$P$. It is worth mentioning that this fact follows from a result
(the Io\-nes\-cu Tul\-cea theorem) which does not demand any nice
topological properties of the space. This is the reason why this method
can also be applied in non-complete metric spaces which may not have
nice topological properties.)
\item{} For all numbers $k=1,2,\dots$, let us define such a partition
$\Cal A_k=\{A_{1,k},A_{2,k},\dots\}$ of $X$ for which
$\bar d(A_{j,k})<\dfrac1k$, and $\mu\(\partial A_{j,k}\)=0$,
$j=1,2,\dots$, where $\bar d(A)$ denotes the diameter and
$\partial A$ the boundary of the set $A$. Let us define for all
$k=1,2,\dots$ an index $m(k)$ such that
$\mu\(\bigcupp_{j\ge m(k)} A_{j,k}\)\le\dfrac1{k^2}$ and
$\mu(A_{j,k})>0$ for all $1\le j\le m(k)$. (To satisfy this latter
condition we can re-index the sets $A_{j,k}$ if necessary.)
After this let us consider an increasing sequence of indices
$1=n_1<n_2<\cdots$, and let us define the conditional distributions
$Q_n((x,u),\cdot)$ with the help of the partitions $\Cal A_k$ in such
a way that
$$
\sum_{k=1}^\infty P\(\supp_{n_k\le n<n_{k+1}}\rho(\xi_n,\xi)>\frac1k\)
\le\sum_{k=1}^\infty\frac2{k^2}<\infty.
$$
Since this series is convergent,
it follows from the Borel--Cantelli lemma that $\xi_n\to\xi$ with
probability one if $n\to\infty$.
\item{6.)} Let $F$ be an arbitrary closed set. As
$F=\bigcapp_{\delta\to0}F^\delta$, where $F^\delta$ denotes the open
neighbourhood of radius $\delta$ of the set $F$, for all $\e>0$
there exists a $\delta=\delta(\e)>0$ such that
$\mu(F^\delta)<\mu(F)+\e$. As $\xi_n\to\xi$ stochastically,
for all $\delta>0$ and $\e>0$ there exists such an index
$n_0=n_0(\e,\delta)$ for which $P\(\rho(\xi,\xi_n)>\delta\)<\e$.
Furthermore, $\{\oo\:\xi(\oo)\notin F^\delta\}\subset
\{\oo\:\xi_n(\oo)\notin
F\}\cup\{\oo\:\rho(\xi_n(\oo),\xi(\oo))>\delta\}$, hence
$1-\mu(F^\delta)\le 1-\mu_n(F)+\e$, and $\mu(F)+\e\ge\mu(F^\delta)
\ge\mu_n(F)-\e$, if $n\ge n_0$. This implies that $\limsupp_{n\to\infty}
\mu_n(F)\le \mu(F)+2\e$. Since this inequality holds for all $\e>0$,
this implies that $\limsupp_{n\to\infty}\mu_n(F)\le \mu(F)$ for all
closed sets $F$, i.e.\ $\mu_n\Rightarrow \mu$.
\item{7.)} To prove that $\xi=F^{-1}(\eta)$ is $F$ distributed it is
enough to show that
$$
\{\oo\: \eta(\oo)\le F(x)\}=\{\oo\: F^{-1}(\eta)(\oo)\le x\}
\quad\text{for all numbers } x,
$$
since in this case $P(\xi\le x)=P\(\eta\le F(x)\)=F(x)$ by the uniform
distribution of $\eta$. If $\eta(\oo)\le F(x)$, then every number $u$
with $F(u)<\eta(\oo)$ satisfies the inequality $u\le x$, hence
$F^{-1}(\eta)(\oo)=\sup\{u\: F(u)<\eta(\oo)\}\le x$. If
$\eta(\oo)>F(x)$, then by the right continuity of the distribution
function $F$ there exists a number $v>x$ for which $F(v)<\eta(\oo)$,
hence $F^{-1}(\eta)(\oo)\ge v>x$. Problem~7 is solved.
\item{8.)} Let $\e$ be a random variable with uniform distribution in
the interval $[0,1]$, independent of $\bar{\bar\xi}$, and put
$\eta=\tilde F(\bar{\bar\xi},\e)$ with the function
$\tilde F(x,t)=F(x-0)+t\(F(x)-F(x-0)\)$. Then $\eta$ is uniformly
distributed in the interval $[0,1]$, and
$F^{-1}(\eta)=\bar{\bar\xi}$ with probability one. Indeed, if
$\bar{\bar\xi}(\oo)=x$, then $\tilde F(\bar{\bar\xi}(\oo),\e(\oo))\le
F(x)$, and every number $u$ with
$F(u)<\tilde F(\bar{\bar\xi}(\oo),\e(\oo))$ satisfies $u\le x$, hence
$F^{-1}\(\tilde F(\bar{\bar\xi}(\oo),\e(\oo))\)\le
x=\bar{\bar\xi}(\oo)$ in this case. The opposite inequality
$F^{-1}(\eta)\ge\bar{\bar\xi}$ holds with probability one by a similar
argument. Problem~8 is solved.
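\item{} For a discrete distribution the quantile transform
$F^{-1}(u)=\sup\{x\: F(x)<u\}$ of problems~7 and~8 takes a particularly
simple form; the following Python sketch (the atoms and probabilities are
illustrative choices of ours) evaluates it.

```python
# Sketch of the quantile transform F^{-1}(u) = sup{x : F(x) < u} for a
# discrete distribution given by sorted atoms and their probabilities.

def make_quantile(atoms, probs):
    cum = []
    s = 0.0
    for p in probs:
        s += p
        cum.append(s)                 # cum[i] = F(atoms[i])

    def F_inv(u):                     # sup{x : F(x) < u} for 0 < u <= 1
        for x, c in zip(atoms, cum):
            if c >= u:                # F jumps to a value >= u at this atom
                return x
        return atoms[-1]
    return F_inv

# Distribution with P(0) = 0.3, P(1) = 0.7
F_inv = make_quantile([0.0, 1.0], [0.3, 0.7])
```

If $U$ is uniform on $(0,1]$, then `F_inv(U)` takes the value $0$ with
probability $0.3$ and the value $1$ with probability $0.7$, i.e.\ it has
distribution $F$, in accordance with problem~7.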
\item{9.)} Let us remark that if $\xi$ and $\eta$ are defined by the
formula $\xi=F^{-1}(\zeta)$ and $\eta=G^{-1}(\zeta)$, where $\zeta$ is
a random variable with uniform distribution in the interval $[0,1]$,
that is these random variables are defined by means of the quantile
transform, then the two sides of the inequality investigated in this
problem are equal. Let us first prove this inequality in the special
case when the distributions of both random variables $\xi$ and $\eta$
are concentrated in finite subsets $X=\{x_1,\dots,x_n\}$,
$x_1<x_2<\cdots<x_n$, and $Y=\{y_1,\dots,y_n\}$,
$y_1<y_2<\cdots<y_n$, of the real line. Let
$r(x_j,y_k)=P(\xi=x_j,\eta=y_k)$ denote the joint distribution of the
random vector $(\xi,\eta)$, and let us assume that there exist indices
$1\le j<k\le n$ for which $r(x_j,y_k)>0$ and $r(x_k,y_j)>0$. Put
$$
\tilde r=\min\(r(x_j,y_k),r(x_k,y_j)\)>0.
$$
We show that in this case a new joint distribution $\tilde r(x_j,y_k)$,
$1\le j,k\le n$, can be introduced in such a way that
a random vector $(\tilde\xi,\tilde\eta)$ with this joint
distribution has marginal distributions $F$ and $G$, and
$E\Phi(\tilde\xi-\tilde\eta)\le E\Phi(\xi-\eta)$. Furthermore, if $\Phi$
is a strictly convex function, then there is a strict inequality in the
last relation.
\item{} We shall define these probabilities in the following way:
$$
\align
\tilde r(x_j,y_j)&=r(x_j,y_j)+\tilde r \\
\tilde r(x_k,y_k)&=r(x_k,y_k)+\tilde r \\
\tilde r(x_j,y_k)&=r(x_j,y_k)-\tilde r \\
\tilde r(x_k,y_j)&=r(x_k,y_j)-\tilde r \\
\tilde r(x,y)&=r(x,y) \quad\text{otherwise.}
\endalign
$$
Then the marginal distributions of the random vector
$(\tilde\xi,\tilde\eta)$ are the prescribed ones, and
$$
E\Phi(\tilde\xi-\tilde\eta)-E\Phi(\xi-\eta)=\tilde r\(\Phi(x_j-y_j)+
\Phi(x_k-y_k)-\Phi(x_j-y_k)-\Phi(x_k-y_j)\).
$$
This expression is non-positive. Moreover, it is strictly negative if
the function $\Phi(\cdot)$ is strictly convex. Indeed,
$$
x_j-y_k<x_j-y_j<x_k-y_j,\qquad x_j-y_k<x_k-y_k<x_k-y_j,
$$
and $(x_j-y_j)+(x_k-y_k)=(x_j-y_k)+(x_k-y_j)$, hence the convexity of
the function $\Phi$ implies that
$$
\Phi(x_j-y_j)+\Phi(x_k-y_k)\le\Phi(x_j-y_k)+\Phi(x_k-y_j),
$$
with strict inequality if $\Phi$ is strictly convex. If $\Phi$ is
strictly convex, this implies that the minimum of $E\Phi(\xi-\eta)$,
which is attained because the set of joint distributions with marginals
$F$ and $G$ is compact, is attained at a joint distribution containing
no pair of indices $j<k$ with $r(x_j,y_k)>0$ and $r(x_k,y_j)>0$, and
the only such joint distribution is the one supplied by the quantile
transform, i.e.\ by the pair
$(\bar\xi,\bar\eta)=\(F^{-1}(\zeta),G^{-1}(\zeta)\)$. If the
function $\Phi$ is convex, but not strictly convex, then for all
$\e>0$ let us define the function
$\Phi_\e(x)=\Phi(x)+\e x^2$. Then $\Phi_\e(\cdot)$ is a strictly
convex function, and we get by letting $\e\to0$ the relation
$$
E\Phi(\xi-\eta)=\lim_{\e\to0}E\Phi_\e(\xi-\eta)\ge
\lim_{\e\to0}E\Phi_\e(\bar\xi-\bar\eta)=E\Phi(\bar\xi-\bar\eta).
$$
This implies the statement of problem~9 in the case when the
distribution of the random variables $\xi$ and $\eta$ are
concentrated in finitely many points.
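\item{} The improvement step used in this proof can be tried out
numerically. In the following Python sketch (the supports, the joint
probabilities and the function $\Phi(x)=x^2$ are illustrative choices of
ours) the mass $\tilde r$ is moved from the crossed pairs to the aligned
ones; the marginal distributions are preserved, and $E\Phi(\xi-\eta)$
decreases.

```python
# Sketch of the improvement step of problem 9: moving the mass tilde_r from
# the "crossed" pairs (x_j, y_k), (x_k, y_j) to the "aligned" pairs
# (x_j, y_j), (x_k, y_k) keeps both marginals and, for convex Phi, does not
# increase E Phi(xi - eta).

xs = [0.0, 1.0]
ys = [0.0, 2.0]
r = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.1}   # crossed coupling

def expect_phi(r, phi):
    return sum(p * phi(xs[j] - ys[k]) for (j, k), p in r.items())

t = min(r[(0, 1)], r[(1, 0)])          # the amount tilde_r of the text
r_new = dict(r)
r_new[(0, 0)] += t
r_new[(1, 1)] += t
r_new[(0, 1)] -= t
r_new[(1, 0)] -= t

phi = lambda x: x * x                  # a strictly convex function
before = expect_phi(r, phi)
after = expect_phi(r_new, phi)
```

For this table the step removes all crossed mass at once, and the
expectation drops from $2.1$ to $0.5$, the value of the aligned (quantile)
coupling.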
\item{} If $F$ and $G$ are two distribution functions concentrated on
a finite interval, then it is useful to approximate these distributions
by the discrete distributions $F_n(x)=F\(\frac{[nx]}n\)$ and
$G_n(x)=G\(\frac{[nx]}n\)$. More precisely, it is worthwhile to
approximate the two dimensional distribution $H(x,y)$ with marginal
distributions $F(x)$ and $G(y)$ by the distribution
$H_n(x,y)=H\(\frac{[nx]}n,\frac{[ny]}n\)$, where $[u]$ denotes the
integer part of the number~$u$. By applying the result of problem~9 in
the already proven case and letting $n\to\infty$ we get the desired
results in this case.
\item{} The general case can be reduced to the previous case by means
of an appropriate limiting procedure. Let us truncate our random
variables at the level $\pm u$. We do this truncation in such a way
that if the vector $(\xi,\eta)$ takes its value outside the square
$[-u,u]\times[-u,u]$, then the truncated version of this vector has its
value in the origin. Then carrying out the limit procedure
$u\to\infty$ we can get the result of problem~9 in the general case.
We make some comments which may be useful when carrying out this
limit procedure. The convexity of the function $\Phi$ implies that
$\Phi(x)\ge Ax+B$ for all real numbers $x$ with appropriate constants
$A$ and $B$, hence $E\Phi(\xi-\eta)\ge-|A|E(|\xi|+|\eta|)-|B|>-\infty$
under the conditions of problem~9. Let us remark that by adding an
appropriate linear function to $\Phi(x)$ we can reduce the problem to
the case when $\Phi(x)\ge0$ for all real numbers $x$, and $\Phi(0)=0$.
We may also assume that $E\Phi(\xi-\eta)<\infty$, since the statement
of problem~9 is otherwise trivial. These remarks may simplify the limit
procedure. On the left-hand side of the inequality we can apply the
monotone convergence theorem, and on the right-hand side the
Fatou lemma. We omit the details.
\item{9a.)} As $f(x)=|x|$ is a convex function, the result of problem~9
can be applied in this case. To finish the proof it is enough to
show the identity
$$
\int_0^1 \left|F^{-1}(x)-G^{-1}(x)\right|\,dx=
\int_{-\infty}^\infty|F(x)-G(x)|\,dx.
$$
This relation can be seen for instance by considering the domain
$$
D=\{(x,y)\: -\infty<x<\infty,\;
\min(F(x),G(x))\le y\le\max(F(x),G(x))\},
$$
and calculating its area in two different ways. Integrating with
respect to the variable $x$ we get the right-hand side of this
identity, while integrating with respect to the variable $y$ we get
its left-hand side, since the section of the domain $D$ at a level $y$
is the interval with end points $F^{-1}(y)$ and $G^{-1}(y)$.
\item{} Turning to the proof of the functional central limit theorem
let us choose a number $\e>0$ of the form
$\e=\dfrac1M$, where $M$ is a positive integer, and define
for all $k=1,2,\dots$ the subsequence $0=m_{k,0}<m_{k,1}<\dots<
m_{k,s(k,\e)}=n_k$ of the indices $0,1,\dots,n_k$ in the following
way: if $u_{k,n_k}-u_{k,m_{k,r}}\ge\e$, then let $m_{k,r+1}$ be that
index $L>m_{k,r}$ for which $u_{k,L}-u_{k,m_{k,r}}\ge\e$ and
$u_{k,L-1}-u_{k,m_{k,r}}<\e$, and put $m_{k,r+1}=n_k$ together with
$r+1=s(k,\e)$ otherwise. Put $\bar
u_{k,j}=\dfrac{u_{k,j}}{u_{k,n_k}}$, $1\le j\le n_k$. Let us observe
that $\limm_{k\to\infty}\supp_{1\le r\le s(k,\e)}\dfrac{|\bar u_{k,m_{k,r}}
-\bar u_{k,m_{k,r-1}}-\e|}\e=0$, and the random variables
$T_{k,j}=\dfrac{S_{k,m_{k,j}}-S_{k,m_{k,j-1}}}
{\sqrt{\bar u_{k,m_{k,j}} -\bar u_{k,m_{k,j-1}}}}$, where
$S_{k,r}=\summ_{j=1}^r\xi_{k,j}$, satisfy the central
limit theorem for all indices $1\le j\le s(k,\e)$. Let us also observe
that $s(k,\e)\le \e^{-1}$.
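The construction of the block boundaries $m_{k,r}$ can be sketched in code
as follows (Python; an added illustration with an arbitrary variance
sequence, $u[j]$ playing the role of $u_{k,j}$):

```python
# Illustration: the subsequence 0 = m_{k,0} < m_{k,1} < ... < m_{k,s} = n_k.
# u[j] plays the role of the cumulative variances u_{k,j} (u[0] = 0); a new
# block is closed as soon as its u-increment reaches eps, and the remainder
# (of increment < eps) forms the last block.

def blocks(u, eps):
    n = len(u) - 1
    m = [0]
    while m[-1] < n:
        if u[n] - u[m[-1]] >= eps:
            m.append(next(L for L in range(m[-1] + 1, n + 1)
                          if u[L] - u[m[-1]] >= eps))
        else:
            m.append(n)
    return m

M = 7                                   # eps of the form 1/M
eps = 1 / M
u = [j / 100 for j in range(101)]       # 100 equal variances, total mass 1
m = blocks(u, eps)
s = len(m) - 1                          # this plays the role of s(k, eps)
assert m[0] == 0 and m[-1] == 100
assert all(u[m[r + 1]] - u[m[r]] >= eps for r in range(s - 1))
assert s <= M                           # the bound s(k, eps) <= 1/eps
```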
\item{} The above results together with the coupling results of this
paper (e.g.\ problem~2 or Statement~A can be applied) imply that
independent pairs of random variables $(\tilde T_{k,j},X_{k,j})$, $1\le
j\le s(k,\e)$, can be constructed for all $k=1,2,\dots$ for which
the random variables $X_{k,j}$ have standard normal distribution,
the distributions of the random variables $T_{k,j}$ and $\tilde
T_{k,j}$ agree, and $\supp_{1\le j\le s(k,\e)}\left|\tilde
T_{k,j}-X_{k,j}\right| \Rightarrow0$, where $\Rightarrow$ denotes
convergence in probability. Let $Z_{k,l,\e}=\summ_{j=1}^l \sqrt{\bar
u_{k,m_{k,j}}-\bar u_{k,m_{k,j-1}}}\tilde T_{k,j}$,
$Y_{k,l,\e}=\summ_{j=1}^l \sqrt{\bar u_{k,m_{k,j}}-\bar
u_{k,m_{k,j-1}}} X_{k,j}$, $1\le l\le s(k,\e)$, and put $Z_{k,0,\e}=0$
and $Y_{k,0,\e}=0$. Then the relation $\supp_{1\le l\le
s(k,\e)}\left|Z_{k,l,\e} -Y_{k,l,\e}\right|\Rightarrow0$ also holds.
Since this relation holds for all numbers $\e$ of the form
$\e=\dfrac1M$, the relation
$$
\supp_{1\le l\le s(k,\e_k)} \left|Z_{k,l,\e_k}-Y_{k,l,\e_k} \right|
\Rightarrow0,\quad \text{and $\e_k\to0$ if } k\to\infty \tag1
$$
also holds with an appropriate sequence $\e_k$, $k=1,2,\dots$.
Because of the distribution of the sequences $(Z_{k,j,\e_k},\; 1\le
j\le s(k,\e_k))$, $(Y_{k,j,\e_k},\; 1\le j\le s(k,\e_k))$ and the result
of Problem~10 a triangular array $\tilde \xi_{k,j}$ and a sequence of
Wiener processes $W_k(t)$ can be constructed in such a way that the
distribution of the sequences $\tilde\xi_{k,j}$ and $\xi_{k,j}$, $1\le
j\le n_k$, agree for a fixed $k=1,2,\dots$, and beside this the
partial sums $\tilde S_{k,l}=\summ_{j=1}^l\tilde \xi_{k,j}$ satisfy
the identity $\tilde S_{k,m_{k,l}}=Z_{k,l,\e_k}$,
$W_k(\bar u_{k,m_{k,l}})=Y_{k,l,\e_k}$, $1\le l\le s(k,\e_k)$. (Actually
we construct directly the partial sums $\tilde S_{k,j}$ and not the
random variables $\tilde\xi_{k,j}$ themselves. We do this under the condition that
their values are prescribed for certain indices and their
distributions must agree with the distribution of the corresponding
partial sums of the random variables $\xi_{k,j}$. The Wiener
processes $W_k(\cdot)$ can be constructed in a simpler way by
using its Markov property and some basic facts about Wiener processes.)
\item{} We claim that the random broken lines $\tilde S_k(t)$
determined by the triangular array $\tilde\xi_{k,j}$, $k=1,2,\dots$,
$1\le j\le n_k$, constructed above together with the
Wiener processes $W_k(t)$ satisfy the
relation $\supp_{0\le t\le1}|\tilde S_k(t)-W_k(t)|\Rightarrow0$.
It follows from the construction and relation~(1) that
$$
\supp_{1\le r\le s(k,\e_k)}\left|\tilde
S_k(\bar u_{k,m_{k,r}})-W_k(\bar u_{k,m_{k,r}})\right|\Rightarrow0,
$$
and beside this $\limm_{k\to\infty}\supp_{1\le r\le
s(k,\e_k)}\(\bar u_{k,m_{k,r}}-\bar u_{k,m_{k,r-1}}\)=0$. Hence to prove the
functional central limit theorem it is enough to prove that
$$ \allowdisplaybreaks
\align
&P\(\sup_{0\le r< s(k,\e_k)}\sup_{\bar u_{k,m_{k,r}} \le u\le
\bar u_{k,m_{k,r+1}}}\left|W_k(u)-W_k(\bar u_{k,m_{k,r}})\right|>\e\) \\
&\qquad \le \sum_{0\le r< s(k,\e_k)} P\(\sup_{\bar u_{k,m_{k,r}} \le
u\le \bar u_{k,m_{k,r+1}}}\left|W_k(u)-W_k(\bar
u_{k,m_{k,r}})\right|>\e\) \to0 \tag2a \\
&P\(\sup_{0\le r< s(k,\e_k)}\sup_{\bar u_{k,m_{k,r}} \le u\le
\bar u_{k,m_{k,r+1}}}\left|\tilde S_k(u)-\tilde
S_k(\bar u_{k,m_{k,r}})\right|>\e\) \\
&\qquad\le \sum_{0\le r< s(k,\e_k)} P\(\sup_{\bar u_{k,m_{k,r}} \le u\le
\bar u_{k,m_{k,r+1}} }\left|\tilde S_k(u)-\tilde
S_k(\bar u_{k,m_{k,r}})\right|>\e\)\to0 \tag2b
\endalign
$$
as $k\to\infty$ for all $\e>0$. Relation (2b) can be rewritten because
of the structure of the random broken lines $\tilde S_k(t)$ as
$$
\sum_{0\le r< s(k,\e_k)} P\(\sup_{m_{k,r}+1 \le p\le
m_{k,r+1}}\left|\sum_{j=m_{k,r}+1}^p \tilde \xi_{k,j}\right|
>\e\)\to0\quad\text{for all }\e>0. \tag2c
$$
\item{} The proof of relation (2a) is relatively simple, since
the probabilities at its right-hand side can be calculated explicitly.
Indeed,
$$ \allowdisplaybreaks
\align
&P\(\supp_{\bar u_{k,m_{k,r}} \le u\le \bar u_{k,m_{k,r+1}}
}\left|W_k(u)-W_k(\bar u_{k,m_{k,r}})\right|>\e\)\\
&\qquad=P\(\supp_{0 \le u\le1} \left|W_k(u)\right| >\frac\e{\sqrt{\bar
u_{k,m_{k,r+1}}-\bar u_{k,m_{k,r}}}} \)\\
&\qquad \le 4\(1-\Phi\(\frac\e{\sqrt{\bar u_{k,m_{k,r+1}}-\bar
u_{k,m_{k,r}}}} \)\),
\endalign
$$
where $\Phi(\cdot)$ denotes the standard normal distribution function.
Because of the relation $\limm_{k\to\infty}\supp_{0\le r<
s(k,\e_k)}\(\bar u_{k,m_{k,r+1}}-\bar u_{k,m_{k,r}}\)=0$, the bound
$s(k,\e_k)\le\e_k^{-1}$ and the fact that $1-\Phi(x)$ tends to zero
faster than any power of $x^{-1}$ as $x\to\infty$ the sum of these
estimates tends to zero, i.e.\ relation (2a) holds.
\item{} To prove relation (2c) let us observe that the Lindeberg
condition implies, by a diagonal argument, the existence of a sequence
of numbers $\delta_k\to0$ such that
$\limm_{k\to\infty}\summ_{j=1}^{n_k}E\tilde\xi_{k,j}^2
I(|\tilde\xi_{k,j}|>\delta_k)=0$. Let us fix such a sequence $\delta_k$,
and let us choose the decomposition $\tilde\xi_{k,j}=\bar\xi_{k,j}
+\bar{\bar\xi}_{k,j}$ with $\bar\xi_{k,j}=
\tilde\xi_{k,j}I(|\tilde\xi_{k,j}|>\delta_k)
-E\tilde\xi_{k,j}I(|\tilde\xi_{k,j}|>\delta_k)$ and
$\bar{\bar\xi}_{k,j}=\tilde\xi_{k,j}I(|\tilde\xi_{k,j}|\le\delta_k)
-E\tilde\xi_{k,j}I(|\tilde\xi_{k,j}|\le\delta_k)$. Then
$\limm_{k\to\infty}\summ_{j=1}^{n_k}E\bar\xi_{k,j}^2=0$,
$\limm_{k\to\infty}\summ_{j=1}^{n_k}E\bar{\bar\xi}_{k,j}^2=1$,
$E\bar{\bar\xi}_{k,j}^4\le4\delta_k^2E\bar{\bar\xi}_{k,j}^2$ (since
$|\bar{\bar\xi}_{k,j}|\le2\delta_k$), and
to prove relation (2c) it is enough to show that
$$
\align
\sum_{0\le r< s(k,\e_k)} &P\(\sup_{m_{k,r}+1 \le p\le
m_{k,r+1}}\left|\sum_{j=m_{k,r}+1}^p \bar \xi_{k,j}\right|
>\e\)\to0\quad\text{for all }\e>0, \tag3a \\
\sum_{0\le r< s(k,\e_k)} &P\(\sup_{m_{k,r}+1 \le p\le
m_{k,r+1}}\left|\sum_{j=m_{k,r}+1}^p \bar{\bar\xi}_{k,j}\right|
>\e\)\to0 \quad\text{for all }\e>0. \tag3b
\endalign
$$
We can write $P\(\supp_{m_{k,r}+1 \le p\le m_{k,r+1}}\left|
\summ_{j=m_{k,r}+1}^p \bar \xi_{k,j}\right| >\e\)\le\frac
{\summ_{j=m_{k,r}+1}^{m_{k,r+1}} E\bar\xi_{k,j}^2}{\e^2}$ by the
Kolmogorov inequality. We get relation (3a) by summing up this
relation for $0\le r< s(k,\e_k)$ and applying the relation
$\limm_{k\to\infty}\summ_{j=1}^{n_k}E\bar\xi_{k,j}^2=0$.
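The Kolmogorov inequality applied here can be tested exactly on a toy
example by enumerating all trajectories of a short random walk (Python; an
added illustration, the walk and the level are arbitrary choices):

```python
# Illustration of the Kolmogorov inequality
#   P( max_{p <= n} |S_p| > eps ) <= E S_n^2 / eps^2
# checked exactly for the simple +-1 random walk by enumerating all paths.
from itertools import product

n, eps = 8, 3
hit = 0
for signs in product((-1, 1), repeat=n):
    s, peak = 0, 0
    for step in signs:
        s += step
        peak = max(peak, abs(s))
    if peak > eps:
        hit += 1
prob = hit / 2 ** n          # exact value of P(max_p |S_p| > 3)
bound = n / eps ** 2         # Var(S_n) = n for +-1 steps
assert 0 < prob <= bound
```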
\item{} The proof of relation (3b) demands a more intricate argument.
Let us observe that the following version of the Kolmogorov inequality
holds.
$$
P\(\supp_{m_{k,r}+1 \le p\le m_{k,r+1}}\left|
\summ_{j=m_{k,r}+1}^p\bar{\bar \xi}_{k,j}\right| >\e\)\le\frac
{E\(\summ_{j=m_{k,r}+1}^{m_{k,r+1}} \bar{\bar\xi}_{k,j}\)^4}{\e^4}.
$$
Indeed, the sequence of the random variables $\(\summ_{j=m_{k,r}+1}^p
\bar{\bar\xi}_{k,j}\)^4$, $p=m_{k,r}+1$,\dots, $m_{k,r+1}$, is
a submartingale. (This follows for instance from the fact that a
convex function of a martingale is a submartingale.) Then the above
inequality can be proved similarly to the proof of the Kolmogorov
inequality with the help of this fact. We can prove formula (3b)
with the help of a good estimate on $E\(\summ_{j=m_{k,r}+1}^{m_{k,r+1}}
\bar{\bar\xi}_{k,j}\)^4$. Indeed, we have
$$
\align
E\(\summ_{j=m_{k,r}+1}^{m_{k,r+1}} \bar{\bar\xi}_{k,j}\)^4\!&\le
\summ_{j=m_{k,r}+1}^{m_{k,r+1}} E\bar{\bar\xi}_{k,j}^4+
3\(\summ_{j=m_{k,r}+1}^{m_{k,r+1}} E\bar{\bar\xi}_{k,j}^2\)^2 \\
&\le \(4\delta_k^2+3\!\supp_{0\le
r<s(k,\e_k)}\summ_{j=m_{k,r}+1}^{m_{k,r+1}}E\bar{\bar\xi}_{k,j}^2\)
\summ_{j=m_{k,r}+1}^{m_{k,r+1}}E\bar{\bar\xi}_{k,j}^2,
\endalign
$$
hence
$$
\align
\sum_{0\le r< s(k,\e_k)} &P\(\sup_{m_{k,r}+1 \le p\le
m_{k,r+1}}\left|\sum_{j=m_{k,r}+1}^p \bar{\bar\xi}_{k,j}\right|>\e\) \\
&\qquad\le \frac{\(4\delta_k^2+3\supp_{0\le r<s(k,\e_k)}
\summ_{j=m_{k,r}+1}^{m_{k,r+1}}E\bar{\bar\xi}_{k,j}^2\)
\summ_{j=1}^{n_k}E\bar{\bar\xi}_{k,j}^2}{\e^4}\to0
\endalign
$$
as $k\to\infty$, since $\delta_k\to0$,
$\limm_{k\to\infty}\summ_{j=1}^{n_k}E\bar{\bar\xi}_{k,j}^2=1$, and the
block sums $\summ_{j=m_{k,r}+1}^{m_{k,r+1}}E\bar{\bar\xi}_{k,j}^2\le
u_{k,m_{k,r+1}}-u_{k,m_{k,r}}$ tend to zero uniformly in $r$. This
proves relation (3b), and herewith the proof of the functional central
limit theorem is complete.
\medskip\noindent
{\it The proof of the K\"onig--Hall theorem.}\/ The necessity of the
condition is obvious. We prove its sufficiency by induction with
respect to the number of elements of the set $Y$. For $|Y|=1$ the
statement clearly holds. In the inductive step let us distinguish two
cases. In case a.) there exists a set $A\subset Y$, $A\ne\emptyset$,
$A\ne Y$, with $|A|=k$ elements for which $|B(A)|=|A|$, and in case
b.) all sets $A\subset Y$, $A\ne\emptyset$, $A\ne Y$, satisfy the
inequality $|B(A)|>|A|$.
\medskip
In case a.) we claim that the conditions of the theorem hold for both
pairs of sets $\bar Y=A$, $\bar Z=B(A)$ and $\bar Y=Y\setminus A$,
$\bar Z=Z\setminus B(A)$ and the restriction of the function $d(y,z)$
to these sets $\bar Y\times \bar Z$. Then the inductive hypothesis
implies the sufficiency of the condition in case~a.) In case
$\bar Y=A$, $\bar Z=B(A)$ this statement is obvious. If
$\bar Y=Y\setminus A$ and $\bar Z=Z\setminus B(A)$ let us consider an
arbitrary set $C\subset Y\setminus A$. Put $\bar C=C\cup A$. Then $|C|=|\bar
C|-k$, $|B(\bar C)|\ge |\bar C|$, and $B(C)\cap \bar Z\supset B(\bar
C)\setminus B(A)$. Hence $|B(C)\cap \bar Z|\ge |B(\bar C)|-k\ge |\bar
C|-k$, and $|B(C)\cap \bar Z|\ge |C|$, as we have claimed.
In case b.) let us consider a point $z_j\in Z$, for which
$d(y_1,z_j)=1$. Let us pair the point $y_1$ with the point $z_j$.
To complete the proof of the theorem it is enough to show that
the sets $\bar Y=Y\setminus\{y_1\}$ and $\bar Z=Z\setminus \{z_j\}$
satisfy the conditions of the theorem in case~b.). But this is obvious,
since in case b.) $|B(A)\cap\bar Z|\ge |B(A)|-1\ge |A|$ for all sets
$A\subset \bar Y$. \medskip \noindent
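For a concrete finite bipartite graph both the condition $|B(A)|\ge|A|$
and a complete pairing can be checked by machine; the sketch below
(Python; an added illustration with an arbitrary graph) uses the standard
augmenting-path construction:

```python
# Illustration: Hall's condition |B(A)| >= |A| for every subset A of Y is
# checked by brute force, and a complete pairing of Y into Z is built by
# the usual augmenting-path (Kuhn) algorithm.
from itertools import combinations

Y = [0, 1, 2, 3]
nbr = {0: {0, 1}, 1: {1, 2}, 2: {0, 2}, 3: {2, 3}}   # B({y}) for each y

def hall_condition(Y, nbr):
    return all(len(set().union(*(nbr[y] for y in A))) >= len(A)
               for r in range(1, len(Y) + 1)
               for A in combinations(Y, r))

def matching(Y, nbr):
    match = {}                       # pairs z -> y
    def augment(y, seen):
        for z in nbr[y] - seen:
            seen.add(z)
            if z not in match or augment(match[z], seen):
                match[z] = y
                return True
        return False
    for y in Y:
        augment(y, set())
    return match

assert hall_condition(Y, nbr)
m = matching(Y, nbr)
assert len(m) == len(Y)                        # every y is paired
assert all(z in nbr[y] for z, y in m.items())  # only along edges
```

\medskip\noindent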
{\it The proof of the continuous version of the K\"onig--Hall
theorem.}\/ The necessity of the condition is obvious also in this
case. We prove the sufficiency of the conditions by means of the
original K\"onig--Hall theorem.
Let us first consider the special case when there is an integer $N$
such that $u(y_j)=\dfrac{k_j}N$ for all $y_j\in Y$ and
$v(z_l)=\dfrac{p_l}N$ for all $z_l\in Z$, where $k_j$ and $p_l$ are
integers. In this case let us define the following bipartite graph:
$\bar Y=\{(y_j, m(j)),y_j\in Y,\;1\le m(j)\le k_j\}$, $\bar Z=\{(z_l,
n(l)), z_l\in Z,\;1\le n(l)\le p_l\}$, $\bar d((y_j,m(j)),(z_l,n(l)))
=d(y_j,z_l)$. Then the bipartite graph $(\bar Y,\bar Z, \bar
d(\cdot,\cdot))$ satisfies the conditions of the K\"onig--Hall theorem.
Indeed, it is enough to check the condition of the K\"onig--Hall
theorem for those sets $A\subset\bar Y$ which contain, together with a
point $(y_j,m(j))$, all points $(y_j,k)$, $1\le k\le k_j$. For such
sets this condition agrees with the relation
$\summ_{z\in B(A)} v(z)\ge \summ_{y\in A}u(y)$, which holds for all
$A\subset Y$ by the assumption of the theorem.
Hence there exists such a pairing of the elements of the sets
$\bar Y$ and $\bar Z$, in which the points $\bar y$ and $\bar z$ in
a pair satisfy the relation $\bar d(\bar y,\bar z)=1$. Put
$\bar w(\bar y,\bar z)=1$, if $\bar y$ and $\bar z$ are in a pair,
and $\bar w(\bar y,\bar z)=0$ otherwise. Then the function
$w(y_j,z_l)=\dfrac1N\summ\Sb \bar y=(y_j, m(j)),1\le m(j)\le k_j\\
\bar z=(z_l,n(l)),1\le n(l)\le p_l\endSb \bar w(\bar y,\bar z)$
satisfies the statement of the theorem in this special case.
The general case can be reduced to the already proved situation with
an appropriate approximation. We can assume without violating the
generality that $\summ_{y\in Y} u(y)=\summ_{z\in Z} v(z)=1$. For all
$N=1,2,\dots$ let us define the following system approximating the
original one. Put $\bar Y=\bar Y_N=Y\cup \{y_{r+1}\}$, $\bar Z=\bar
Z_N=Z\cup \{z_{s+1}\}$, $\bar d(y,z)=\bar d_N(y,z)=d(y,z)$, if $y\in Y$,
$z\in Z$, and $\bar d(y_{r+1},z)=\bar d_N(y_{r+1},z)=\bar
d(y,z_{s+1})=\bar d_N(y,z_{s+1})=1$, (this means that the points
$y_{r+1}$ and $z_{s+1}$ are connected to all points of the other set),
$\bar u_N(y)=\dfrac{[Nu(y)]}N$, if $y\in Y$, $\bar v_N(z)=
\dfrac{[Nv(z)]}N$, if $z\in Z$, where $[u]$ denotes the integer
part of the number $u$, beside this $\bar u_N(y_{r+1})= \summ_{y\in
Y}\(u(y)-\bar u_N(y)\) =1-\summ_{y\in Y}\bar u_N(y)$, and $\bar
v_N(z_{s+1})=\summ_{z\in Z}\(v(z)-\bar v_N(z)\)=1-\summ_{z\in Z}\bar
v_N(z)$. Let us observe that this new system also satisfies the
conditions of the theorem. Indeed, if $A\subset Y$, then the set
$\bar B_N(A)$ of the points of $\bar Z_N$ connected to the set $A$ is
$\bar B_N(A)=B(A)\cup\{z_{s+1}\}$, and $\summ_{y\in A}\bar u_N(y)\le
\summ_{y\in A}u(y)\le \summ_{z\in B(A)}v(z)\le\summ_{z\in B(A)}\bar
v_N(z)+\bar v_N(z_{s+1})\le\summ_{z\in \bar B_N(A)}\bar v_N(z)$. If
$y_{r+1}\in A$, then $\bar B_N(A)=\bar Z_N$, and $\summ_{y\in A}\bar
u_N(y)\le1=\summ_{z\in\bar Z_N}\bar v_N(z)$. Hence the already proven
special case of the theorem can be applied in this case.
Let us consider for all $N=1,2,\dots$ a ``transport function"
$\bar w_N(y,z)$, $y\in \bar Y$, $z\in \bar Z$ satisfying the theorem.
As $0\le \bar w_N(y,z)\le 1$ for all points $y\in\bar Y$, $z\in\bar
Z$, and numbers $N=1,2,\dots$ a subsequence $N_k\to\infty$ of the
integers can be chosen in such a way that the limit
$w(y,z)=\limm_{k\to\infty}\bar w_{N_k}(y,z)$ exists for all points $y\in
\bar Y$ and $z\in \bar Z$. The restriction of this limit function
$w(y,z)$ to the set $Y\times Z$ satisfies the theorem.
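The reduction in the special case $u(y_j)=k_j/N$, $v(z_l)=p_l/N$ can also
be carried out by machine: split every point into its copies, pair the
blown-up graph completely, and count the pairs with weight $1/N$. A
sketch (Python; an added illustration, the weighted system is an
arbitrary example, and the pairing is found by the augmenting-path
method):

```python
# Illustration of the special case u(y) = k_y / N, v(z) = p_z / N:
# each y is split into k_y copies, each z into p_z copies, the blown-up
# bipartite graph is paired completely, and the matched pairs are counted
# with weight 1/N, giving a "transport function" w with marginals u and v.
from fractions import Fraction

N = 4
u = {0: Fraction(2, N), 1: Fraction(2, N)}            # weights on Y
v = {0: Fraction(1, N), 1: Fraction(3, N)}            # weights on Z
edge = {(0, 0), (0, 1), (1, 1)}                       # pairs with d(y,z) = 1

Ybar = [(y, i) for y in u for i in range(int(u[y] * N))]
nbr = {yb: [(z, j) for z in v for j in range(int(v[z] * N))
            if (yb[0], z) in edge] for yb in Ybar}

match = {}                                            # z-copy -> y-copy
def augment(yb, seen):
    for zb in nbr[yb]:
        if zb not in seen:
            seen.add(zb)
            if zb not in match or augment(match[zb], seen):
                match[zb] = yb
                return True
    return False
for yb in Ybar:
    augment(yb, set())
assert len(match) == len(Ybar)                        # complete pairing

w = {}
for zb, yb in match.items():
    key = (yb[0], zb[0])
    w[key] = w.get(key, Fraction(0)) + Fraction(1, N)

assert all(sum(w.get((y, z), 0) for z in v) == u[y] for y in u)
assert all(sum(w.get((y, z), 0) for y in u) == v[z] for z in v)
assert all(k in edge for k in w)                      # supported on d = 1
```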
\bye