\magnification=\magstep1
\hsize=15truecm
\input amstex
\TagsOnRight
\parindent=20pt
\parskip=1.5pt plus 1pt
\define\const{\text{\rm const.}\,}
\define\Cov{\text{\rm Cov}\,}
\define\Var{\text{\rm Var}\,}
\font\script =cmcsc10
\font\kisbetu=cmr8
\centerline{\bf ESTIMATION OF MULTIPLE RANDOM INTEGRALS}
\centerline{\bf AND $U$-STATISTICS.}
\smallskip
\centerline{\it P\'eter Major}
\centerline{\it Alfr\'ed R\'enyi Mathematical Institute}
\centerline{\it of the Hungarian Academy of Sciences}
\medskip\noindent
{\narrower Here I give a short survey of some problems on
multiple random integrals and $U$-statistics together with
some other questions which arise in a natural way during their
study. I write down the main results and discuss their background
together with some pictures and mathematical ideas which explain
them better. \par}
\beginsection 1. Introduction. Formulation of the problems.
To formulate the problems first I introduce some notation.
\noindent
Let $\xi_1,\dots,\xi_n$ be a sequence of independent and identically
distributed random variables with some probability distribution $\mu$
on a measurable space $(X,\Cal X)$ and let $\mu_n$,
$$
\mu_n(A)=\dfrac1n\#\{j\:\,\xi_j\in A,\;1\le j\le n\},\quad A\in\Cal X,
$$
denote its empirical distribution. Let a measurable function
$f(x_1,\dots,x_k)$ of $k$ variables be given on the product space
$(X^k,\Cal X^k)$. Take the $k$-fold direct product of the normalized
version $\sqrt n(\mu_n-\mu)$ of this empirical measure $\mu_n$ and
consider the integral of the function $f$ with respect to it.
More explicitly, the (random) integral
$$
\aligned
J_{n,k}(f)&=\frac{n^{k/2}}{k!} \int'
f(x_1,\dots,x_k)(\mu_n(\,dx_1)-\mu(\,dx_1))\dots
(\mu_n(\,dx_k)-\mu(\,dx_k)),\\
&\;\;\text{where the prime in $\tsize\int'$ means that
the diagonals } x_j=x_l,\; 1\le j<l\le k, \text{ are omitted from
the domain of integration,}
\endaligned \tag1.1
$$
will be considered. We are interested in the solution of the
following problems.
\medskip
\item{} {\it Problem A).} Give a good estimate on the probabilities
$P(J_{n,k}(f)>u)$ under appropriate conditions on the function $f$.
\medskip
\item{} (The omission of the diagonals $x_j=x_l$, $j\neq l$, from
the domain of integration turned out to be natural in possible
applications.)
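On a finite state space the integral $J_{n,k}(f)$ can be evaluated directly from its definition; the following minimal sketch (with hypothetical two-point data, used only as an illustration) shows how the normalized signed measure $\mu_n-\mu$ enters and what the omission of the diagonals means.

```python
import itertools
import math
from collections import Counter

def J_nk(sample, mu, f, k):
    """J_{n,k}(f) on a finite state space: the k-fold integral of f
    with respect to mu_n - mu, with the diagonals x_j = x_l omitted,
    multiplied by n^{k/2} / k!.

    mu: dict point -> probability; sample: the observed values xi_j.
    """
    n = len(sample)
    counts = Counter(sample)
    # signed measure nu = mu_n - mu
    nu = {x: counts[x] / n - p for x, p in mu.items()}
    total = 0.0
    for xs in itertools.product(mu, repeat=k):
        if len(set(xs)) == k:  # keep the off-diagonal k-tuples only
            weight = 1.0
            for x in xs:
                weight *= nu[x]
            total += f(*xs) * weight
    return n ** (k / 2) / math.factorial(k) * total
```

Since $\sum_x\nu(x)=0$, the integral of a constant function vanishes for $k=1$, which is a quick consistency check of the normalization.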
The second, more general problem is the following one.
\medskip
\item{} {\it Problem B).} Let a nice class $\Cal F$ of functions
$f(x_1,\dots,x_k)$ be given on the space $(X^k,\Cal X^k)$. Give a
good estimate on the probabilities
$P\left(\sup\limits_{f\in\Cal F}J_{n,k}(f)>u\right)$, where
$J_{n,k}(f)$ denotes the integral of the function $f$ defined
in formula (1.1).
\medskip
It turned out to be useful to study these two problems together
with their $U$-statistic analogues. To formulate them first
I recall the definition of $U$-statistics.
\medskip\noindent
{\bf The definition of $U$-statistics.} {\it Let a sequence
$\xi_1,\dots,\xi_n$ of independent and identically distributed
random variables be given with values in some measurable space
$(X,\Cal X)$ together with a function $f(x_1,\dots,x_k)$ on the
$k$-fold product space $(X^k,\Cal X^k)$ with some $k\le n$. The
expression
$$
I_{n,k}(f)=\frac1{k!}\sum\limits\Sb 1\le j_s\le n,\; s=1,\dots, k\\
j_s\neq j_{s'} \text{ if } s\neq s'\endSb
f\left(\xi_{j_1},\dots,\xi_{j_k}\right) \tag1.2
$$
is called a $U$-statistic of order $k$ with kernel function $f$.}
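Formula (1.2) can be transcribed directly: the sum runs over ordered $k$-tuples of distinct indices, and the result is divided by $k!$. A minimal sketch:

```python
import itertools
import math

def u_statistic(sample, f, k):
    """I_{n,k}(f) of formula (1.2): sum f(xi_{j_1}, ..., xi_{j_k})
    over ordered k-tuples of distinct indices, divided by k!."""
    total = 0.0
    for idx in itertools.permutations(range(len(sample)), k):
        total += f(*(sample[j] for j in idx))
    return total / math.factorial(k)
```

For a symmetric kernel this equals the sum over unordered index sets; e.g.\ with $f(x,y)=xy$ it is $\sum_{i<j}\xi_i\xi_j$.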
\medskip
I formulated the following versions of the above two problems.
\item{} {\it Problem A$'$).} Give a good estimate on the
probabilities $P(n^{-k/2}I_{n,k}(f)>u)$ under appropriate conditions
on the function $f$.
\medskip
\item{} {\it Problem B$'$).} Let a nice class $\Cal F$ of functions
$f(x_1,\dots,x_k)$ be given on a (product) space $(X^k,\Cal X^k)$
together with a sequence of independent and identically distributed
random variables $\xi_1,\dots,\xi_n$ with values in $(X,\Cal X)$.
Give a good estimate on the probabilities
$P\left(\sup\limits_{f\in\Cal F}n^{-k/2}I_{n,k}(f)>u\right)$,
where $I_{n,k}(f)$ denotes the $U$-statistic of order $k$
with kernel function $f$ defined in formula~(1.2).
\medskip
It may be useful to remark that a $U$-statistic of order $k$
with the kernel function $f$ can be rewritten as
$$
I_{n,k}(f)=\frac{n^k}{k!}\int'
f(x_1,\dots,x_k)\mu_n(\,dx_1)\dots\mu_n(\,dx_k),
$$
where $\mu_n$ is the empirical distribution of the sequence
$\xi_1,\dots,\xi_n$. This shows that the essential difference
between the random integrals introduced in formula (1.1) and the
$U$-statistics is that in the random integrals $J_{n,k}(f)$
integration is taken with respect to the `normalized' measures
$\mu_n-\mu$, while in the integral representation of the
$U$-statistics $I_{n,k}(f)$ with respect to the `non-normalized'
measures $\mu_n$.
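When the sample contains no repeated values the two representations coincide and can be compared numerically (with ties the omission of the diagonals in $x$-space removes more terms than the omission of coinciding indices). A small sketch of this check, with hypothetical data:

```python
import itertools
import math
from collections import Counter

def u_via_indices(sample, f, k):
    # formula (1.2): ordered distinct index tuples, divided by k!
    total = 0.0
    for idx in itertools.permutations(range(len(sample)), k):
        total += f(*(sample[j] for j in idx))
    return total / math.factorial(k)

def u_via_empirical_measure(sample, f, k):
    # (n^k / k!) int' f dmu_n ... dmu_n, diagonals in x-space omitted
    n = len(sample)
    counts = Counter(sample)
    total = 0.0
    for xs in itertools.product(counts, repeat=k):
        if len(set(xs)) == k:
            weight = 1.0
            for x in xs:
                weight *= counts[x] / n
            total += f(*xs) * weight
    return n ** k / math.factorial(k) * total
```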
\medskip
I met the above problems when I tried to adapt a simple method
applied in the study of the asymptotic behaviour of maximum
likelihood estimates to the investigation of harder problems.
In the study of maximum likelihood estimates the root of the
so-called maximum likelihood equation has to be well estimated.
This can be done by means of a good approximation of the
function in the maximum likelihood equation which is obtained
with the help of its Taylor expansion if the high order terms
of this expansion are omitted. It has to be shown that such an
approximation causes only a negligibly small error. But this
can be proved relatively simply.
I tried to apply a similar method in the study of some so-called
non-parametric maximum likelihood estimation problems. Such a
problem arises for instance if we want to estimate an unknown
distribution function by means of some partial information. In
the study of the error of such an estimate in a fixed point a
version of the Taylor expansion can be applied together with the
omission of the higher order terms. But in this case it is much
harder to show that such an approximation causes only a negligibly
small error. To prove this a good solution of Problem~A is
needed. If we want to bound the error of the estimation in all
points simultaneously, then we need a good solution of Problem~B.
In the investigation of general non-parametric estimation problems
several additional difficulties have to be overcome, but the
solution of Problems~A and~B is especially important. Beside this,
these problems are related to a better understanding of some
fundamental probabilistic phenomena. Hence I found a detailed
study of the above questions useful.
\beginsection 2. An overview of the problems. The study of the
one-variate case.
Before a detailed discussion it is worth thinking
over what kind of results can be expected. Let us observe that
the normalized signed measures $\sqrt n(\mu_n-\mu)$ converge to
a Gaussian field as $n\to\infty$. Hence it is natural to expect
that under very general conditions the results suggested by their
Gaussian counterparts hold both in the solution of Problem~A and
of Problem~B. But we have to understand the answer to the following
two questions.
\medskip
\item{1.)} What kind of estimates do the Gaussian counterparts of
these problems suggest?
\item{2.)} What does the expression `under very general conditions'
mean?
\medskip
To clarify the above questions it is useful to study first Problem~A
in the case $k=1$ when the distribution of sums of independent
random variables has to be bounded. Such a bound is given in
the following classical result called Bernstein's inequality.
\medskip\noindent
{\bf Bernstein's inequality.} {\it Let $\xi_1,\dots,\xi_n$ be
independent random variables which satisfy the relations
$P(|\xi_j|\le1)=1$ and $E\xi_j=0$, $1\le j\le n$. Let us introduce
the notation $\sigma_j^2=E\xi_j^2$, $1\le j\le n$,
$S_n=\sum\limits_{j=1}^n \xi_j$ and $V_n^2=\text{\rm Var}\, S_n
=\sum\limits_{j=1}^n\sigma_j^2$. The inequality
$$
P(S_n>u)\le\exp\left\{-\frac{u^2}{2V_n^2\left(1+\frac u{3V_n^2}
\right)}\right\} \tag2.1
$$
holds for all numbers $u>0$.}
\medskip
Bernstein's inequality yields an estimate on the distribution of
sums of independent random variables suggested by the central
limit theorem, although the coefficient $1+\frac u{3V_n^2}$ in the
denominator of the upper bound slightly modifies the picture. In
the next remark the effect of this factor is considered in
different cases.
\medskip
\item{a)} If $u\le \varepsilon V_n^2$ with some small number
$\varepsilon>0$, then $P(S_n>u)\le e^{-(1-\varepsilon)u^2/2V_n^2}$.
This is almost as good an estimate as the one obtained by a
formal application of the central limit theorem.
\item{b)} If $u\le 3 V_n^2$, then
$P(S_n>u)\le e^{-\text{const.}\, u^2/2V_n^2}$. This is a bound
similar to that suggested by the central limit theorem, only it
has a worse constant in the exponent.
\item{c)} If $u\gg V_n^2$, then
$$
P(S_n>u)\le e^{-u}. \tag2.2
$$
This is a very bad bound. In particular, it does not depend on the
variance of the sum.
\medskip
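To see where the bound (2.2) comes from, read $u\gg V_n^2$ as, say, $u\ge6V_n^2$; then

```latex
1+\frac u{3V_n^2}\le\frac u{6V_n^2}+\frac u{3V_n^2}=\frac u{2V_n^2},
\qquad\text{hence}\qquad
\frac{u^2}{2V_n^2\left(1+\frac u{3V_n^2}\right)}
\ge\frac{u^2}{2V_n^2\cdot\frac u{2V_n^2}}=u,
```

and (2.1) yields $P(S_n>u)\le e^{-u}$.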
The question arises whether Bernstein's inequality can be improved
in the `bad case' $u\gg V_n^2$. To this question a positive answer
can be given. Bennett's inequality formulated below yields a
slight improvement of Bernstein's inequality in this case.
\medskip\noindent
{\bf Bennett's inequality.} {\it Let $\xi_1,\dots,\xi_n$ be a
sequence of independent random variables which satisfy the
relations $P(|\xi_j|\le1)=1$ and $E\xi_j=0$, $1\le j\le n$. Put
$\sigma_j^2=E\xi_j^2$, $1\le j\le n$, $S_n=\sum\limits_{j=1}^n \xi_j$
and $V_n^2=\text{\rm Var}\,S_n=\sum\limits_{j=1}^n\sigma_j^2$.
Then the inequality
$$
P(S_n>u)\le\exp\left\{-V^2_n\left[\left(1+\frac u{V^2_n}\right)
\log\left(1+\frac u{V^2_n}\right)-\frac u{V^2_n}\right]\right\}
$$
holds for all numbers $u>0$.
Hence for all $\varepsilon>0$ there exists a constant
$B=B(\varepsilon)>0$ such that
$$
P(S_n>u)\le\exp\left\{-(1-\varepsilon)u\log \frac u{V^2_n}
\right\}\quad\text{if } u>BV_n^2,
$$
and there exists a number $K>0$ such that
$$
P(S_n>u)\le\exp\left\{-Ku\log \frac u{V^2_n}
\right\}\quad\text{if }u\ge3V_n^2. \tag2.3
$$}
\medskip
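The two bounds can be compared numerically. The functions below just evaluate the right-hand sides of (2.1) and of Bennett's inequality; in the range $u\gg V_n^2$ Bennett's bound is smaller, as it gains the factor $\log(u/V_n^2)$ in the exponent.

```python
import math

def bernstein_bound(u, vn2):
    # right-hand side of Bernstein's inequality (2.1)
    return math.exp(-u * u / (2.0 * vn2 * (1.0 + u / (3.0 * vn2))))

def bennett_bound(u, vn2):
    # right-hand side of Bennett's inequality
    r = u / vn2
    return math.exp(-vn2 * ((1.0 + r) * math.log(1.0 + r) - r))
```

For small $u$ the two bounds are close; the improvement appears only for $u$ much larger than $V_n^2$.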
Formula (2.3) yields a slight improvement of formula (2.2), but even
this bound is very far from the estimate suggested by the central
limit theorem. On the other hand, as the next example shows,
this estimate cannot be improved.
\medskip\noindent
{\bf Lower bound for the distribution of sums of independent
random variables in an appropriate example.} {\it Let us fix a
positive integer $n$ together with two positive numbers $u$ and
$\sigma^2$ which satisfy the relations $0<\sigma^2\le\frac18$,
$n>3u\ge6$ and $u>3n\sigma^2$. Let us introduce the quantity
$V_n^2=n\sigma^2$, and consider a sequence of independent and
identically distributed random variables $\xi_1,\dots,\xi_n$ such
that $P(\xi_j=1)=P(\xi_j=-1)=\frac{\sigma^2}2$ and
$P(\xi_j=0)=1-\sigma^2$. Put $S_n=\sum\limits_{j=1}^n \xi_j$. In
this example $ES_n=0$, $\text{\rm Var}\,S_n=V_n^2$, and
$$
P(S_n\ge u)>\exp\left\{-Ku\log \frac u{V^2_n}\right\} \tag2.4
$$
with some appropriate number $K>0$.}
\medskip
Formula (2.4) gives a lower bound in a special case for the
probability which is bounded from above in~(2.3). These two
estimates are very similar. The only difference between them is
that they may contain a different constant $K>0$. The above
results can be summarized in the following way.
For small numbers $u>0$ the probability $P(S_n>u)$ satisfies a
good estimate suggested by the central limit theorem. Such a
situation holds if $u\le \varepsilon V_n^2$. This probability
satisfies a slightly weaker estimate for not too large numbers~$u$
(if $\varepsilon V_n^2\le u\le CV_n^2$ with some fixed number
$C>0$), and it satisfies only very weak estimates for large
numbers~$u$ (if $u\gg V_n^2$).
\beginsection 3. Some results useful in the study of the general
case.
In the solution of Problem A) in the general case $k\ge1$ results
similar to those in the special case $k=1$ discussed above hold.
To understand their similarity better it is useful to study first
the following two questions.
\medskip\noindent
{\it Question 1.}\/ In the case $k=1$ the sum of independent
random variables with {\it zero expectation} was considered. What
kind of normalization corresponds to this zero expectation in the
case $k\ge2$?
\medskip\noindent
{\it Question 2.}\/ In the case $k=1$ the central limit theorem
and the behaviour of the normal distribution function were in the
background of the estimates. What kind of limit theorem and
estimate take their place in the case $k\ge2$?
\medskip\noindent
{\it Discussion of the first question.}\/
\medskip\noindent
It is useful to consider first the second moment of the expressions
we are investigating. In the case $k=1$ independent random variables
of expectation zero are summed up. In this case the identity
$$
\text{Var}\,\left(\sum\limits_{j=1}^n\xi_j\right)=
\sum\limits_{j=1}^n\text{Var}\,\xi_j
$$
holds because of the identity $E\xi_i\xi_j=0$ for all pairs $i\neq j$.
The multivariate version of this identity (in the case of
$U$-statistics) would be the identity
$$
\align
\text{Var}\,I_{n,k}(f)&=\text{Var}\,\left(\frac1{k!}
\sum \Sb 1\le j_s\le n,\; s=1,\dots, k,\; \\
j_s\neq j_{s'} \text{ if } s\neq s' \endSb
f(\xi_{j_1},\dots,\xi_{j_k})\right) \\
&=\frac1{k!} \sum\Sb 1\le j_s\le n,\; s=1,\dots, k,\; \\
j_s\neq j_{s'} \text{ if } s\neq s'\endSb
\text{Var}\, f(\xi_{j_1},\dots,\xi_{j_k})
\endalign
$$
This identity holds if
$$
Ef(\xi_{j_1},\dots,\xi_{j_k})f(\xi_{j_1'},\dots,\xi_{j_k'})=0
$$
for all pairs of $k$-tuples such that
$\{j_1,\dots,j_k\}\neq \{j_1',\dots,j_k'\}$. The above relation
holds for the degenerate $U$-statistics introduced below.
\medskip\noindent
{\bf Definition of degenerate $U$-statistics.} {\it Take a
$U$-statistic $I_{n,k}(f)$ determined by a sequence of independent
and identically distributed random variables $\xi_1,\dots,\xi_n$
with distribution $\mu$ and a kernel function $f(x_1,\dots,x_k)$.
This $U$-statistic is degenerate if
$$
\align
&E(f(\xi_1,\dots,\xi_k)|\xi_1=x_1,\dots,\xi_{j-1}=x_{j-1},
\xi_{j+1}=x_{j+1},\dots,\xi_k=x_k)=0 \\
&\qquad\qquad \text{for all indices }1\le j\le k\text{ and values }
x_s\in X, \; s\in\{1,\dots,k\}\setminus\{j\}.
\endalign
$$
}\medskip
A $U$-statistic is degenerate if its kernel function is canonical,
i.e. it satisfies the following property.
\medskip \noindent
{\bf Definition of canonical functions.} {\it A function
$f(x_1,\dots,x_k)$ defined on the $k$-fold direct product
$(X^k,\Cal X^k)$ is canonical with respect to a probability
measure $\mu$ on the space $(X,\Cal X)$ if
$$
\align
&\int f(x_1,\dots,x_{j-1},u,x_{j+1},\dots,x_k)\mu(\,du)=0 \\
&\qquad\text{for all indices \ } 1\le j\le k\text{ and values }
x_s\in X,\; s\in\{1,\dots,k\}\setminus\{j\}.
\endalign
$$
\medskip}
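For $k=2$ and a finite space the canonical property is easy to check numerically. The sketch below projects an arbitrary kernel to a canonical one by subtracting its one-variable marginal integrals (a standard computation, shown here with hypothetical data):

```python
def canonical_projection(f, mu):
    """Project a 2-variable kernel (dict (x, y) -> value) to a kernel
    canonical with respect to mu (dict x -> probability)."""
    space = list(mu)
    m1 = {x: sum(f[x, y] * mu[y] for y in space) for x in space}  # int f dmu(y)
    m2 = {y: sum(f[x, y] * mu[x] for x in space) for y in space}  # int f dmu(x)
    mm = sum(m1[x] * mu[x] for x in space)                        # int f dmu dmu
    return {(x, y): f[x, y] - (m1[x] - mm) - (m2[y] - mm) - mm
            for x in space for y in space}
```

By construction the integral of the projected kernel with respect to $\mu$ in either variable vanishes for every fixed value of the other variable.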
The notion of degenerate $U$-statistics is useful, because in some
sense such $U$-statistics behave like sums of independent random
variables with {\it expectation zero}. Beside this, the study of
general $U$-statistics can be reduced to the study of degenerate
$U$-statistics by means of the following Hoeffding decomposition.
\medskip\noindent
{\bf Hoeffding decomposition of general $U$-statistics.} {\it All
$U$-statistics $I_{n,k}(f)$ of order $k$ can be written in the
form of a linear combination
$$
I_{n,k}(f)=\sum_{j=0}^k n^{k-j} I_{n,j}(f_j) \tag3.1
$$
of {\rm degenerate $U$-statistics} $I_{n,j}(f_j)$. The (canonical)
kernel functions $f_j$ (of $j$ variables) of the degenerate
$U$-statistics $I_{n,j}(f_j)$, $0\le j\le k$, can be calculated
explicitly. It can be shown that they satisfy the inequality
$$
\int f_j^2(x_1,\dots,x_j)\mu(\,dx_1)\dots\mu(\,dx_j)
\le\int f^2(x_1,\dots,x_k)\mu(\,dx_1)\dots\mu(\,dx_k)
$$
for all indices $0\le j\le k$.}
\medskip
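For $k=2$ the decomposition and the degeneracy of its terms can be checked numerically. The explicit coefficients $n-1$ and $\binom n2$ below come from a direct computation for $k=2$ on a finite space; they are an illustration of the principle and are not meant to reproduce the exact normalization used in formula (3.1):

```python
import itertools
import math

def u_stat(sample, f, k):
    # formula (1.2)
    tot = 0.0
    for idx in itertools.permutations(range(len(sample)), k):
        tot += f(*(sample[j] for j in idx))
    return tot / math.factorial(k)

def hoeffding_terms_k2(f, mu):
    """Split a 2-variable kernel into (f2, f1, f0), f2 canonical with
    respect to mu, so that for a sample of size n taking values in mu:
    I_{n,2}(f) = I_{n,2}(f2) + (n - 1) I_{n,1}(f1) + binom(n, 2) f0."""
    space = list(mu)
    m1 = {x: sum(f(x, y) * mu[y] for y in space) for x in space}
    m2 = {y: sum(f(x, y) * mu[x] for x in space) for y in space}
    f0 = sum(m1[x] * mu[x] for x in space)
    f1 = {x: 0.5 * (m1[x] + m2[x]) - f0 for x in space}
    f2 = {(x, y): f(x, y) - (m1[x] - f0) - (m2[y] - f0) - f0
          for x in space for y in space}
    return f2, f1, f0
```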
The problems about the behaviour of the multiple random
integrals $J_{n,k}(f)$ defined in formula (1.1) can also be reduced
to problems about the behaviour of degenerate $U$-statistics by
means of their appropriate decomposition. Such expressions can
be written as the linear combination
$$
J_{n,k}(f)=\sum_{j=0}^k c(n,j) n^{-j/2}I_{n,j}(f_j) \tag3.2
$$
of degenerate $U$-statistics with the same kernel functions $f_j$
which appear in formula (3.1) and with some appropriate coefficients
$c(n,j)$ such that $|c(n,j)|\le C(k)$ with some constant $C(k)>0$
depending only on~$k$.
\medskip\noindent
{\it Discussion of the second question.}\/
\medskip\noindent
In the case $k\ge2$ the role of the normal distribution is taken
over by multiple Wiener--It\^o integrals with respect to a white
noise. Let a $\sigma$-finite measure $\mu$ be given on a measurable
space $(X,\Cal X)$. A white noise $\mu_W$ with reference measure
$\mu$ is a Gaussian random field $\mu_W(A)$, indexed by the sets
$A\in\Cal X$ with $\mu(A)<\infty$, such that $E\mu_W(A)=0$ and
$E\mu_W(A)\mu_W(B)=\mu(A\cap B)$ for all such sets $A$ and $B$.
Given a function $f(x_1,\dots,x_k)$ on $(X^k,\Cal X^k)$ with
$\int f^2(x_1,\dots,x_k)\mu(\,dx_1)\dots\mu(\,dx_k)<\infty$ the
multiple Wiener--It\^o integral
$$
Z_{\mu,k}(f)=\frac1{k!}\int f(x_1,\dots,x_k)
\mu_W(\,dx_1)\dots\mu_W(\,dx_k) \tag3.3
$$
can be defined. The Gaussian counterparts of Problems~A) and~B)
can be formulated in the following way.
\medskip
\item{} {\it Problem A$''$).} Give a good estimate on the
probabilities $P(Z_{\mu,k}(f)>u)$ for all numbers $u>0$.
\medskip
\item{} {\it Problem B$''$).} Let a nice class $\Cal F$ of functions
$f(x_1,\dots,x_k)$ of $k$ variables be given. Take the
Wiener--It\^o integral $Z_{\mu,k}(f)$ of all functions $f\in\Cal F$
with respect to a white noise $\mu_W$. Give a good estimate on the
distribution of the supremum of these random integrals, i.e. on the
probability $P\left(\sup\limits_{f\in\Cal F}Z_{\mu,k}(f)>u\right)$
for all numbers $u>0$.
\beginsection 4. Results about the distribution of random integrals
and $U$-statistics.
It is worth considering first Problem~A$''$) about the estimation of
Wiener--It\^o integrals. I present a result in this direction.
\medskip\noindent
{\bf Estimation about the tail distribution of Wiener--It\^o
integrals.} {\it Let a white noise $\mu_W$ with reference
measure $\mu$ be given on a measurable space $(X,\Cal X)$ together
with a function $f(x_1,\dots,x_k)$ of $k$ variables on the product
space $(X^k,\Cal X^k)$ such that
$$
\int f^2(x_1,\dots,x_k)\mu(\,dx_1)\dots\mu(\,dx_k)\le\sigma^2
$$
with some number $\sigma^2<\infty$. The Wiener--It\^o integral
$$
Z_{\mu,k}(f)=\frac1{k!}\int f(x_1,\dots,x_k)\mu_W(\,dx_1)\dots\mu_W(\,dx_k)
$$
introduced in formula (3.3) satisfies the inequality
$$
P(k!|Z_{\mu,k}(f)|>u)\le C \exp\left\{-\frac12\left(\frac
u\sigma\right)^{2/k}\right\}
$$
for all numbers $u>0$ with some constant $C=C(k)>0$ depending only
on the multiplicity $k$ of the integral.}
\medskip
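As a sanity check, for $k=1$ the integral $Z_{\mu,1}(f)$ is normally distributed with variance $\int f^2\,d\mu\le\sigma^2$, and the estimate then holds with $C=1$, because of the classical Gaussian tail bound $P(|N(0,1)|>x)\le e^{-x^2/2}$. A numerical verification of this bound:

```python
import math

def normal_two_sided_tail(x):
    # P(|N(0,1)| > x) expressed with the complementary error function
    return math.erfc(x / math.sqrt(2.0))

# P(|N(0,1)| > x) <= exp(-x^2 / 2) for all x >= 0: this is the k = 1
# case of the Wiener--Ito tail estimate with C = 1.
```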
The next example shows that the above estimate is sharp.
\medskip\noindent
{\bf Lower bound on the tail distribution of a special Wiener--It\^o
integral.} {\it Let a $\sigma$-finite measure $\mu$ be given on a
measurable space $(X,\Cal X)$ together with a white noise $\mu_W$
on $(X,\Cal X)$ with this reference measure $\mu$. Let $f_0(x)$ be
a real valued function on the space $(X,\Cal X)$ such that $\int
f_0(x)^2\mu(\,dx)=1$. Let us introduce the function
$f(x_1,\dots,x_k)=\sigma f_0(x_1)\cdots f_0(x_k)$ with some number
$\sigma>0$, and consider the Wiener--It\^o integral $Z_{\mu,k}(f)$
introduced in formula (3.3). Then the identity
$$
\int f(x_1,\dots,x_k)^2\,\mu(\,dx_1)\dots\,\mu(\,dx_k)=\sigma^2
$$
holds, and the Wiener--It\^o integral $Z_{\mu,k}(f)$ satisfies
the inequality
$$
P(k!|Z_{\mu,k}(f)|>u)\ge \frac{\bar C}{\left(\frac u\sigma\right)^{1/k}+1}
\exp\left\{-\frac12\left(\frac u\sigma\right)^{2/k}\right\}
$$
for all numbers $u>0$ with some appropriate constant $\bar C>0$.}
\medskip
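The choice of the kernel in this example can be explained by It\^o's formula for multiple Wiener--It\^o integrals: if $H_k$ denotes the $k$-th Hermite polynomial with leading coefficient~1, then

```latex
k!\,Z_{\mu,k}(f)
=\sigma\int f_0(x_1)\cdots f_0(x_k)\,\mu_W(\,dx_1)\dots\mu_W(\,dx_k)
=\sigma H_k(\eta),\qquad
\eta=\int f_0(x)\,\mu_W(\,dx)\sim N(0,1),
```

and since $H_k(\eta)\approx\eta^k$ for large $|\eta|$, the probability $P(k!|Z_{\mu,k}(f)|>u)$ behaves like the Gaussian tail probability $P(\sigma|\eta|^k>u)$, whose standard lower bound has exactly the form stated above.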
The integral
$\sigma^2=\int f(x_1,\dots,x_k)^2\,\mu(\,dx_1)\dots\,\mu(\,dx_k)$
in the above results agrees with the variance of the random integral
$(k!)^{1/2}Z_{\mu,k}(f)$. Hence these results can be interpreted so
that
$$
P(k!\,Z_{\mu,k}(f)>u)\le\text{const.}\, P(\sigma\eta^k>u)
$$
for all numbers $u>0$, where $\eta$ is a standard normal random
variable, and $\sigma^2=k!\,EZ_{\mu,k}(f)^2$. Furthermore,
this estimate is sharp. This sharpness means that if we have no
more information about the kernel function $f$ than its $L_2$ norm,
i.e. the variance of the Wiener--It\^o integral determined by it,
then we cannot get a better estimate than the above formulated
inequality. On the other hand, in some special cases a
considerably better estimate can be proved under some appropriate
additional information about the behaviour of the kernel
function~$f$. But this question, which does not appear in the
statistical problems that motivated the study of the problems
considered in this note, will not be discussed here.
Similar, but slightly weaker estimates hold for degenerate
$U$-statistics and multiple random integrals with respect to
normalized empirical distributions.
\medskip\noindent
{\bf Estimate on the tail distribution of degenerate $U$-statistics.}
{\it Let $\xi_1,\dots,\xi_n$ be a sequence of independent and
identically distributed random variables on a measurable space
$(X,\Cal X)$ with distribution $\mu$. Take a function
$f(x_1,\dots,x_k)$ on the space $(X^k,\Cal X^k)$ canonical with
respect to the measure $\mu$ which satisfies the conditions
$$
\align
\|f\|_\infty&=\sup_{x_j\in X, \,1\le j\le k}|f(x_1,\dots,x_k)|\le 1
\tag4.1 \\
\|f\|^2_2&=\int f^2(x_1,\dots,x_k)\mu(\,dx_1)\dots\mu(\,dx_k)
\le\sigma^2 \tag4.2
\endalign
$$
with some number $0<\sigma^2\le1$, and consider the (degenerate)
$U$-statistic defined in formula (1.2) with the help of these
quantities. Then there exist some constants $A=A(k)>0$ and
$B=B(k)>0$ depending only on the order $k$ of the $U$-statistic
such that the inequality
$$
P(k!n^{-k/2}|I_{n,k}(f)|>u)\le A\exp\left\{-\frac{u^{2/k}}{2\sigma^{2/k}
\left(1+B\left(un^{-k/2}\sigma^{-(k+1)}\right)^{1/k}\right)}\right\}
\tag4.3
$$
holds for all numbers $0\le u\le n^{k/2}\sigma^{k+1}$.}
\medskip
The above estimate can be considered as a multivariate
generalization of Bernstein's inequality. For multiple integrals
with respect to normalized empirical distributions the following
similar estimate holds.
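Indeed, for $k=1$ a canonical kernel simply means $Ef(\xi_j)=0$, and for the normalized sum $n^{-1/2}S_n=n^{-1/2}I_{n,1}(f)$ of the variables $f(\xi_j)$ inequality (4.3) reads

```latex
P\left(n^{-1/2}|I_{n,1}(f)|>u\right)\le
A\exp\left\{-\frac{u^2}{2\sigma^2\left(1+B\,\frac u{n^{1/2}\sigma^2}
\right)}\right\},\qquad 0\le u\le n^{1/2}\sigma^2,
```

which is Bernstein's inequality (2.1) applied to $P(S_n>n^{1/2}u)$ with $V_n^2=n\sigma^2$, $A=2$ and $B=\frac13$.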
\medskip\noindent
{\bf Estimate about the tail distribution of random integrals with
respect to normalized empirical distributions.} {\it Let a sequence
$\xi_1,\dots,\xi_n$ of independent and identically distributed
random variables be given with distribution $\mu$ which take their
values in a measurable space $(X,\Cal X)$ together with a function
$f(x_1,\dots,x_k)$ on the $k$-fold product space $(X^k,\Cal X^k)$
which satisfies relations (4.1) and (4.2) with some constant
$0<\sigma\le1$. Then there exist some constants $C=C_k>0$ and
$\alpha=\alpha_k>0$ depending only on the multiplicity $k$ of the
integral $J_{n,k}(f)$ defined in formula (1.1) such that the
inequality
$$
P\left(|J_{n,k}(f)|>u\right)\le C \exp\left\{-\alpha
\left(\frac u\sigma\right)^{2/k}\right\} \quad \text{for all
numbers } 0<u\le n^{1/2}\sigma^{k+1}.
$$}
\medskip
The question arises again how sharp these estimates are.
Let us recall that in the case $k=1$ the tail distribution
$P(n^{-1/2}S_n>u)$ of the normalized sum $n^{-1/2}S_n$ of $n$
independent, identically distributed, bounded random variables
with expectation zero satisfies only a very weak estimate if
$u\gg n^{1/2}\sigma^2$, an estimate which is very far from the
bound suggested by the central limit theorem.
Similarly, in the case $k\ge2$ the tail distribution
$P(k!n^{-k/2}I_{n,k}(f)>u)$ of degenerate $U$-statistics
satisfies a much weaker estimate than the bound suggested by the
behaviour of Wiener--It\^o integrals if $u\gg n^{k/2}\sigma^{k+1}$.
This means that the previous estimates about the tail distribution
of degenerate $U$-statistics and integrals with respect to
normalized empirical distributions are sharp also in the sense
that they give the domain where the sharp estimate suggested by
the behaviour of Wiener--It\^o integrals holds.
For the sake of completeness I present a degenerate
$U$-statistic in the case $k=2$ whose tail distribution satisfies
only a much weaker estimate than formula~(4.3) if $u\gg n\sigma^3$.
\medskip\noindent
{\bf Lower bound for the tail distribution of a special degenerate
$U$-statistic in the case $k=2$.} {\it Let $\xi_1,\dots,\xi_n$ be
a sequence of independent, identically distributed random variables
with values in the two-dimensional Euclidean space. Let
$\xi_j=(\eta_{j,1},\eta_{j,2})$, $1\le j\le n$, where $\eta_{j,1}$
and $\eta_{j,2}$ are independent random variables,
$P(\eta_{j,1}=1)=P(\eta_{j,1}=-1)=\frac{\sigma^2}8$, and
$P(\eta_{j,1}=0)=1-\frac{\sigma^2}4$,
$P(\eta_{j,2}=1)=P(\eta_{j,2}=-1)=\frac12$ for all indices
$1\le j\le n$. Let us introduce the function
$f(x,y)=f((x_1,x_2),(y_1,y_2))=x_1y_2+x_2y_1$,
$x=(x_1,x_2)\in R^2$, $y=(y_1,y_2)\in R^2$, and define the
$U$-statistic of order 2
$$
I_{n,2}(f)=\sum_{1\le j<k\le n}
(\eta_{j,1}\eta_{k,2}+\eta_{k,1}\eta_{j,2})
$$
with this kernel function $f$ and the independent random variables
$\xi_1,\dots,\xi_n$. The expression $I_{n,2}(f)$ is a degenerate
$U$-statistic. Furthermore, if $u\ge B_1n\sigma^3$ with some
appropriate constant $B_1>0$, $B_2^{-1}n\ge u\ge B_2n^{-2}$ with
some sufficiently large number $B_2>0$, and
$\frac1n\le\sigma\le1$, then the inequality
$$
\align
P(n^{-1}I_{n,2}(f)>u)&\ge \exp\left\{-Bn^{1/3}u^{2/3}\log
\left(\frac u{n\sigma^3}\right)\right\} \\
&= \exp\left\{-B\frac u\sigma\left(\frac{n\sigma^3} u\right)^{1/3}
\log\left(\frac u{n\sigma^3}\right)\right\}
\endalign
$$
holds with some constant $B>0$, which depends neither on the
number~$n$ nor on the parameter $\sigma$.}
\medskip
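The mechanism behind this example becomes visible after the double sum is factorized:

```latex
\sum\Sb 1\le j,k\le n\\ j\neq k\endSb \eta_{j,1}\eta_{k,2}
=\left(\sum_{j=1}^n\eta_{j,1}\right)\left(\sum_{j=1}^n\eta_{j,2}\right)
-\sum_{j=1}^n\eta_{j,1}\eta_{j,2},
```

so $I_{n,2}(f)$ is, up to a constant factor, the product of the two independent sums $\sum\eta_{j,1}$ and $\sum\eta_{j,2}$. The second sum is of order $n^{1/2}$ by the central limit theorem, while the first one is a sum of rarely non-zero bounded random variables, whose tail admits only a Bennett-type lower bound of the form (2.4); the product inherits this weak tail behaviour, which explains the logarithmic factor in the exponent.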
\beginsection 5. A brief explanation of the results.
It is worth showing that the high even order moments
$EI_{n,k}(f)^{2M}$ of a degenerate $U$-statistic $I_{n,k}(f)$ of
order $k$ satisfy estimates similar to those of the moments $E\eta^{2kM}$
of a Gaussian random variable $\eta$ with expectation zero and
appropriate variance. Such estimates (together with the
Markov inequality) imply the inequalities we want to prove, and
beside this there is a method which enables us to bound such
moments.
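The passage from moment estimates to tail estimates is a Markov-inequality step. In the simplest case $k=1$, if the Gaussian-type moment bound $ES_n^{2M}\le\frac{(2M)!}{2^MM!}\left(n\sigma^2\right)^M\le\left(2Mn\sigma^2\right)^M$ is available, then

```latex
P(S_n>u)\le\frac{ES_n^{2M}}{u^{2M}}
\le\left(\frac{2Mn\sigma^2}{u^2}\right)^M,
```

and the choice $M\approx\frac{u^2}{2en\sigma^2}$ makes the right-hand side $e^{-M}=\exp\left\{-\frac{u^2}{2en\sigma^2}\right\}$, a bound of Bernstein type.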
Such moments can be estimated by means of the so-called diagram
formula about random integrals. This formula makes it possible to
express the moments we are interested in as the sum of certain
integrals defined with the help of some diagrams. To give a good
estimate on the moments we want to bound, it has to be shown that
the ``diagrams corresponding to the Gaussian effect'' yield the
main contribution to them. In such a way we can get an explanation
why the tail distribution of degenerate $U$-statistics and
random integrals satisfies the estimate which the behaviour
of Wiener--It\^o integrals (i.e.\ the Gaussian case) suggests.
The explanation of the details in the estimation of multiple
random integrals or degenerate $U$-statistics of order $k\ge2$
demands the application of rather complicated notations. This
requires much work which cannot be done in a short summary paper.
Hence I omit its discussion. On the other hand, I consider a
special case of this problem, the estimation of the moments of
sums of independent random variables. This may also explain much
about the general case.
Let $\xi_1,\dots,\xi_n$ be a sequence of independent and
identically distributed random variables such that $E\xi_1=0$,
$\text{Var}\,\xi_1=\sigma^2$, and let us estimate the even
moments of the sum $S_n=\sum\limits_{j=1}^n \xi_j$. The identity
$$
ES_n^{2M}=\sum\Sb (j_1,\dots,j_s,l_1,\dots,l_s)\\ j_1+\dots+j_s=2M,\;
j_u\ge 2\text{ for all indices } 1\le u\le s \\
l_u\neq l_{u'} \text { if } u\neq u'\endSb
\frac1{s!}\,\frac{(2M)!}{j_1!\cdots j_s!}\,
E\xi_{l_1}^{j_1}\cdots E\xi_{l_s}^{j_s} \tag5.1
holds.
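In the simplest non-trivial case $2M=4$ the moment computation gives, for independent and identically distributed variables with zero expectation, the classical identity $ES_n^4=nE\xi_1^4+3n(n-1)\left(E\xi_1^2\right)^2$. A brute-force numerical check on a small example (the three-point distribution of Section~2 with $\sigma^2=0.2$):

```python
import itertools

def exact_fourth_moment(values, probs, n):
    """E S_n^4 for S_n = xi_1 + ... + xi_n, computed by enumerating
    all n-tuples of outcomes of a finite-valued distribution."""
    total = 0.0
    for outcome in itertools.product(range(len(values)), repeat=n):
        p = 1.0
        s = 0.0
        for i in outcome:
            p *= probs[i]
            s += values[i]
        total += p * s ** 4
    return total

# P(xi = -1) = P(xi = 1) = sigma^2 / 2, P(xi = 0) = 1 - sigma^2
values = [-1.0, 0.0, 1.0]
probs = [0.1, 0.8, 0.1]
m2 = sum(p * v ** 2 for v, p in zip(values, probs))   # E xi^2 = 0.2
m4 = sum(p * v ** 4 for v, p in zip(values, probs))   # E xi^4 = 0.2
n = 4
predicted = n * m4 + 3 * n * (n - 1) * m2 ** 2
```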
Simple combinatorial considerations show that in the sum on the
right-hand side of the identity~(5.1) the main contribution comes
from the terms indexed with such a vector
$$
(j_1,\dots,j_M,l_1,\dots,l_M)
$$
for which $j_u=2$ for all numbers $1\le u\le M$. The total number
of terms in the expansion of $S_n^{2M}$ corresponding to them equals
$\binom nM\frac{(2M)!}{2^M}\sim n^M\frac{(2M)!}{2^MM!}$. Hence it
is natural to expect that in typical cases $ES_n^{2M}\sim
\left(n\sigma^2\right)^M\frac{(2M)!}{2^MM!}$. This consideration
suggests the estimate
$$
\sum_{1\le l_1<\dots<l_M\le n} \frac{(2M)!}{2^M}\,
E\xi_{l_1}^2\cdots E\xi_{l_M}^2
=\binom nM\frac{(2M)!}{2^M}\sigma^{2M}
\sim\frac{(2M)!}{2^MM!}\left(n\sigma^2\right)^M
$$
for the main part of the sum~(5.1), and hence the moment estimate
$ES_n^{2M}\le C^M\frac{(2M)!}{2^MM!}\left(n\sigma^2\right)^M$ with
some constant $C>0$ in typical cases. A closer inspection shows
that the terms with $j_u=2$ really dominate the sum~(5.1), and such
a moment estimate really holds, if the number $M$ is not too large,
namely if $M\le\const n\sigma^2$. This condition corresponds to the
condition $u\le CV_n^2$ under which a Bernstein-type bound with a
constant of the right order can be given for the probability
$P(S_n>u)$.
\beginsection 6. Results about the supremum of random integrals
and $U$-statistics.
In the formulation of the results about Problems~B), B$'$)
and~B$''$) two notions are needed. The first of them is the
definition of $L_2$-dense classes of functions with respect to a
measure.
\medskip\noindent
{\bf Definition of $L_2$-dense classes of functions with respect
to a measure~$\nu$.} {\it Let a measurable space $(Y,\Cal Y)$ be
given together with a probability measure $\nu$ on it and a set
$\Cal G$ of $\Cal Y$-measurable, real valued functions on this
space. We call $\Cal G$ an $L_2$-dense class of functions with
parameter $D$ and exponent $L$ with respect to the measure~$\nu$
if for all $\varepsilon>0$ there exists a
subclass $\Cal G_{\varepsilon}=\{g_1,\dots,g_m\}\subset\Cal G$
in the space $L_2(Y,\Cal Y,\nu)$ consisting of
$m\le D\varepsilon^{-L}$ elements such that
$\inf\limits_{g_j\in \Cal G_\varepsilon}\int |g-g_j|^2\,d\nu
<\varepsilon^2$ for all functions~$g\in \Cal G$.}
\medskip
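A simple example of such a class is the class of half-line indicators $g_t(x)=1(x\le t)$ on the real line: since $\int|g_t-g_s|^2\,d\nu=\nu((s,t])$ for $s<t$, it suffices to place net points at the $\varepsilon^2$-quantiles of~$\nu$, which gives at most $\varepsilon^{-2}+1$ of them, i.e.\ exponent $L=2$. A sketch of this construction for a discrete measure (a standard construction, used here only as an illustration):

```python
def halfline_net(atoms, eps):
    """Thresholds t_1 < t_2 < ... such that every indicator 1(x <= t)
    is within L2(nu)-distance eps of some 1(x <= t_i).

    atoms: dict point -> nu-probability (a discrete measure nu).
    """
    net = []
    mass = 0.0
    points = sorted(atoms)
    for x in points:
        mass += atoms[x]
        if mass >= eps * eps:   # eps^2 of nu-mass accumulated: new net point
            net.append(x)
            mass = 0.0
    net.append(points[-1])      # cover the right end
    return net
```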
The other useful notion is the following one.
\medskip\noindent
{\bf Definition of $L_2$-dense classes of functions.} {\it Let us
have a measurable space $(Y,\Cal Y)$ and a set $\Cal G$ of
$\Cal Y$-measurable, real valued functions on this space. We call
$\Cal G$ an $L_2$-dense class of functions with parameter $D$ and
exponent $L$ if it is $L_2$-dense with parameter $D$ and exponent
$L$ with respect to all probability measures $\nu$ on $(Y,\Cal Y)$.}
\medskip
It is useful to consider first Problem~B$''$) about the supremum of
Wiener--It\^o integrals, then to describe the results on Problems~B)
and~B$'$) about the supremum of random integrals with respect to
normalized empirical distribution and degenerate $U$-statistics
and to compare these results.
\medskip\noindent
{\bf Estimate about the tail distribution of the supremum of
Wiener--It\^o integrals.} {\it Let us consider a measurable space
$(X,\Cal X)$ together with a $\sigma$-finite non-atomic
measure~$\mu$ on it, and let $\mu_W$ be a white noise with reference
measure $\mu$ on $(X,\Cal X)$. Let $\Cal F$ be a countable and
$L_2$-dense class of functions $f(x_1,\dots,x_k)$ on $(X^k,\Cal X^k)$
with some parameter $D$ and exponent $L$ with respect to the product
measure $\mu^k$ such that
$$
\int f^2(x_1,\dots,x_k)\mu(\,dx_1)\dots \mu(\,dx_k)\le\sigma^2
\quad \text{\rm with some } 0<\sigma\le1 \text { \rm for all }
f\in \Cal F.
$$
Let us consider the multiple Wiener integrals $Z_{\mu,k}(f)$
introduced in formula (3.3) for all~$f\in\Cal F$. The inequality
$$
P\left(\sup_{f\in \Cal F}|Z_{\mu,k}(f)|>u\right)\le C(D+1)
\exp\left\{-\alpha\left(\frac u\sigma\right)^{2/k}\right\}
$$
holds with some universal constants $C=C(k)>0$ and
$\alpha=\alpha(k)>0$ if
$$
\left(\frac u\sigma\right)^{2/k}\ge ML \log\frac2\sigma \tag6.1
$$
with some appropriate constant $M=M(k)>0$.}
\medskip
In the above result --- disregarding the value of the universal
constants appearing in it --- the same estimate is obtained about
the tail distribution of Wiener--It\^o integrals (under
appropriate conditions) as in the estimate about the tail
distribution of a single Wiener--It\^o integral. The only
essential difference between these two results is that in the
present case an additional condition formulated in formula~(6.1)
had to be imposed. It is not difficult to present an example
which shows that such a condition is really needed. But here I
omit its description.
The next result is an estimate on the tail distribution
of the supremum of random integrals $J_{n,k}(f)$ defined in
formula (1.1).
\medskip\noindent
{\bf Estimate on the tail distribution of the supremum of multiple
integrals with respect to a normalized empirical distribution.}
{\it Let us have a probability measure $\mu$ on a measurable space
$(X,\Cal X)$ together with a countable and $L_2$-dense class
$\Cal F$ of functions $f=f(x_1,\dots,x_k)$ of $k$ variables with
some parameter~$D$ and exponent~$L$, $L\ge1$, on the product space
$(X^k,\Cal X^k)$ such that
$$
\|f\|_\infty=\sup\limits_{x_j\in X,\;1\le j\le k}|f(x_1,\dots,x_k)|\le 1,
$$
and
$$
\|f\|_2^2=Ef^2(\xi_1,\dots,\xi_k)=\int f^2(x_1,\dots,x_k)
\mu(\,dx_1)\dots\mu(\,dx_k)\le \sigma^2
$$
for all functions $f\in \Cal F$ with some constant $0<\sigma\le1$.
Then there exist some constants $C=C(k)>0$, $\alpha=\alpha(k)>0$
and $M=M(k)>0$ depending only on the parameter $k$ such that the
supremum of the random integrals $J_{n,k}(f)$, $f\in \Cal F$,
defined by formula (1.1) satisfies the inequality
$$
P\left(\sup_{f\in\Cal F}|J_{n,k}(f)|\ge u\right)\le CD
\exp\left\{-\alpha\left(\frac u{\sigma}\right)^{2/k}\right\},
$$
provided that
$$
n\sigma^2\ge \left(\frac u\sigma\right)^{2/k}\ge
M(L+\beta)^{3/2}\log\frac2\sigma,
$$
where $\beta=\max\left(\frac{\log D}{\log n},0\right)$ and the
numbers $D$ and $L$ agree with the parameter and exponent of
the $L_2$-dense class~$\Cal F$.}
\medskip
A similar estimate holds for the supremum of degenerate
$U$-statistics $I_{n,k}(f)$, $f\in\Cal F$. The only difference
in comparison with the above result is that in the case of
the supremum of $U$-statistics the additional condition has
to be imposed that the $U$-statistics $I_{n,k}(f)$ must be
degenerate.
An essential difference between the results about the estimation
of the supremum of Wiener--It\^o integrals $Z_{\mu,k}(f)$ and
integrals with respect to normalized empirical distribution
$J_{n,k}(f)$ is that in the first case the class of functions
$\Cal F$ had to be $L_2$-dense with respect to the product
measure $\mu^k$, while in the second case a more restrictive
condition was imposed. In the case of the supremum of integrals
with respect to a normalized empirical distribution the class
of functions $\Cal F$ had to satisfy the $L_2$-property, i.e. it
had to be $L_2$-dense with respect to all probability measures.
The question arises what the cause of this difference is.
The supremum of Wiener--It\^o integrals can be bounded by means of
a simple and natural method, the so-called `chaining argument'.
In the case of the random integrals $J_{n,k}(f)$ this method is
not strong enough to solve the problem; it only yields some
partial results. To get a complete solution some additional
methods have to be applied, and their application demands some
additional restrictions.
The elaboration of the details would demand much work and the
application of methods essentially different from previous ones.
Hence I shall present only a brief sketch of the main ideas.
The main emphasis will be put on the explanation of the main
problems and methods. First I briefly explain the `chaining
argument'.
\medskip\noindent
{\it The `chaining argument', and the limits of this method.}
\medskip
Let us apply the notation worked out in the formulation of
the results about the supremum of Wiener--It\^o integrals. Let us
take an increasing system of subsets $\Cal F_1\subset\Cal F_2
\subset\cdots\subset\Cal F_N\subset\cdots\subset\Cal F$ of the
class of functions $\Cal F$ with relatively small cardinalities
which satisfies the relation
$$
\inf_{g\in\Cal F_N}\int \left(f(x_1,\dots,x_k)-g(x_1,\dots,x_k)
\right)^2\mu(\,dx_1)\dots\mu(\,dx_k)\le 2^{-2N}\sigma^2
$$
for all functions $f\in\Cal F$ and all indices $N=1,2,\dots$.
The probabilities
$$
P\left(\sup_{g\in\Cal F_N}Z_{\mu,k}(g)>u\left(1-2^{-N}\right)\right)
$$
can be well estimated by means of a recursion for $N=1,2,\dots$,
since for all functions $g\in \Cal F_{N+1}$ there exists a
function $g'\in\Cal F_N$ (close to it) for which
$$
\int\left(g(x_1,\dots,x_k)-g'(x_1,\dots,x_k)\right)^2
\mu(\,dx_1)\dots\mu(\,dx_k) \le 2^{-2N}\sigma^2.
$$
Hence the probability
$$
P(|Z_{\mu,k}(g)-Z_{\mu,k}(g')|>2^{-N}u)=P(|Z_{\mu,k}(g-g')|>2^{-N}u)
$$
can be well estimated by means of the previous results about the
tail distribution of Wiener--It\^o integrals. By working out the
details the result about the tail distribution of the supremum
of Wiener--It\^o integrals can be proved relatively simply.
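To indicate the form of this recursion, one of its steps can be
written down schematically (the precise thresholds and constants
are chosen somewhat differently in the detailed proof) as
$$
\aligned
P\left(\sup_{g\in\Cal F_{N+1}}|Z_{\mu,k}(g)|
>u\left(1-2^{-(N+1)}\right)\right)
\le\;&P\left(\sup_{g'\in\Cal F_{N}}|Z_{\mu,k}(g')|
>u\left(1-2^{-N}\right)\right)\\
&+\#\Cal F_{N+1}\,\sup
P\left(|Z_{\mu,k}(g-g')|>2^{-(N+1)}u\right),
\endaligned
$$
where the supremum in the second term is taken over the pairs
$g\in\Cal F_{N+1}$, $g'\in\Cal F_N$ appearing in the above
approximation, and this term can be bounded by means of the
estimate on the tail distribution of a single Wiener--It\^o
integral.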
\medskip
The above considered `chaining argument' is not strong enough to
estimate the supremum of integrals with normalized empirical
distribution or of degenerate $U$-statistics. It only makes it
possible to reduce the problem to the case when the expressions
$
\sigma^2(f)=\int f^2(x_1,\dots,x_k)\mu(\,dx_1)\dots\mu(\,dx_k)
$
are small for all functions $f\in\Cal F$.
The `chaining argument' is a weak method in the study of this
problem for the following reason.
There are only very weak estimates on probabilities of the form
$$
P(I_{n,k}(f)>u) \quad \text{or} \quad P(J_{n,k}(f)>u)
$$
if $\sigma^2(f)$ is very small, and the number $u$ is relatively
large. This is a consequence of the previously discussed fact
that the tail distribution of degenerate $U$-statistics or of
random integrals with respect to a normalized empirical
distribution with small variance does not admit such a good
estimate as the Gaussian comparison would suggest.
This difficulty can be overcome by means of a different method,
by means of a symmetrization argument. This method consists of
reducing the estimation of a probability of the type
$$
P\left(\frac1{k!} \sup_{f\in\Cal F}
\sum \Sb 1\le j_s\le n,\; s=1,\dots, k,\\
j_s\neq j_{s'} \text{ if } s\neq s'\endSb
f(\xi_{j_1},\dots,\xi_{j_k})>u\right)
$$
to a probability of the type
$$
P\left(\frac1{k!} \sup_{f\in\Cal F}
\sum\Sb 1\le j_s\le n,\; s=1,\dots, k, \\
j_s\neq j_{s'} \text{ if } s\neq s'\endSb
\varepsilon_{j_1}\dots\varepsilon_{j_k}
f(\xi_{j_1},\dots,\xi_{j_k})>u\right), \tag6.2
$$
where $\varepsilon_1,\dots,\varepsilon_n$ are independent
random variables with distribution
$P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$ for all indices
$1\le j\le n$. Besides this, they are also independent of the
random variables $\xi_1,\dots,\xi_n$.
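The nature of this reduction can be indicated in the simplest
case $k=1$, for classes of functions with $Ef(\xi_1)=0$,
$f\in\Cal F$. In this case it is expressed by a symmetrization
inequality of the following type (the constants are written down
only for illustration):
$$
P\left(\sup_{f\in\Cal F}\left|\sum_{j=1}^n f(\xi_j)\right|
>2u\right)
\le 4P\left(\sup_{f\in\Cal F}\left|\sum_{j=1}^n
\varepsilon_j f(\xi_j)\right|>\frac u2\right)
\quad\text{if } u^2\ge 2n\sup_{f\in\Cal F}\sigma^2(f).
$$
The condition on the level $u$ guarantees, by Chebyshev's
inequality, that for each fixed $f\in\Cal F$ the sum
$\sum f(\xi_j)$ is smaller than $u$ with probability at least
$\frac12$, and this is what makes the comparison with an
independent copy of the sum possible.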
The probabilities in formula (6.2) can be well estimated by means
of a `conditioning argument'. In the application of this method
the conditional probability of the investigated event has to
be bounded under the condition $\xi_1=x_1$,\dots, $\xi_n=x_n$
for all possible values $x_1,\dots,x_n$. There are good
methods to estimate such conditional probabilities, but
they are not discussed here. On the other hand, these methods
work only if the class of functions $\Cal F$ is $L_2$-dense.
This is the reason why this property appears in this problem.
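The usefulness of this conditioning can be indicated in the
following way. Under the condition $\xi_1=x_1$,\dots,
$\xi_n=x_n$ the sum in formula (6.2) becomes a homogeneous
Rademacher chaos of order $k$ with fixed coefficients
$a(j_1,\dots,j_k)=f(x_{j_1},\dots,x_{j_k})$, and for such
expressions a bound of the following type is available (it can
be proved for instance by means of hypercontractive estimates):
$$
P\left(\left|\sum\Sb 1\le j_s\le n,\; s=1,\dots,k,\\
j_s\neq j_{s'} \text{ if } s\neq s'\endSb
\varepsilon_{j_1}\dots\varepsilon_{j_k}
a(j_1,\dots,j_k)\right|>u\right)
\le C\exp\left\{-\alpha\left(\frac uA\right)^{2/k}\right\}
$$
with $A^2=\sum a^2(j_1,\dots,j_k)$, where the constants $C>0$
and $\alpha>0$ depend only on the order $k$.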
\medskip\noindent
There is a point which should be emphasized even in this sketchy
discussion of the problems. In the study of the supremum of random
integrals with respect to a normalized empirical distribution or
of degenerate $U$-statistics a different method was applied in the
estimation of random variables with relatively large and small
variance. In the case of relatively large variance the `chaining
argument' works, while in the case of small variance an appropriate
symmetrization argument was applied. Behind the different
approaches in these two cases there is a deeper reason.
The `chaining argument' works well only in the study of the
supremum of degenerate $U$-statistics or random integrals with
respect to a normalized empirical measure with not too small
variance, i.e.\ in the case when the $U$-statistics and random
integrals satisfy the estimates that their `Gaussian type
limits' suggest.
One can define certain `irregular events' whose appearance
implies that the $U$-statistics or random integrals take
extremely large values. In the case of random variables with
not too small variances the probability of such irregular
events is very small, and their effect can be disregarded. The
case of $U$-statistics or random integrals with a small variance
is different. In this case the probability of these
irregularities (compared to the probability of the regular
events) is relatively large, and their effect is dominant in
the estimation of the probabilities we are interested in.
In the estimation of the tail distribution of the supremum of
multiple integrals a simultaneous application of the two above
mentioned arguments is needed.
The `chaining argument' helps to reduce the problem to the
case when the supremum of random variables with very small
variance has to be bounded. In this situation the effect of the
irregularities is non-negligible, and this supremum `with
non-Gaussian behaviour' can be estimated with the help of some
symmetrization type arguments.
\medskip\noindent
In this work I tried to describe briefly the results of an
important subject together with the heuristic picture behind
them. A more detailed discussion of this subject can be
found in my work~[1]. This work also contains a fairly detailed
list of references.
\beginsection References
\item{1.)} P\'eter Major:
Tail behaviour of multiple random integrals and $U$-statistics.
{\it Probability Surveys} {\bf 2} (2005), 448--505.
\bye