Theorem 3.2.
For all $\alpha>0$, let $Q_{\alpha}:[0,\|A_m\|^2]\to\mathbb{R}$ be a piecewise continuous and nonincreasing function of $\lambda$ on the segment $[0,\|A_m\|^2]$. Assume also that there is a $C>0$ such that
$$|\lambda Q_{\alpha}(\lambda)|\le C,$$
and
$$\lim_{\alpha\to\infty}Q_{\alpha}(\lambda)=\frac{1}{\lambda}$$
for all $\lambda\in(0,\|A_m\|^2]$. Then, for all $y\in\mathcal{D}(A_m^{\dagger})$,
$$\lim_{\alpha\to\infty}Q_{\alpha}(A_m^{*}A_m)A_m^{*}\Pi_{Y_m}^{n}A\tilde{f}=\tilde{f}_m$$
holds with $\tilde{f}_m=A_m^{\dagger}y_m$.
Remark 3.3.
In order to ensure convergence as $\alpha\to\infty$, it is necessary to choose $Q_{\alpha}$ such that it approximates $1/\lambda$ for all $\lambda\in(0,\|A_m\|^2]$. Also, note that the condition $|\lambda Q_{\alpha}(\lambda)|\le C$ implies that $\|A_mR_{\alpha}\|=\|A_mQ_{\alpha}(A_m^{*}A_m)A_m^{*}\|=\|A_mA_m^{*}Q_{\alpha}(A_mA_m^{*})\|\le C$, i.e., $\|A_mR_{\alpha}\|$ is uniformly bounded in $\alpha$.

Proof.
As in [9], if $\tilde{f}_m$ is defined by (2.8), then by (2.2) the residual norm has the representation
$$\|\tilde{f}_m-Q_{\alpha}(A_m^{*}A_m)A_m^{*}A_m\tilde{f}\|^2=\|(I-Q_{\alpha}(A_m^{*}A_m)A_m^{*}A_m)\tilde{f}_m\|^2.$$
From formula (2.10), it follows that
$$\|\tilde{f}_m-Q_{\alpha}(A_m^{*}A_m)A_m^{*}A_m\tilde{f}\|^2=\int_0^{\|A_m\|^2+}(1-\lambda Q_{\alpha}(\lambda))^2\,d\|E_{\lambda}\tilde{f}_m\|^2.$$
Since $(1-\lambda Q_{\alpha}(\lambda))^2$ is bounded by the constant $(1+C)^2$, which is integrable with respect to the measure $d\|E_{\lambda}\tilde{f}_m\|^2$, the Dominated Convergence Theorem yields
$$\lim_{\alpha\to\infty}\int_0^{\|A_m\|^2+}(1-\lambda Q_{\alpha}(\lambda))^2\,d\|E_{\lambda}\tilde{f}_m\|^2=\int_0^{\|A_m\|^2+}\lim_{\alpha\to\infty}(1-\lambda Q_{\alpha}(\lambda))^2\,d\|E_{\lambda}\tilde{f}_m\|^2.\tag{3.2}$$

Since $\lim_{\alpha\to\infty}(1-\lambda Q_{\alpha}(\lambda))=0$ for $\lambda>0$, only the point $\lambda=0$ can contribute to the integral on the right-hand side of (3.2). At $\lambda=0$ we have $\lim_{\alpha\to\infty}(1-\lambda Q_{\alpha}(\lambda))=1$, so (3.2) takes the form
$$\lim_{\alpha\to\infty}\int_0^{\|A_m\|^2+}(1-\lambda Q_{\alpha}(\lambda))^2\,d\|E_{\lambda}\tilde{f}_m\|^2=\lim_{\lambda\to0^{+}}\|E_{\lambda}\tilde{f}_m\|^2-\|E_0\tilde{f}_m\|^2,\tag{3.3}$$

which is equal to the jump of $\|E_{\lambda}\tilde{f}_m\|^2$ at $\lambda=0$. Since $\tilde{f}_m\in\mathcal{N}(A_m)^{\perp}$, the right-hand side of (3.3) equals $0$. Thus, $R_{\alpha}A_m\tilde{f}$ converges to $\tilde{f}_m$ as $\alpha\to\infty$ for $y_m\in\mathcal{D}(A_m^{\dagger})$, which ends the proof.
□
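Theorem 3.2 can be illustrated in finite dimensions, where the spectral family reduces to a singular value decomposition. The following Python/NumPy sketch is only an illustration under stated assumptions: the random matrix `A` stands in for $A_m$, and the filter $Q_\alpha(\lambda)=1/(\lambda+1/\alpha)$ (a Tikhonov-type choice, not taken from the text) is one hypothetical function satisfying the hypotheses of the theorem. As $\alpha$ grows, $Q_\alpha(A^{t}A)A^{t}y$ should approach the best-approximate solution $A^{\dagger}y$.

```python
import numpy as np


def filtered_solution(A, y, Q):
    """Return Q(A^t A) A^t y computed from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    lam = s**2                          # eigenvalues of A^t A (nonzero part)
    coeff = Q(lam) * s * (U.T @ y)      # Q(lambda_j) sigma_j <u_j, y>
    return Vt.T @ coeff


def tikhonov_type_Q(alpha):
    """Hypothetical filter Q_alpha(lam) = 1/(lam + 1/alpha).

    Nonincreasing in lam, |lam Q_alpha(lam)| <= 1, and Q_alpha(lam) -> 1/lam
    as alpha -> infinity, as required in Theorem 3.2.
    """
    return lambda lam: 1.0 / (lam + 1.0 / alpha)


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10)) @ np.diag(np.arange(1, 11) ** -2.0)
y = rng.standard_normal(30)
f_dagger = np.linalg.pinv(A) @ y        # best-approximate solution A^+ y

alphas = [1e0, 1e3, 1e6, 1e9]
errors = []
for alpha in alphas:
    f_alpha = filtered_solution(A, y, tikhonov_type_Q(alpha))
    errors.append(np.linalg.norm(f_alpha - f_dagger) / np.linalg.norm(f_dagger))
print(errors)  # relative errors shrink as alpha grows
```

Each spectral component of the error is multiplied by $(1/\alpha)/(\lambda_j+1/\alpha)$, which decreases strictly in $\alpha$, so the printed relative errors decrease monotonically.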
Let $\operatorname{Tr}(B)$ denote the (normalized) trace of the self-adjoint operator $B^{t}B$ for any square matrix $B$, defined by
$$\operatorname{Tr}(B)=\frac{1}{n}\sum_{j=1}^{m}b_j,$$
where $b_j$ are the eigenvalues of $B^{t}B$.
We then have the following result.
Theorem 3.4.
Let $Q_{\alpha}$ be as in Theorem 3.2. Let $\mu,\rho>0$ and let $\omega_{\mu}:(0,\alpha_0)\to\mathbb{R}$ be such that, for all $\alpha\in(0,\alpha_0)$,
$$\sup_{0\le\lambda\le\sigma_1^2}\lambda^{\mu}|1-\lambda Q_{\alpha}(\lambda)|\le\omega_{\mu}(\alpha)$$
holds. Then for $\tilde{f}\in A_{\mu,\rho}$ the following inequality holds:
$$\mathbb{E}\|\tilde{f}_m-f_{m,\alpha}\|^2\le2\,\omega_{\mu}(\alpha)^2\rho^2+2\sigma^2\operatorname{Tr}\big(Q_{\alpha}^2(A_mA_m^{*})A_mA_m^{*}\big).\tag{3.4}$$


Proof.
The proof of this inequality is based on the definition of the estimator $f_{m,\alpha}$ and on the assumptions on $Q_{\alpha}$. The squared $L^2$ norm of the difference between the true solution and the regularized estimator can be bounded by
$$\mathbb{E}\|\tilde{f}_m-f_{m,\alpha}\|^2\le2\,\mathbb{E}\|\tilde{f}_m-R_{\alpha}A_m\tilde{f}\|^2+2\,\mathbb{E}\|R_{\alpha}A_m\tilde{f}-f_{m,\alpha}\|^2,\tag{3.5}$$
where $f_{m,\alpha}=R_{\alpha}y_m$.
This is the typical bias-variance decomposition. The first term on the right-hand side is an approximation error, which corresponds to the bias, whereas the second term, the variance, is a stability bound on the regularizing operator $R_{\alpha}$. Note that by Theorem 3.2 the first term in (3.5) goes to $0$ if $y_m\in\mathcal{D}(A_m^{\dagger})$.
Let $\omega\in F_m$ with $\|\omega\|\le\rho$. Since $\tilde{f}\in F_m$, we have $\Pi_{F_m}\tilde{f}=(A_m^{*}A_m)^{\mu}\omega$. On the other hand, since $\sup_{\lambda}\lambda^{\mu}|1-\lambda Q_{\alpha}(\lambda)|\le\omega_{\mu}(\alpha)$, the first term in (3.5) can be bounded by
$$\begin{aligned}\mathbb{E}\|\tilde{f}_m-R_{\alpha}A_m\tilde{f}\|^2&=\mathbb{E}\|\tilde{f}_m-Q_{\alpha}(A_m^{*}A_m)A_m^{*}A_m\tilde{f}_m\|^2\\&=\mathbb{E}\|(I-Q_{\alpha}(A_m^{*}A_m)A_m^{*}A_m)\tilde{f}_m\|^2\\&\le\omega_{\mu}(\alpha)^2\rho^2.\end{aligned}$$
In order to control the variance term we use that the data perturbation is white noise. Thus,
$$\begin{aligned}\mathbb{E}\|R_{\alpha}A_m\tilde{f}-f_{m,\alpha}\|^2&=\mathbb{E}\langle\varepsilon,(Q_{\alpha}(A_m^{*}A_m)A_m^{*})^{*}Q_{\alpha}(A_m^{*}A_m)A_m^{*}\varepsilon\rangle\\&=\mathbb{E}\langle\varepsilon,Q_{\alpha}^2(A_mA_m^{*})A_mA_m^{*}\varepsilon\rangle\\&=\sigma^2\operatorname{Tr}\big(Q_{\alpha}^2(A_mA_m^{*})A_mA_m^{*}\big),\end{aligned}$$
which yields the desired result. □
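The variance identity in the proof of Theorem 3.4 can be checked by Monte Carlo simulation. The sketch below is illustrative only: it assumes a white-noise vector with independent $N(0,\sigma^2/n)$ components (so that the $1/n$ of the normalized trace $\operatorname{Tr}$ appears), a random stand-in for $A_m$, and the hypothetical Tikhonov-type filter $Q_\alpha(\lambda)=1/(\lambda+1/\alpha)$. The helper `normalized_trace` follows the definition of $\operatorname{Tr}(B)$ given before the theorem.

```python
import numpy as np


def normalized_trace(B, n):
    """Tr(B) = (1/n) * sum of the eigenvalues of B^t B, as defined above."""
    return np.sum(np.linalg.eigvalsh(B.T @ B)) / n


rng = np.random.default_rng(1)
n, m = 200, 20
A = rng.standard_normal((n, m)) / np.sqrt(n)     # stand-in for A_m
alpha, sigma = 50.0, 0.5

# R_alpha = Q_alpha(A^t A) A^t with the hypothetical filter Q_alpha(lam) = 1/(lam + 1/alpha)
R = np.linalg.solve(A.T @ A + np.eye(m) / alpha, A.T)

# Monte Carlo estimate of E||R_alpha eps||^2 for eps ~ N(0, (sigma^2 / n) I_n)
eps = rng.normal(scale=sigma / np.sqrt(n), size=(20000, n))
mc = np.mean(np.sum((eps @ R.T) ** 2, axis=1))
theory = sigma**2 * normalized_trace(R, n)
print(mc, theory)
```

The two printed numbers should agree up to Monte Carlo error, since $\mathbb{E}\|R\varepsilon\|^2=(\sigma^2/n)\,\mathrm{trace}(R^{t}R)$ for this noise model.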
The next result will be useful when studying iterative methods.
Theorem 3.5.
Let $Q_{\alpha}$ be as in Theorem 3.2. Assume also that $Q_{\alpha}$ is continuously differentiable and that the function $[1-\lambda Q_{\alpha}(\lambda)]'\,[\lambda Q_{\alpha}(\lambda)-1]^{-1}$ does not decrease. Then the following estimates are valid:
$$\sup_{0\le\lambda\le\sigma_1^2}|Q_{\alpha}(\lambda)|=Q_{\alpha}(0),$$
and
$$\sup_{0\le\lambda\le\sigma_1^2}\lambda^{\mu}|1-\lambda Q_{\alpha}(\lambda)|<\mu^{\mu}(\mu+1)^{-1}\omega_{\mu}(\alpha),$$
where $\omega_{\mu}(\alpha)=Q_{\alpha}(0)^{-\mu}$.

Proof.
The proof can be carried out by standard techniques. A proof of this result can be found in [11] . □
4 Rates of convergence for the regularized estimator
In any regularization method, the regularization parameter $\alpha$ plays a crucial role. There are general methods for selecting it, for example the discrepancy principle [20], cross-validation [7] and the L-curve [10]. They differ in the amount of a priori information required as well as in the decision criteria. The appropriate choice of the regularization parameter is a difficult problem. We would like to choose $\alpha$ based on the data in such a way that optimal rates are maintained, and this choice should not depend on a priori regularity assumptions.
Our goal is to introduce adaptive methods in the context of statistical inverse problems.
In this section we introduce our adaptive estimator for a fixed $m=m_0$. We choose $m_0$ such that $\|\tilde{f}-\Pi_{F_{m_0}}\tilde{f}\|^2$ satisfies the optimal rates with high probability, since we know that
$$\|\tilde{f}-\Pi_{F_{m_0}}\tilde{f}\|^2<\|I-\Pi_{Y_{m_0}}\|^{4\mu}=\mathcal{O}\big(d_{m_0}^{-4\mu p}\big)$$
for a certain $p$ and $0<\mu\le1/2$. This is satisfied if the dimension of the set is such that
$$d_{m_0}\ge n^{\frac{1}{2p+1}}.\tag{4.1}$$
This leads to the rate
$$\|\tilde{f}-f_{m,\alpha}\|^2=\mathcal{O}\Big(n^{-\frac{4\mu p}{4\mu p+2p+1}}\Big).$$
Analogous results are obtained in the case of Hilbert scales ([12], [16]).
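The arithmetic behind condition (4.1) and the resulting rate can be sketched as follows; the helper names are illustrative. Since $n^{1/(2p+1)}\ge n^{1/(4\mu p+2p+1)}$, any $d_{m_0}$ satisfying (4.1) also gives $d_{m_0}^{-4\mu p}\le n^{-4\mu p/(4\mu p+2p+1)}$, so the approximation error does not dominate the rate.

```python
import math


def model_dimension(n, p):
    """Smallest integer d_{m0} with d_{m0} >= n^{1/(2p+1)}, i.e. condition (4.1)."""
    return math.ceil(n ** (1.0 / (2 * p + 1)))


def rate_exponent(mu, p):
    """Exponent e of the rate O(n^{-e}), e = 4 mu p / (4 mu p + 2p + 1)."""
    return 4 * mu * p / (4 * mu * p + 2 * p + 1)


mu, p = 0.5, 1.0
for n in [10**3, 10**5, 10**7]:
    d = model_dimension(n, p)
    # approximation error proxy d^{-4 mu p} versus the target rate n^{-e}
    print(n, d, d ** (-4 * mu * p), n ** (-rate_exponent(mu, p)))
```

For $\mu=1/2$ and $p=1$ the rate exponent is $2/5$.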
Adaptive model selection is a technique which penalizes the regularization parameter, in such a way that we choose $\hat{f}_{m_0,\alpha_{\hat{k}}}$ as a minimizer of
$$\|R_{\alpha_k}(y_m-A_mf)\|^2+\operatorname{pen}(\alpha_k),\qquad k\in\mathcal{K},\ f\in F_m,$$
where
$$\hat{k}=\arg\min_{k\in\mathcal{K}}\Big(\|R_{\alpha_k}(y_m-A_mf)\|^2+\operatorname{pen}(\alpha_k)\Big)$$
and
$$\operatorname{pen}(\alpha_k)=r\sigma^2(1+L_k)\big[\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})+\rho^2(R_{\alpha_k})\big],$$
with $r>2$, and where $L_k$ is a sequence incorporated in order to control the complexity of the set $\mathcal{K}=\{1,2,\dots,k_n\}$ of all possible indices up to $k_n$. Here $\rho^2(B)=\rho(B^{t}B)$ is the spectral radius of the self-adjoint operator $B^{t}B$ for any square matrix $B$, defined by
$$\rho^2(B)=\frac{1}{n}\max_{1\le j\le m}b_j,$$
where $b_j$ are the eigenvalues of $B^{t}B$.
Thus, $\hat{k}$ is selected by
$$\hat{k}=\arg\min_{k\in\mathcal{K}}\left(\|R_{\alpha_k}(y_m-A_mf)\|^2+\frac{r\sigma^2(1+L_k)}{n}\Big[\sum_{j=1}^{m}Q_{\alpha_k}^2(\lambda_j)\lambda_j+\max_{1\le j\le m}Q_{\alpha_k}^2(\lambda_j)\lambda_j\Big]\right).\tag{4.2}$$
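Once the singular system of $A_m$ is known, criterion (4.2) is cheap to evaluate. The sketch below is hypothetical in several respects: it uses a Tikhonov-type family $Q_{\alpha_k}(\lambda)=1/(\lambda+1/\alpha_k)$, replaces the unspecified $f$ in the data-fit term by the estimate $f_{\alpha_k}=R_{\alpha_k}y_m$ itself, takes $L_k=0$ and $r=2.5$, and generates synthetic data with assumed singular values $j^{-p}$ and noise of variance $\sigma^2/n$. The function name `select_index` is illustrative.

```python
import numpy as np


def select_index(s, b, sigma, n, alphas, r=2.5, L=None):
    """Evaluate criterion (4.2) for a hypothetical family Q_k(lam) = 1/(lam + 1/alpha_k).

    s : singular values of A_m
    b : U^t y_m, the data in the left singular basis
    The f in the data-fit term is replaced by f_{alpha_k} = R_{alpha_k} y_m (an assumption).
    """
    lam = s**2
    L = np.zeros(len(alphas)) if L is None else np.asarray(L, dtype=float)
    crit = np.empty(len(alphas))
    for k, alpha in enumerate(alphas):
        Q = 1.0 / (lam + 1.0 / alpha)
        fit = np.sum((Q * s * b * (1.0 - lam * Q)) ** 2)   # ||R_k (y_m - A_m f_k)||^2
        w = Q**2 * lam                                      # nonzero eigenvalues of R_k^t R_k
        pen = r * sigma**2 * (1.0 + L[k]) / n * (w.sum() + w.max())
        crit[k] = fit + pen
    return int(np.argmin(crit)), crit


rng = np.random.default_rng(2)
n, m, p, sigma = 500, 40, 1.0, 1.0
j = np.arange(1, m + 1)
s = j ** -p                            # assumed singular values j^{-p}
f_coef = j ** -1.5                     # assumed coefficients of the true solution
b = s * f_coef + rng.normal(scale=sigma / np.sqrt(n), size=m)
alphas = np.logspace(0, 6, 25)
k_hat, crit = select_index(s, b, sigma, n, alphas)
print(k_hat, alphas[k_hat])
```

Working in the singular basis avoids forming $R_{\alpha_k}$ explicitly; the trace and the spectral radius are then just the sum and the maximum of $Q_{\alpha_k}^2(\lambda_j)\lambda_j$, scaled by $1/n$.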

The strategy proposed in this article automatically provides the optimal order of accuracy. The risk of the adaptive regularized estimator is bounded, up to constants and a remainder term of order $1/n$, by the risk of the best estimator for a selected model. We have the following result.
Theorem 4.1.
For any $f\in F_m$, any $\alpha_k$ and any $0<\nu<1$, the following inequality holds, where $d$ is a positive constant that depends on $r$ (as in Lemma 4.4):
$$\mathbb{E}\|\tilde{f}_m-\hat{f}_{\alpha_{\hat{k}}}\|^2\le\frac{1}{1-\nu}\inf_{k\in\mathcal{K}}\Big[C(1+\nu)\|\tilde{f}_m-f_{\alpha_k}\|^2+2\operatorname{pen}(\alpha_k)\Big]+\frac{C_1(d)}{n},\tag{4.3}$$
where
$$C_1(d)=4\sigma^2\sum_k\frac{n\rho^2(R_{\alpha_k})}{d}\left[\sqrt{drL_k\left[\frac{\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})}{\rho^2(R_{\alpha_k})}+1\right]}+1\right]\exp\left\{-\sqrt{drL_k\left[\frac{\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})}{\rho^2(R_{\alpha_k})}+1\right]}\right\}.$$
Remark 4.2.
An important point is that (4.3) is non-asymptotic. The goodness of fit of the estimator is governed by the trace $\operatorname{Tr}(R_{\alpha}^{t}R_{\alpha})$ and the spectral radius $\rho^2(R_{\alpha})$. Also, the estimator is optimal in the sense that the adaptive estimator achieves the best rate of convergence among all the regularized estimators.
Remark 4.3.
Under our assumptions, namely that the basis is orthonormal for the fixed design, neither $n\rho^2(R_k)$ nor $\operatorname{Tr}(R_k^{t}R_k)/\rho^2(R_k)$ depends on $n$.

Proof.
For any $f_{\alpha_k}$ and any $k\in\mathbb{N}$,
$$\|R_{\alpha_{\hat{k}}}(y_m-A_m\hat{f}_{\alpha_{\hat{k}}})\|^2+\operatorname{pen}(\alpha_{\hat{k}})\le\|R_{\alpha_k}(y_m-A_mf_{\alpha_k})\|^2+\operatorname{pen}(\alpha_k)$$
and
$$\|R_{\alpha_k}(y_m-A_mf_{\alpha_k})\|^2=\|R_{\alpha_k}A_m(\tilde{f}-f_{\alpha_k})\|^2+2\langle R_{\alpha_k}A_m(\tilde{f}-f_{\alpha_k}),R_{\alpha_k}\Pi_{Y_m}^{n}\varepsilon\rangle+\|R_{\alpha_k}\Pi_{Y_m}^{n}\varepsilon\|^2.$$
Thus, following standard arguments, we have
$$\begin{aligned}\|R_{\alpha_{\hat{k}}}A_m(\tilde{f}-\hat{f}_{\alpha_{\hat{k}}})\|^2\le{}&\|R_{\alpha_k}A_m(\tilde{f}-f_{\alpha_k})\|^2-2\langle R_{\alpha_{\hat{k}}}A_m(\tilde{f}-\hat{f}_{\alpha_{\hat{k}}}),R_{\alpha_{\hat{k}}}\Pi_{Y_m}^{n}\varepsilon\rangle\\&+2\langle R_{\alpha_k}A_m(\tilde{f}-f_{\alpha_k}),R_{\alpha_k}\Pi_{Y_m}^{n}\varepsilon\rangle-\|R_{\alpha_{\hat{k}}}\Pi_{Y_m}^{n}\varepsilon\|^2+\|R_{\alpha_k}\Pi_{Y_m}^{n}\varepsilon\|^2\\&+\operatorname{pen}(\alpha_k)-\operatorname{pen}(\alpha_{\hat{k}}).\end{aligned}$$
Let $0<\nu<1$. Since the algebraic inequality $2ab\le\nu a^2+\frac{1}{\nu}b^2$ holds for all $a,b\in\mathbb{R}$, we find that
$$(1-\nu)\|R_{\alpha_{\hat{k}}}A_m(\tilde{f}-\hat{f}_{\alpha_{\hat{k}}})\|^2\le(1+\nu)\|R_{\alpha_k}A_m(\tilde{f}-f_{\alpha_k})\|^2+2\operatorname{pen}(\alpha_k)+2\sup_{\alpha_k}\Big\{\frac{1}{\nu}\|R_{\alpha_k}\Pi_{Y_m}^{n}\varepsilon\|^2-\operatorname{pen}(\alpha_k)\Big\}$$
holds for any $k$ and $f_{\alpha_k}\in F_m$.
On the other hand, using that $1\le\|R_{\alpha_k}A\|\le C$, we have that for any $f_{\alpha_k}\in F_{m_0}$ and any $k\in\mathbb{N}$,
$$(1-\nu)\|\tilde{f}_m-\hat{f}_{\alpha_{\hat{k}}}\|^2\le C(1+\nu)\|\tilde{f}_m-f_{\alpha_k}\|^2+2\operatorname{pen}(\alpha_k)+2C_1\sup_{\alpha_k}\big\{\|R_{\alpha_k}\Pi_{Y_m}^{n}\varepsilon\|^2-\operatorname{pen}(\alpha_k)\big\}.$$
The proof then follows directly from the following technical lemma ([3], [16]), which bounds the supremum of an empirical process indexed by the regularization family.
Lemma 4.4.
Let $\eta(A)=\sqrt{\varepsilon^{t}A^{t}A\varepsilon}=\|A\varepsilon\|$. Then there exists a positive constant $d$ that depends on $r/2$ such that the following inequality holds:
$$P\Big(\eta^2(A)\ge\sigma^2\big[\operatorname{Tr}(A^{t}A)+\rho(A^{t}A)\big]\tfrac{r}{2}(1+L)+\sigma^2u\Big)\le\exp\left\{-\sqrt{d\Big(\frac{u}{\rho(A^{t}A)}+\frac{r}{2}L\Big[\frac{\operatorname{Tr}(A^{t}A)}{\rho(A^{t}A)}+1\Big]\Big)}\right\}.\tag{4.4}$$
With the above notation, $\eta(R_{\alpha_k})=\|R_{\alpha_k}\varepsilon_m\|$, where $\varepsilon_m=\Pi_{Y_m}^{n}\varepsilon$.
Now, with this lemma we have
$$\begin{aligned}P\Big(\sup_{\alpha_k}\big(\|R_{\alpha_k}\varepsilon_m\|^2-\operatorname{pen}(\alpha_k)\big)>\sigma^2x\Big)&\le\sum_kP\Big[\eta^2(R_{\alpha_k})\ge r\sigma^2(1+L_k)\big[\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})+\rho^2(R_{\alpha_k})\big]+\sigma^2x\Big]\\&\le\sum_k\exp\left\{-\sqrt{d\Big(\frac{x}{\rho^2(R_{\alpha_k})}+rL_k\Big[\frac{\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})}{\rho^2(R_{\alpha_k})}+1\Big]\Big)}\right\}.\end{aligned}$$
Since $\mathbb{E}X=\int_0^{\infty}P(X>u)\,du$ for a nonnegative random variable $X$, applying this to the positive part of the supremum we obtain
$$\begin{aligned}\mathbb{E}\Big[\sup_{\alpha_k}\big(\|R_{\alpha_k}\varepsilon_m\|^2-\operatorname{pen}(\alpha_k)\big)\Big]&\le\int_0^{\infty}P\Big[\sup_{\alpha_k}\big(\|R_{\alpha_k}\varepsilon_m\|^2-\operatorname{pen}(\alpha_k)\big)\ge x\Big]dx\\&=\sigma^2\int_0^{\infty}P\Big[\sup_{\alpha_k}\big(\|R_{\alpha_k}\varepsilon_m\|^2-\operatorname{pen}(\alpha_k)\big)\ge\sigma^2u\Big]du\\&\le\sigma^2\sum_k\int_0^{\infty}\exp\big\{-\sqrt{k_1u+k_2}\big\}\,du,\end{aligned}$$
where $k_1=d/\rho^2(R_{\alpha_k})$ and $k_2=drL_k\big[\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})/\rho^2(R_{\alpha_k})+1\big]$.
Let $w=k_1u+k_2$; then
$$\begin{aligned}\mathbb{E}\Big[\sup_{\alpha_k}\big(\|R_{\alpha_k}\varepsilon_m\|^2-\operatorname{pen}(\alpha_k)\big)\Big]&\le\sigma^2\sum_k\int_{k_2}^{\infty}\frac{1}{k_1}\exp\{-\sqrt{w}\}\,dw\\&=\sigma^2\sum_k\frac{2}{k_1}\big[\sqrt{k_2}+1\big]\exp\{-\sqrt{k_2}\}.\end{aligned}$$
Finally, we obtain the desired result:
$$\mathbb{E}\|\tilde{f}_m-\hat{f}_{\alpha_{\hat{k}}}\|^2\le\frac{1}{1-\nu}\inf_{k\in\mathcal{K}}\Big[C(1+\nu)\|\tilde{f}_m-f_{\alpha_k}\|^2+2\operatorname{pen}(\alpha_k)\Big]+\frac{C_1(d)}{n},$$
where
$$C_1(d)=4\sigma^2\sum_k\frac{n\rho^2(R_{\alpha_k})}{d}\left[\sqrt{drL_k\left[\frac{\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})}{\rho^2(R_{\alpha_k})}+1\right]}+1\right]\exp\left\{-\sqrt{drL_k\left[\frac{\operatorname{Tr}(R_{\alpha_k}^{t}R_{\alpha_k})}{\rho^2(R_{\alpha_k})}+1\right]}\right\}.$$
□
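The closed-form integral used at the end of the proof above, $\int_0^{\infty}\exp\{-\sqrt{k_1u+k_2}\}\,du=\frac{2}{k_1}\big(\sqrt{k_2}+1\big)e^{-\sqrt{k_2}}$, can be checked numerically. The sketch uses only the Python standard library; the substitution $s=\sqrt{k_1u+k_2}$ turns the integrand into $(2s/k_1)e^{-s}$, which a composite Simpson rule handles well.

```python
import math


def closed_form(k1, k2):
    """(2 / k1) (sqrt(k2) + 1) exp(-sqrt(k2)), the value obtained in the proof."""
    return 2.0 / k1 * (math.sqrt(k2) + 1.0) * math.exp(-math.sqrt(k2))


def numeric(k1, k2, length=80.0, steps=20000):
    """Composite Simpson approximation of int_0^inf exp(-sqrt(k1 u + k2)) du.

    With s = sqrt(k1 u + k2) the integral becomes int_{sqrt(k2)}^inf (2 s / k1) e^{-s} ds,
    which is truncated after `length` units in s (the neglected tail is of order e^{-length}).
    """
    a = math.sqrt(k2)
    h = length / steps                     # steps must be even for Simpson's rule
    g = lambda s: 2.0 * s / k1 * math.exp(-s)
    total = g(a) + g(a + length)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


for k1, k2 in [(0.5, 3.0), (2.0, 0.0), (1.0, 25.0)]:
    print(k1, k2, numeric(k1, k2), closed_form(k1, k2))
```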
5 Regularization by iterative methods
Iterative regularization methods are very competitive methods for linear inverse problems.
In iterative regularization, one picks an initial guess $f_0$ for the unknown $\tilde{f}$ and then iteratively constructs updated approximations via a regularization scheme. The regularization parameter associated with iterative regularization is thus the “stopping point” of the iterative sequence, and an important part of the mathematical theory is the development of stopping criteria for terminating the iteration. In other words, the iteration index plays the role of the regularization parameter $\alpha$, and the stopping criterion plays the role of the parameter selection method.
5.1 Descent Methods for Linear Inverse Problems
As an example of iterative regularization, we consider descent methods. Descent methods have become quite popular in recent years for the solution of linear and nonlinear inverse problems [11]. In this subsection we consider two examples.
As an approximation of $\tilde{f}_m$ we choose $f_{m,\alpha}$ such that
$$f_{m,\alpha}=\big[I-A_m^{*}A_mQ_{\alpha}(A_m^{*}A_m)\big]f_0+Q_{\alpha}(A_m^{*}A_m)A_m^{*}\Pi_{Y_m}^{n}y,\tag{5.1}$$
where $f_0\in F_m$ is an initial guess with $f_0\in\mathcal{N}(A_m)^{\perp}$ [11].
Most iterative methods for approximating $\tilde{f}$ are based on a transformation of the normal equation into equivalent fixed point equations such as
$$f=f-A_m^{*}(A_mf-y).$$
If $\|A_m\|^2<2$, then the corresponding fixed point operator $I-A_m^{*}A_m$ is nonexpansive and one may apply the method of successive approximations. It must be emphasized that $I-A_m^{*}A_m$ is not a contraction if our inverse problem is ill-posed, since the spectrum of $A_m^{*}A_m$ clusters at the origin.
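The distinction between nonexpansive and contractive can be seen on a diagonal model. The sketch below assumes a hypothetical diagonal operator with singular values $j^{-p}$, $j=1,\dots,m$, so that $\|A\|=1$ and $\|I-A^{t}A\|=1-m^{-2p}$: the norm stays below $1$ for every fixed $m$, but approaches $1$ as the dimension grows, so there is no uniform contraction constant.

```python
import numpy as np


def fixed_point_norm(m, p=1.0):
    """||I - A^t A||_2 for a hypothetical diagonal A with singular values j^{-p}, j = 1..m."""
    A = np.diag(np.arange(1, m + 1, dtype=float) ** -p)
    M = np.eye(m) - A.T @ A
    return np.linalg.norm(M, 2)


for m in [5, 50, 500]:
    print(m, fixed_point_norm(m))   # equals 1 - m^{-2p}: below 1, but tending to 1
```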
5.2 Landweber iteration
In this subsection we present the well-known Landweber iteration, which arises from converting the necessary conditions for minimizing (2.1) into a fixed point iteration. Much development has taken place in recent years in advancing the theory of Landweber iteration for linear and nonlinear inverse problems.
Using the terminology of the previous sections, we introduce the function
$$Q_k(\lambda)=\sum_{j=0}^{k-1}(1-\lambda)^j=\lambda^{-1}\big(1-(1-\lambda)^k\big).\tag{5.2}$$
We call $Q_k$ the iteration polynomial of degree $k-1$. Associated with it is the polynomial
$$r_k(\lambda)=1-\lambda Q_k(\lambda)=(1-\lambda)^k$$
of degree $k$, which is called the residual polynomial since it determines the residual $y-A_mf_{m,k}$.
Thus, inserting (5.2) into (5.1), we obtain recursively
$$f_{m,k+1}=f_{m,k}-A_m^{*}(A_mf_{m,k}-y_m),\qquad k=0,1,\dots\tag{5.3}$$
starting from an initial guess $f_0$. This is a steepest descent method, called the linear version of Landweber's iteration. Each step of the iterative process (5.3) is carried out in the direction opposite to the gradient of the quadratic functional $J(f)$ in (2.1); it is known that the functional decreases most rapidly along this direction.
If $\|A_m\|\le1$, we consider $\lambda\in(0,1]$; on this interval $\lambda Q_k(\lambda)$ is uniformly bounded, and since $Q_k(\lambda)$ converges to $1/\lambda$ as $k\to\infty$, Theorem 3.2 implies that the sequence $f_{m,k}$ converges to $\tilde{f}_m$ when $y\in\mathcal{D}(A_m^{\dagger})$. If $\|A_m\|$ is not bounded by one, then we introduce a relaxation parameter $0<\tau<\|A_m\|^{-2}$ in front of $A_m^{*}$ in (5.3), i.e., we iterate
$$f_{m,k+1}=f_{m,k}-\tau A_m^{*}(A_mf_{m,k}-y_m),\qquad k=0,1,\dots\tag{5.4}$$
If $\tau\equiv\tau_k$ varies with $k$, one obtains various variants of the method of steepest descent, depending on the choice of the sequence $\tau_k$. The Landweber iteration (5.4) is usually called the method of simple iteration.
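The link between the iteration (5.4) and the closed form (5.1) can be checked numerically. With relaxation, the iteration polynomial becomes $Q_k(\lambda)=\tau\sum_{j=0}^{k-1}(1-\tau\lambda)^j=\lambda^{-1}\big(1-(1-\tau\lambda)^k\big)$, the relaxed analogue of (5.2). The random test matrix and the function names below are illustrative assumptions; the step size $\tau=1/(2\|A\|^2)$ is the one used in Corollary 5.1.

```python
import numpy as np


def landweber(A, y, f0, tau, k):
    """k steps of iteration (5.4): f <- f - tau A^t (A f - y)."""
    f = f0.copy()
    for _ in range(k):
        f = f - tau * A.T @ (A @ f - y)
    return f


def landweber_closed_form(A, y, f0, tau, k):
    """Formula (5.1) with the relaxed polynomial Q_k(lam) = (1 - (1 - tau lam)^k) / lam."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    lam = s**2
    Q = (1.0 - (1.0 - tau * lam) ** k) / lam
    coef = (1.0 - lam * Q) * (Vt @ f0) + Q * s * (U.T @ y)
    return Vt.T @ coef


rng = np.random.default_rng(3)
A = rng.standard_normal((40, 8))          # illustrative full-column-rank matrix
y = rng.standard_normal(40)
f0 = rng.standard_normal(8)
tau = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
f_iter = landweber(A, y, f0, tau, k=25)
f_closed = landweber_closed_form(A, y, f0, tau, k=25)
print(np.max(np.abs(f_iter - f_closed)))  # agreement up to rounding
```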
In the following we derive a simple estimate for the error propagation in the Landweber iteration, which leads to the following result.
Corollary 5.1.
Let $\tau=1/(2\|A_m\|^2)<1/\lambda_1$. If $y\in\mathcal{R}(A_m)$, then the Landweber iteration is an order optimal regularization method, i.e.,
$$\mathbb{E}\|\tilde{f}_m-f_{k(m)}\|^2\le2c_1k^{-2\mu}+2c_2\frac{\sigma^2}{n}(\tau k)^{(2p+1)/2p},$$
where $c_1=\rho^2\big(\frac{\mu}{\tau e}\big)^{2\mu}$ and $c_2=\frac{1}{2p+1}\big(\frac{2p+1}{2p-1}\big)^{(2p+1)/4p}$.

Proof.
To apply Theorem 3.4 we have to study the bias term $\mathbb{E}\|\tilde{f}_m-R_{\alpha}A_m\tilde{f}\|^2$ and the variance term $\mathbb{E}\|R_{\alpha}A_m\tilde{f}-f_{k(m)}\|^2$. By (5.2) we have
$$\tilde{f}_m-R_{\alpha}A_m\tilde{f}=\big(I-A_m^{*}A_mQ_k(A_m^{*}A_m)\big)\tilde{f}_m=(I-A_m^{*}A_m)^k\tilde{f}_m.$$
We therefore have to study the residual polynomial $r_k(\lambda)=(1-\lambda)^k$ of the Landweber iteration.
For $0\le\lambda\le\|A_m\|^2$, the function
$$\lambda^{\mu}|1-\lambda Q_k(\lambda)|$$
attains its maximum at $\lambda=\tau^{-1}\mu(\mu+k)^{-1}$. Thus, we have
$$\begin{aligned}\lambda^{\mu}|1-\lambda Q_k(\lambda)|&\le\max_{0\le\lambda\le\|A_m\|^2}\lambda^{\mu}|1-\lambda Q_k(\lambda)|\\&\le\frac{\mu^{\mu}}{\tau^{\mu}(\mu+k)^{\mu}}\,\frac{k^k}{(\mu+k)^k}\\&<\Big(\frac{\mu}{\tau e}\Big)^{\mu}k^{-\mu}.\end{aligned}$$
This leads to the numbers $\omega_{\mu}(k)$ introduced in Theorem 3.4:
$$\omega_{\mu}(k)=\Big(\frac{\mu}{\tau e}\Big)^{\mu}k^{-\mu}.$$
Thus, the bias term is bounded by
$$\|\tilde{f}_m-R_{\alpha}A_m\tilde{f}\|^2\le\rho^2\Big(\frac{\mu}{\tau e}\Big)^{2\mu}k^{-2\mu}.$$
Next, we establish bounds for the variance term. By assumption, the singular values satisfy $\lambda_j\approx j^{-2p}$. Note that for small values of $\lambda_j$ we have $Q_k^2(\lambda_j)\le(\tau k)^2$ for all $j>m'$, and for large values of $\lambda_j$ ($\lambda_j\approx\lambda_1$) we have $Q_k^2(\lambda_j)\le\lambda_j^{-2}$ for all $j\le m'$.
Consequently,
$$\begin{aligned}n\operatorname{Tr}\big(Q_k^2(A_mA_m^{*})A_mA_m^{*}\big)=\sum_{j=1}^{m}Q_k^2(\lambda_j)\lambda_j&\le\sum_{j=1}^{m'}\lambda_j^{-1}+\sum_{j>m'}(\tau k)^2\lambda_j\\&\lesssim\int_0^{m'}s^{2p}\,ds+(\tau k)^2\int_{m'}^{\infty}s^{-2p}\,ds.\end{aligned}$$
This suggests choosing $m'\approx c(\tau k)^{1/2p}$ for $p>1/2$, where $c=\big(\frac{2p+1}{2p-1}\big)^{1/4p}$. Hence we have
$$\begin{aligned}\mathbb{E}\|R_{\alpha}A_m\tilde{f}-f_{k(m)}\|^2&=\sigma^2\operatorname{Tr}\big(Q_k^2(A_mA_m^{*})A_mA_m^{*}\big)\\&\lesssim\frac{c^{2p+1}}{2p+1}\,\frac{\sigma^2(\tau k)^{(2p+1)/2p}}{n}.\end{aligned}$$
Finally, this implies
$$\mathbb{E}\|\tilde{f}_m-f_{k(m)}\|^2\le2c_1k^{-2\mu}+2c_2\frac{\sigma^2}{n}(\tau k)^{(2p+1)/2p},$$
where $c_1=\rho^2\big(\frac{\mu}{\tau e}\big)^{2\mu}$ and $c_2=\frac{1}{2p+1}\big(\frac{2p+1}{2p-1}\big)^{(2p+1)/4p}$.
□
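Both estimates in the proof of Corollary 5.1 can be checked numerically. The sketch assumes $\lambda_1=\|A_m\|^2=1$, so that $\tau=1/2$, and the model spectrum $\lambda_j=j^{-2p}$. For the variance it uses the relaxed polynomial $Q_k(\lambda)=\lambda^{-1}\big(1-(1-\tau\lambda)^k\big)$, which satisfies $Q_k(\lambda)\le\min(\tau k,1/\lambda)$ whenever $0<\tau\lambda\le1$; hence the split bound holds for every $m'$, and the code evaluates it at $m'\approx c(\tau k)^{1/2p}$.

```python
import numpy as np

mu, tau, p = 0.5, 0.5, 1.0          # assumed: lambda_1 = ||A_m||^2 = 1, tau = 1/(2 lambda_1)


def bias_sup(k, num=200001):
    """Grid maximum of lam^mu (1 - tau lam)^k over [0, lambda_1] = [0, 1]."""
    lam = np.linspace(0.0, 1.0, num)
    return np.max(lam**mu * (1.0 - tau * lam) ** k)


def variance_sum(k, m=2000):
    """n Tr(Q_k^2(A A^*) A A^*) = sum_j Q_k(lam_j)^2 lam_j with lam_j = j^{-2p}."""
    lam = np.arange(1, m + 1, dtype=float) ** (-2 * p)
    Q = (1.0 - (1.0 - tau * lam) ** k) / lam
    return np.sum(Q**2 * lam)


def split_bound(k, m_prime, m=2000):
    """sum_{j<=m'} lam_j^{-1} + (tau k)^2 sum_{j>m'} lam_j, as in the proof."""
    lam = np.arange(1, m + 1, dtype=float) ** (-2 * p)
    return np.sum(1.0 / lam[:m_prime]) + (tau * k) ** 2 * np.sum(lam[m_prime:])


c = ((2 * p + 1) / (2 * p - 1)) ** (1 / (4 * p))
results = []
for k in [1, 10, 100]:
    omega = (mu / (tau * np.e)) ** mu * k ** -mu      # omega_mu(k) from the proof
    m_prime = int(np.ceil(c * (tau * k) ** (1 / (2 * p))))
    results.append((k, bias_sup(k), omega, variance_sum(k), split_bound(k, m_prime)))
    print(results[-1])
```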
Remark 5.2.
Note that the above bound attains the optimal rate if the dimension of the set is such that $d_{m_0}\approx n^{\frac{1}{4\mu p+2p+1}}$. Here the optimal choice of the regularization sequence depends on $p$ and $\mu$. The optimal rates are of order $\mathbb{E}\|\tilde{f}-f_{k(m)}\|^2=\mathcal{O}\big(n^{-\frac{4\mu p}{4\mu p+2p+1}}\big)$. Analogous results are obtained in the ill-posed problems literature, see for example [5], where, typically in a Hilbert scale setting, optimal rates are of order $\mathcal{O}\big(n^{-\frac{2s}{2s+2p+1}}\big)$ with $s=2\mu p$.
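The exponent in Remark 5.2 comes from balancing the bias and variance terms of Corollary 5.1. The sketch below minimizes the proxy $k^{-2\mu}+(\tau k)^{(2p+1)/2p}/n$ (the bound of Corollary 5.1 with $c_1=c_2=\sigma=1$, an assumption) over integer $k$. The minimizer should grow like $n^{2p/(4\mu p+2p+1)}$ and the minimal value should decay like $n^{-4\mu p/(4\mu p+2p+1)}$; for $\mu=1/2$ and $p=1$ both exponents have magnitude $2/5$.

```python
import numpy as np


def risk_proxy(k, n, mu, p, tau=0.5):
    """k^{-2 mu} + (tau k)^{(2p+1)/(2p)} / n: Corollary 5.1 with c1 = c2 = sigma = 1 (assumed)."""
    return k ** (-2 * mu) + (tau * k) ** ((2 * p + 1) / (2 * p)) / n


mu, p = 0.5, 1.0
ks = np.arange(1, 10**6, dtype=float)
ns = [10**4, 10**6, 10**8]
k_best = [ks[np.argmin(risk_proxy(ks, n, mu, p))] for n in ns]
risk_best = [risk_proxy(k, n, mu, p) for k, n in zip(k_best, ns)]

# empirical growth/decay exponents between the smallest and largest n
slope_k = np.log(k_best[-1] / k_best[0]) / np.log(ns[-1] / ns[0])
slope_risk = np.log(risk_best[-1] / risk_best[0]) / np.log(ns[-1] / ns[0])
print(k_best, slope_k, slope_risk)   # theory: 2p/(4 mu p+2p+1) and -4 mu p/(4 mu p+2p+1)
```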
We are ready to state our main result for the Landweber iteration, which bounds the mean squared error of the selected estimate $\hat{f}_{\hat{k}}$ essentially by the smallest mean squared error among the estimates $f_k$, plus a remainder term of order $1/n$. The result follows from Theorem 4.1.
Corollary 5.3.
Let $\tau=1/(2\|A_m\|^2)<1/\lambda_1$. Assume further that $\hat{k}$ is chosen as in (4.2) and $d_{m_0}$ as in (4.1). If $y\in\mathcal{R}(A_m)$, then for any $f\in F_m$ and any $k$ the following inequality holds:
$$\begin{aligned}\mathbb{E}\|\tilde{f}_m-\hat{f}_{\hat{k}}\|^2\le{}&\frac{1}{1-\nu}\inf_{k\in\mathcal{K}}\left[C(1+\nu)\|\tilde{f}_m-f_k\|^2+\frac{2r\sigma^2(1+L_k)\big(c(\tau k)^{\frac{2p+1}{2p}}+\tau k\big)}{n}\right]\\&+\frac{4\sigma^2}{n}\sum_k\frac{\tau k}{d}\left[\sqrt{drL_k\big[c(\tau k)^{1/2p}+1\big]}+1\right]\exp\left\{-\sqrt{drL_k\big[c(\tau k)^{1/2p}+1\big]}\right\},\end{aligned}$$
for some $C>0$ and $c=\frac{1}{2p+1}\big(\frac{2p+1}{2p-1}\big)^{(2p+1)/4p}$.

Proof.
For fixed $\lambda_j$ and $m$, the trace and spectral radius terms are bounded as follows:
$$\operatorname{Tr}(R_k^{t}R_k)=\sum_{j=1}^{m}Q_k^2(\lambda_j)\lambda_j\le\sum_{j=1}^{m'}\lambda_j^{-1}+\sum_{j>m'}(\tau k)^2\lambda_j\tag{5.5}$$
and
$$\rho^2(R_k^{t}R_k)=\max_{1\le j\le m}Q_k^2(\lambda_j)\lambda_j\le\max_{j\le m'}\lambda_j^{-1}+\max_{j>m'}(\tau k)^2\lambda_j.\tag{5.6}$$
Balancing both terms in (5.5) and (5.6) gives the optimal choice for the trace and the spectral radius, respectively. Thus, we have
$$\operatorname{Tr}(R_k^{t}R_k)\approx\frac{1}{2p+1}\Big(\frac{2p+1}{2p-1}\Big)^{\frac{2p+1}{4p}}\frac{(\tau k)^{\frac{2p+1}{2p}}}{n}$$
and
$$\rho^2(R_k^{t}R_k)\approx\frac{\tau k}{n}.$$
Note that the penalty term is roughly proportional to
$$\frac{1}{n}\left[\frac{1}{2p+1}\Big(\frac{2p+1}{2p-1}\Big)^{\frac{2p+1}{4p}}(\tau k)^{\frac{2p+1}{2p}}+\tau k\right].$$
On the other hand,
$$\frac{\operatorname{Tr}(R_{\alpha}^{t}R_{\alpha})}{\rho^2(R_{\alpha})}\approx\frac{1}{2p+1}\Big(\frac{2p+1}{2p-1}\Big)^{(2p+1)/4p}(\tau k)^{1/2p}.$$
The result then follows directly from Theorem 4.1. □
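The penalty quantities entering Corollary 5.3 can be evaluated exactly for a model spectrum. The sketch below assumes $\lambda_j=j^{-2p}$, $\tau=1/2$, $\sigma=1$, $r=2.5$ and $L_k=0$ (all illustrative choices), returns $n\operatorname{Tr}(R_k^{t}R_k)$ and $n\rho^2(R_k)$, and prints their ratios to the approximations $\rho^2\approx\tau k/n$ and $\operatorname{Tr}/\rho^2\approx c(\tau k)^{1/2p}$ used above. Since $Q_k(\lambda)\le\min(\tau k,1/\lambda)$, one always has $n\rho^2(R_k)\le\tau k$.

```python
import numpy as np


def landweber_pen_terms(tau, k, p, m):
    """Return (n Tr(R_k^t R_k), n rho^2(R_k)) for the model spectrum lam_j = j^{-2p}."""
    lam = np.arange(1, m + 1, dtype=float) ** (-2 * p)
    w = ((1.0 - (1.0 - tau * lam) ** k) / lam) ** 2 * lam   # Q_k(lam_j)^2 lam_j
    return w.sum(), w.max()


def pen(tau, k, p, m, n, sigma=1.0, r=2.5, L_k=0.0):
    """pen(alpha_k) = r sigma^2 (1 + L_k) [Tr + rho^2], including the 1/n normalisation."""
    tr, rho2 = landweber_pen_terms(tau, k, p, m)
    return r * sigma**2 * (1.0 + L_k) * (tr + rho2) / n


tau, p, m, n = 0.5, 1.0, 5000, 1000
c = (1 / (2 * p + 1)) * ((2 * p + 1) / (2 * p - 1)) ** ((2 * p + 1) / (4 * p))
for k in [10, 100, 1000]:
    tr, rho2 = landweber_pen_terms(tau, k, p, m)
    # ratios to the approximations rho^2 ~ tau k / n and Tr / rho^2 ~ c (tau k)^{1/2p}
    print(k, rho2 / (tau * k), (tr / rho2) / (c * (tau * k) ** (1 / (2 * p))), pen(tau, k, p, m, n))
```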
5.3 Nonlinear multistep iterative process
Many approximation methods widely used in practice are nonlinear. We cite an important example of a nonlinear approximation method: the nonlinear multistep iterative process, whose residual polynomial is
$$1-\lambda Q_k(\lambda)=\prod_{i=1}^{k}\big(1-\tau_{ik}^{-1}\lambda\big)$$
with $\tau_{ik}=\tau_{ik}(f_0,A,y)>0$ and $0<\tau_{1k}\le\tau_{2k}\le\dots\le\tau_{kk}\le\lambda_1$. Then, for $\lambda>0$, $Q_k(\lambda)$ has the representation
$$Q_k(\lambda)=\lambda^{-1}\Big[1-\prod_{i=1}^{k}\big(1-\tau_{ik}^{-1}\lambda\big)\Big].$$
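The representation above, the value $Q_k(0)=\sum_i\tau_{ik}^{-1}$, identity (5.7) and the estimates of Theorem 3.5 on $[0,\tau_{1k}]$ can be checked numerically for a fixed set of nodes. The nodes `taus` below are hypothetical, chosen with $0<\tau_{1k}\le\dots\le\tau_{kk}\le\lambda_1=1$; they are not produced by any particular nonlinear method, and $\mu=1/2$ is an illustrative choice.

```python
import numpy as np

taus = np.array([0.05, 0.2, 0.5, 0.9, 1.0])   # hypothetical nodes, lambda_1 = 1
S = np.sum(1.0 / taus)                          # candidate value of Q_k(0)
mu = 0.5


def residual_poly(lam):
    """r_k(lam) = prod_i (1 - lam / tau_ik)."""
    lam = np.asarray(lam, dtype=float)
    return np.prod(1.0 - lam[..., None] / taus, axis=-1)


def Q_k(lam):
    """Q_k(lam) = lam^{-1} [1 - r_k(lam)], valid for lam > 0."""
    lam = np.asarray(lam, dtype=float)
    return (1.0 - residual_poly(lam)) / lam


# (i) Q_k(lam) -> sum_i 1/tau_ik as lam -> 0+
q_small = float(Q_k(1e-7))

# (ii) identity (5.7): r_k'(lam) / (-r_k(lam)) = sum_i 1/(tau_ik - lam), by central differences
lam0, h = 0.02, 1e-6
lhs = (residual_poly(lam0 + h) - residual_poly(lam0 - h)) / (2 * h) / (-residual_poly(lam0))
rhs = np.sum(1.0 / (taus - lam0))

# (iii) Theorem 3.5 on [0, tau_1k]: sup |Q_k| = Q_k(0) and the bound on lam^mu |r_k|
grid = np.linspace(1e-4, taus[0], 100001)
q_max = np.max(Q_k(grid))
bias_max = np.max(grid**mu * np.abs(residual_poly(grid)))
bias_bound = mu**mu / (mu + 1) * S ** (-mu)
print(q_small, S, float(lhs), rhs, q_max, bias_max, bias_bound)
```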
The following corollary is established.
Corollary 5.4.
Let $\tau_{ik}=\tau_{ik}(f_0,A,y)>0$ with $0<\tau_{1k}\le\tau_{2k}\le\dots\le\tau_{kk}\le\lambda_1$. If $y\in\mathcal{R}(A_m)$, then the nonlinear multistep iterative process is an order optimal regularization method, i.e.,
$$\mathbb{E}\|\tilde{f}_m-f_{k(m)}\|^2\le2c_1\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{-2\mu}+2c_2\frac{\sigma^2}{n}\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{(2p+1)/2p},$$
where $c_1=\rho^2\mu^{2\mu}(\mu+1)^{-2}$ and $c_2=\frac{1}{2p+1}\big(\frac{2p+1}{2p-1}\big)^{(2p+1)/4p}$.

Proof.
As before, we investigate the behavior of the bias and the variance.
In the relation
$${\lambda}^{\mu}(1\lambda {Q}_{k}(\lambda \left)\right)={{\lambda}^{\mu}}^{k}{\prod}_{i=1}(1{\tau}_{ik}^{1}\lambda )$$
the least upper can not be reached at the points
$\lambda =0$
and
$\lambda ={\tau}_{1k}$
, since the estimated function is not equal to zero identically.
On the other hand,
$$\left[1-\lambda Q_k(\lambda)\right]'\left[\lambda Q_k(\lambda)-1\right]^{-1}=\sum_{i=1}^{k}\frac{1}{\tau_{ik}-\lambda}. \tag{5.7}$$

Since the function on the right-hand side of (5.7) is nondecreasing in $\lambda$ on the half-interval $[0,\tau_{1k})$, the estimates of Theorem 3.5 are valid.
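Identity (5.7) and the monotonicity of its right-hand side can be spot-checked numerically. In this sketch (an illustration, not the authors' computation) the derivative of $1-\lambda Q_k(\lambda)$ is approximated by a central difference with a hypothetical step `h`, and the $\tau_{ik}$ are the same hypothetical fixed values as before.

```python
import numpy as np


def residual_poly(lam, taus):
    """g(lam) = 1 - lam * Q_k(lam) = prod_i (1 - lam / tau_i)."""
    lam = np.asarray(lam, dtype=float)
    return np.prod(1.0 - lam[..., None] / np.asarray(taus, dtype=float), axis=-1)


taus = np.array([0.5, 0.8, 1.0])             # hypothetical tau_{ik}, ascending
lam = np.linspace(0.0, 0.95 * taus[0], 50)   # grid inside [0, tau_{1k})
h = 1e-6                                     # finite-difference step

# Left-hand side of (5.7): g'(lam) / (lam Q_k(lam) - 1) = g'(lam) / (-g(lam)).
g_prime = (residual_poly(lam + h, taus) - residual_poly(lam - h, taus)) / (2 * h)
lhs = g_prime / (-residual_poly(lam, taus))

# Right-hand side of (5.7): sum_i 1 / (tau_i - lam).
rhs = np.sum(1.0 / (taus[None, :] - lam[:, None]), axis=1)

print(np.allclose(lhs, rhs, rtol=1e-6))  # True
print(np.all(np.diff(rhs) >= 0))         # True: nondecreasing on [0, tau_1k)
```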
Thus, on $[0,\tau_{1k}]$ we have
$$\sup_{0\le\lambda\le\tau_{1k}}\left|Q_k(\lambda)\right|=Q_k(0)=\sum_{i=1}^{k}\tau_{ik}^{-1}$$
and
$$\sup_{0\le\lambda\le\tau_{1k}}\lambda^{\mu}\left|1-\lambda Q_k(\lambda)\right|<\mu^{\mu}(\mu+1)^{-1}\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{-\mu}.$$
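Both suprema can be evaluated on a grid for one hypothetical parameter set ($\tau_{ik}$ as above, $\mu=0.5$). This is a numerical sanity check under those assumptions, not a proof: it confirms that $|Q_k|$ peaks at $\lambda=0$ and that the bias factor stays below $\mu^{\mu}(\mu+1)^{-1}(\sum_i\tau_{ik}^{-1})^{-\mu}$.

```python
import numpy as np

taus = np.array([0.5, 0.8, 1.0])  # hypothetical tau_{ik}
mu = 0.5                          # hypothetical smoothness index
S = np.sum(1.0 / taus)            # sum_i tau_{ik}^{-1} = Q_k(0)

lam = np.linspace(0.0, taus[0], 2001)                    # grid on [0, tau_{1k}]
g = np.prod(1.0 - lam[:, None] / taus[None, :], axis=1)  # 1 - lam Q_k(lam)
safe = np.where(lam == 0.0, 1.0, lam)
Q = np.where(lam == 0.0, S, (1.0 - g) / safe)            # Q_k(lam)

sup_Q = np.max(np.abs(Q))
sup_bias = np.max(lam**mu * np.abs(g))
bound = mu**mu / (mu + 1) * S ** (-mu)

print(np.isclose(sup_Q, S))  # True: the supremum is attained at lam = 0
print(sup_bias < bound)      # True for these parameters
```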
Note that $\omega_\mu(k)=\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{-\mu}$. Thus, the bias is bounded by
$$\|\tilde f_m-R_\alpha A_m\tilde f\|^2\le c_1\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{-2\mu},$$
where $c_1=\rho^2\mu^{2\mu}(\mu+1)^{-2}$.
On the other hand, it is not difficult to see that
$$Tr\left(Q_k^2(A_mA_m^*)A_mA_m^*\right)\le\frac{1}{n}\Big[\sum_{1\le j\le m'}j^{2p}+\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{2}\sum_{j>m'}j^{-2p}\Big]\le\frac{1}{n}\Big[\frac{m'^{2p+1}}{2p+1}+\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{2}\frac{m'^{-(2p-1)}}{2p-1}\Big].$$
This suggests choosing
$$m'\approx c\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{1/2p}$$
with $c=\left(\frac{2p+1}{2p-1}\right)^{1/4p}$, which makes the two terms in the last bracket equal.
Thus, the variance term is bounded by
$$IE\|R_\alpha A_m\tilde f-f_{k(m)}\|^2=\sigma^2 Tr\left(Q_k^2(A_m^*A_m)A_m^*A_m\right)\le c_2\frac{\sigma^2}{n}\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{(2p+1)/2p},$$
where $c_2=\frac{1}{2p+1}\left(\frac{2p+1}{2p-1}\right)^{(2p+1)/4p}$.
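The choice of $m'$ can be checked with plain arithmetic. Writing $S=\sum_{i=1}^{k}\tau_{ik}^{-1}$ (a shorthand introduced only for this sketch) and taking hypothetical values $p=1.5$, $S=10$, the code evaluates the two terms of the bracket at $m'=cS^{1/2p}$ and confirms that each equals $c_2S^{(2p+1)/2p}$.

```python
import math

p, S = 1.5, 10.0  # hypothetical values: p > 1/2 and S = sum_i tau_{ik}^{-1}

c = ((2 * p + 1) / (2 * p - 1)) ** (1 / (4 * p))
c2 = ((2 * p + 1) / (2 * p - 1)) ** ((2 * p + 1) / (4 * p)) / (2 * p + 1)


def bracket_terms(m, S, p):
    """The two bracket terms: m^{2p+1}/(2p+1) and S^2 m^{-(2p-1)}/(2p-1)."""
    return m ** (2 * p + 1) / (2 * p + 1), S**2 * m ** (1 - 2 * p) / (2 * p - 1)


m_prime = c * S ** (1 / (2 * p))
head, tail = bracket_terms(m_prime, S, p)
print(math.isclose(head, tail))                               # True
print(math.isclose(head, c2 * S ** ((2 * p + 1) / (2 * p))))  # True
```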
Finally, we have
$$IE\|\tilde f_m-f_{k(m)}\|^2\le 2c_1\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{-2\mu}+2c_2\frac{\sigma^2}{n}\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{(2p+1)/2p}.$$
Balancing the bias and variance terms, i.e. choosing $k$ such that $\sum_{i=1}^{k}\tau_{ik}^{-1}\asymp n^{2p/(4\mu p+2p+1)}$, yields
$$IE\|\tilde f_m-f_{k(m)}\|^2=\mathcal{O}\left(n^{-\frac{4\mu p}{4\mu p+2p+1}}\right).$$
□
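The balancing step can be reproduced numerically. The sketch uses hypothetical values $n=10^4$, $\mu=0.5$, $p=1.5$, drops the constants $2c_1$, $2c_2$ and $\sigma^2$, and checks that $S^*=n^{2p/(4\mu p+2p+1)}$ makes $S^{-2\mu}$ and $S^{(2p+1)/2p}/n$ coincide with $n^{-4\mu p/(4\mu p+2p+1)}$. The function name `balanced_choice` is illustrative.

```python
import math


def balanced_choice(n, mu, p):
    """S* = n^{2p/(4 mu p + 2p + 1)}, which equates S^{-2 mu} and S^{(2p+1)/(2p)} / n."""
    return n ** (2 * p / (4 * mu * p + 2 * p + 1))


n, mu, p = 10_000, 0.5, 1.5  # hypothetical sample size and smoothness indices
S = balanced_choice(n, mu, p)
bias_term = S ** (-2 * mu)
variance_term = S ** ((2 * p + 1) / (2 * p)) / n
rate = n ** (-4 * mu * p / (4 * mu * p + 2 * p + 1))
print(math.isclose(bias_term, variance_term), math.isclose(bias_term, rate))  # True True
```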
We have the following result.
Corollary 5.5.
Let $\tau_{ik}$ be as in Corollary 5.4. Assume further that $\hat k$ is as in 4.2 and $d_{m_0}$ as in 4.1. If $y\in\mathcal{R}(A_m)$, then for any $f\in F_m$ and any $k$, the following inequality holds:
$$\begin{aligned}
IE\|\tilde f_m-\hat f_{\hat k}\|^2\le{}&\frac{1}{1-\nu}\inf_{k\in\mathcal{K}}\left[C(1+\nu)\|\tilde f_m-f_k\|^2+\frac{2r\sigma^2(1+L_k)\left(c\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{\frac{2p+1}{2p}}+\sum_{i=1}^{k}\tau_{ik}^{-1}\right)}{n}\right]\\
&+\frac{4\sigma^2}{n}\sum_{k}\frac{\sum_{i=1}^{k}\tau_{ik}^{-1}}{d}\left[\sqrt{drL_k\left[c\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{1/2p}+1\right]}+1\right]e^{-\sqrt{drL_k\left[c\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{1/2p}+1\right]}},
\end{aligned}$$
for some $C>0$ and $c=\frac{1}{2p+1}\left(\frac{2p+1}{2p-1}\right)^{(2p+1)/4p}$.

Proof.
First observe that
$$Tr\left(R_k^tR_k\right)\approx\frac{1}{2p+1}\left(\frac{2p+1}{2p-1}\right)^{\frac{2p+1}{4p}}\frac{\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{\frac{2p+1}{2p}}}{n}$$
and
$$\rho^2\left(R_k\right)\approx\frac{\sum_{i=1}^{k}\tau_{ik}^{-1}}{n}.$$
Consequently,
$$\frac{Tr\left(R_k^tR_k\right)}{\rho^2\left(R_k\right)}\approx\frac{1}{2p+1}\left(\frac{2p+1}{2p-1}\right)^{\frac{2p+1}{4p}}\Big(\sum_{i=1}^{k}\tau_{ik}^{-1}\Big)^{\frac{1}{2p}}.$$
Note that both $n\rho^2(R_k)$ and $Tr\left(R_k^tR_k\right)/\rho^2(R_k)$ are independent of $n$. The proof then follows directly from Theorem 4.1.
□
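The independence of $n$ used at the end of the proof can be illustrated with the asymptotic expressions above. The sketch assumes those approximations hold exactly, with hypothetical values $S=\sum_i\tau_{ik}^{-1}=10$ and $p=1.5$; the helper `trace_and_radius` is illustrative only.

```python
import math


def trace_and_radius(S, n, p):
    """Asymptotic forms from the proof: Tr(R_k^t R_k) ~ c S^{(2p+1)/2p} / n and rho^2(R_k) ~ S / n."""
    c = ((2 * p + 1) / (2 * p - 1)) ** ((2 * p + 1) / (4 * p)) / (2 * p + 1)
    return c * S ** ((2 * p + 1) / (2 * p)) / n, S / n


S, p = 10.0, 1.5  # hypothetical values of sum_i tau_{ik}^{-1} and p
rows = []
for n in (100, 1_000, 10_000):
    tr, rho2 = trace_and_radius(S, n, p)
    rows.append((n * rho2, tr / rho2))  # neither quantity should vary with n
print(rows)
```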
References

Barron A. et al., (1999). “Risk Bounds for Model Selection via Penalization”. Probab. Theory Related Fields, 113, pp. 467-493.

Birgé L., and Massart P., (1998). “Minimum Contrast Estimators on Sieves: Exponential Bounds and Rates of Convergence”. Bernoulli, Vol. 4, No. 3, pp. 329-395.

Bousquet O., (2002). “Concentration Inequalities for Sub-Additive Functions Using the Entropy Method”.

Burger M., (2001). “A level set method for inverse problems”. Inverse Problems 17, pp. 1327-1356.

Cavalier L., Golubev G., Picard D., and Tsybakov A., (2002). “Oracle inequalities for inverse problems”. Ann. Statist., Vol. 30, No. 3, pp. 843-874.

Deuflhard P., Engl H. and Scherzer O., (1998). “A convergence analysis of iterative methods for the solution of nonlinear ill-posed problems under affinely invariant conditions”. Inverse Problems, Vol. 14, pp. 1081-1106.

Dey A.K. et al., (1996). “Cross-Validation for Parameter Selection in Inverse Estimation Problems”. Scand. J. Statist., Vol. 23, No. 4, pp. 609-620.

Engl H., (1993). “Regularization methods for the stable solution of inverse problems”. Surveys on Mathematics for Industry 3, pp. 71-143.

Engl H., Hanke M. and Neubauer A., (1996). “Regularization of Inverse Problems”. Kluwer Academic Publishers.

Engl H. and Grever W., (1994). “Using the L-curve for Determining Optimal Regularization Parameters”. Numer. Math., Vol. 69, pp. 25-31.

Gilyazov S.F., and Gol'dman N.L., (2000). “Regularization of Ill-Posed Problems by Iteration Methods”. Kluwer Academic Publishers.

Cohen A., Hoffmann M., and Reiss M. “Adaptive Wavelet Galerkin Methods for Linear Inverse Problems”. SIAM J. Numer. Anal., Vol. 42, No. 4, pp. 1479-1501.

Kilmer M.E., and O'Leary D.P., (2001). “Choosing Regularization Parameters in Iterative Methods for Ill-Posed Problems”. SIAM J. Matrix Anal. Appl., Vol. 22, No. 4, pp. 1204-1221.

Lamm P.K., (1999). “Some Recent Developments and Open Problems in Solution Methods for Mathematical Inverse Problems”. Department of Mathematics, Michigan State University, USA.

Ledoux M., and Talagrand M., (1996). “Deviation Inequalities for Product Measures”. ESAIM: Probability and Statistics 1, pp. 63-87.

Loubes J.M. and Ludeña C., (2004). “Penalized Estimators for Nonlinear Inverse Problems”.

Loubes J.M., and Van De Geer S., (2002). “Adaptive estimation in regression, using soft thresholding type penalties”. Statistica Neerlandica, 56, pp. 453-478.

Ludeña C., and Rios, (2003). “Penalized Model Selection for Ill-posed Linear Problems”.

Mathé P., and Pereverzev S.V., (2003). “Discretization Strategy for Linear Ill-posed Problems in Variable Hilbert Scales”. Inverse Problems, Vol. 19, No. 6, pp. 1263-1279.

Morozov V.A., (1966). “On the Solution of Functional Equations by the Method of Regularization”. Soviet Math. Dokl., 7, pp. 414-417.

O'Sullivan F., (1986). “A Statistical Perspective on Ill-Posed Inverse Problems”. Statistical Science, Vol. 1, No. 4, pp. 502-527.

Pereverzev S. and Schock E., (2003). “On the adaptive selection of the parameter in regularization of ill-posed problems”.

Tikhonov A., and Arsenin V., (1977). “Solutions of Ill-Posed Problems”. Wiley, New York.
Escuela de Matematicas, Facultad de Ciencias, UCV, Av. Los Ilustres, Los Chaguaramos, Codigo Postal 1020A, Caracas Venezuela. Telf.: (58)2126051481.
Departamento de Matematicas, IVIC, Carretera Panamericana KM. 11, Aptdo. 21827, Codigo Postal 1020A, Caracas Venezuela. Telf.: (58)2125041412.
Email address : afermin@euler.ciens.ucv.ve Email address : cludena@ivic.ve