While Robinson did not discuss implementational issues of such a generalized SQP approach, the recent progress in the solution of linear SDPs makes this approach especially interesting for the solution of NLSDPs. We first present a derivation of a generalized SQP method, namely the sequential semidefinite programming (SSP) method, for solving NLSDPs. In order to analyze the convergence of the SSP method, we present a sensitivity result for certain local optimal solutions of general, possibly nonconvex, quadratic semidefinite programs. We then use this result to derive a selfcontained proof of local quadratic convergence of the SSP method under the assumptions that the optimal solution is locally unique, strictly complementary, and satisfies a secondorder sufficient condition.
One of the first numerical approaches for solving a class of NLSDPs was given in [
23,
24]
.
Other recent approaches for solving NLSDPs are the program package LOQO [
33]
based on a primaldual method; see also [
34]
. Another promising approach for solving largescale SDPs is the modifiedbarrier method proposed in [
17]
. This modifiedbarrier approach does not require the barrier parameter to converge to zero, and may thus overcome some of the problems related to illconditioning in traditional interiorpoint methods. Further approaches to solving NLSDPs have been presented in [
11,
12,
31]
. In [
11]
, the augmented Lagrangian method is applied to NLSDPs, while the approach proposed in [
12]
is based on an SQP method generalized to NLSDPs. The paper [
12]
also contains a proof of local quadratic convergence. However, in contrast to this paper, the algorithm [
12]
is not derived from a comparison with interiorpoint algorithms, and the proof of convergence does not use any differentiability properties of the optimal solutions. In [
10]
, Correa and Ramírez present a proof of global convergence of a modification of the method proposed in [
12]
. The modification employs certain merit functions to control the step lengths of the SQP algorithm.
The remainder of this paper is organized as follows. In Section
2 , we introduce some notation. In Section 3 , we describe a class of nonlinear semidefinite programs that arise in passive reducedorder modeling. In Section 4 , we recall known results for linear SDPs in a form that can be easily transferred to NLSDPs. In Section 5 , we discuss primaldual systems for NLSDPs, and in Section 6 , the SSP method is introduced as a generalized SQP method. In Section 7 , we present sensitivity results, first for a certain class of quadratic SDPs, and then for general NLSDPs. Based on these sensitivity results, in Section 8 , we give a selfcontained proof of local quadratic convergence of the SSP method. In Section 9 , we present results of some numerical experiments. Finally, in Section 10 , we make some concluding remarks.
2 Notation
Throughout this article, all vectors and matrices are assumed to have real entries. As usual,
${Y}^{T}=\left[\begin{array}{c}{y}_{ji}\end{array}\right]$
denotes the transpose of the matrix
$Y=\left[\begin{array}{c}{y}_{ij}\end{array}\right]$
. The vector norm
$\parallel x\parallel :=\sqrt{{x}^{T}x}$
is always the Euclidean norm and
$\parallel Y\parallel :={max}_{\parallel x\parallel =1}\parallel Yx\parallel $
is the corresponding matrix norm. For vectors
$x\in {\mathbb{R}}^{n}$
,
$x\ge 0$
means that all entries of
$x$
are nonnegative, and
$Diag\left(x\right)$
denotes the
$n\times n$
diagonal matrix the diagonal entries of which are the entries of
$x$
. The
$n\times n$
identity matrix is denoted by
${I}_{n}$
.
The trace inner product on the space of real
$n\times m$
matrices is given by
$$\langle Z,Y\rangle :=Z\bullet Y:=trace\left({Z}^{T}Y\right)={\sum}_{i=1}^{n}{\sum}_{j=1}^{m}{z}_{ij}{y}_{ij}$$
for any pair
$Y=\left[\begin{array}{c}{y}_{ij}\end{array}\right]$
and
$Z=\left[\begin{array}{c}{z}_{ij}\end{array}\right]$
of
$n\times m$
matrices. The space of real symmetric
$m\times m$
matrices is denoted by
${\mathcal{S}}^{m}$
. The notation
$Y\succcurlyeq 0$
(
$Y\succ 0$
) is used to indicate that
$Y\in {\mathcal{S}}^{m}$
is symmetric positive semidefinite (positive definite).
Semidefiniteness constraints are expressed by means of matrixvalued functions from
${\mathbb{R}}^{n}$
to
${\mathcal{S}}^{m}$
. We use the symbol
$\mathcal{A}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
if such a function is linear, and the symbol
$\mathcal{\mathcal{B}}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
if such a function is nonlinear.
Note that any linear function
$\mathcal{A}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
can be expressed in the form
$$\begin{array}{c}\mathcal{A}\left(x\right)={\sum}_{i=1}^{n}{x}_{i}{A}^{\left(i\right)}\text{for all}x\in {\mathbb{R}}^{n},\end{array}$$ 
(1)

with symmetric matrices
${A}^{\left(i\right)}\in {\mathcal{S}}^{m}$
,
$i=1,2,...,n$
. Based on the representation ( 1 ) we introduce the norm
$$\begin{array}{c}\parallel \mathcal{A}\parallel :={\left({\sum}_{i=1}^{n}{\parallel {A}^{\left(i\right)}\parallel}^{2}\right)}^{\frac{1}{2}}\end{array}$$ 
(2)

of
$\mathcal{A}$
. The adjoint operator
${\mathcal{A}}^{*}:{\mathcal{S}}^{m}\to {\mathbb{R}}^{n}$
with respect to the trace inner product is defined by
$$\langle \mathcal{A}\left(x\right),Y\rangle =\langle x,{\mathcal{A}}^{*}\left(Y\right)\rangle ={x}^{T}{\mathcal{A}}^{*}\left(Y\right)\text{for all}x\in {\mathbb{R}}^{n}\text{and}Y\in {\mathcal{S}}^{m}.$$
It turns out that
$$\begin{array}{c}{\mathcal{A}}^{*}\left(Y\right)=\left[\begin{array}{ccc}{A}^{\left(1\right)}\bullet Y& ...& {A}^{\left(n\right)}\bullet Y\end{array}\right]\text{for all}Y\in {\mathcal{S}}^{m}.\end{array}$$ 
(3)

We always assume that nonlinear functions
$\mathcal{\mathcal{B}}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
are at least
${\mathcal{C}}^{2}$
differentiable. We denote by
$${B}^{\left(i\right)}\left(x\right):=\frac{\partial}{\partial {x}_{i}}\mathcal{\mathcal{B}}\left(x\right)\text{and}{B}^{(i,j)}\left(x\right):=\frac{{\partial}^{2}}{\partial {x}_{i}\partial {x}_{j}}\mathcal{\mathcal{B}}\left(x\right),i,j=1,2,...,n,$$
the first and second partial derivatives of
$\mathcal{\mathcal{B}}$
, respectively. For each
$x\in {\mathbb{R}}^{n}$
, the derivative
${D}_{x}\mathcal{\mathcal{B}}$
at
$x$
induces a linear function
${D}_{x}\mathcal{\mathcal{B}}\left(x\right):{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, which is given by
$${D}_{x}\mathcal{\mathcal{B}}\left(x\right)[\Delta x]:={\sum}_{i=1}^{n}(\Delta x{)}_{i}{B}^{\left(i\right)}(x)\in {\mathcal{S}}^{m}\text{for all}\Delta x\in {\mathbb{R}}^{n}.$$
In particular,
$$\mathcal{\mathcal{B}}(x+\Delta x)\approx \mathcal{\mathcal{B}}\left(x\right)+{D}_{x}\mathcal{\mathcal{B}}\left(x\right)[\Delta x],\Delta x\in {\mathbb{R}}^{n},$$
is the linearization of
$\mathcal{\mathcal{B}}$
at the point
$x$
. For any linear function
$\mathcal{A}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, we have
$$\begin{array}{c}{D}_{x}\mathcal{A}\left(x\right)[\Delta x]=\mathcal{A}(\Delta x)\text{for all}x,\Delta x\in {\mathbb{R}}^{n}.\end{array}$$ 
(4)

We always use the expression on the righthand side of ( 4 ) to describe derivatives of linear functions.
We remark that for any fixed matrix
$Y\in {\mathcal{S}}^{m}$
, the map
$x\mapsto \mathcal{\mathcal{B}}\left(x\right)\bullet Y$
is a scalarvalued function of
$x\in {\mathbb{R}}^{n}$
. Its gradient at
$x$
is given by
$$\begin{array}{c}{\nabla}_{x}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right)={\left({D}_{x}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right)\right)}^{T}=\left[\begin{array}{ccc}{B}^{\left(1\right)}\left(x\right)\bullet Y& ...& {B}^{\left(n\right)}\left(x\right)\bullet Y\end{array}\right]\in {\mathbb{R}}^{n}\end{array}$$ 
(5)

and its Hessian by
$${\nabla}_{x}^{2}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right)=\left[\begin{array}{ccccccccc}{B}^{(1,1)}\left(x\right)\bullet Y& \cdot \cdot \cdot & {B}^{(1,n)}\left(x\right)\bullet Y& ...& & ...& {B}^{(n,1)}\left(x\right)\bullet Y& \cdot \cdot \cdot & {B}^{(n,n)}\left(x\right)\bullet Y\end{array}\right]\in {\mathcal{S}}^{n}.$$
In particular, for any linear function
$\mathcal{A}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, in view of ( 1 ), ( 3 ), and ( 5 ), we have
$$\begin{array}{c}{\nabla}_{x}\left(\mathcal{A}\left(x\right)\bullet Y\right)={\mathcal{A}}^{*}\left(Y\right).\end{array}$$ 
(6)

3 An application in passive reducedorder modeling
We remark that applications of linear SDPs include relaxations of combinatorial optimization problems and problems related to Lyapunov functions or the positive real lemma in control theory; we refer the reader to [
1,
3,
8,
14,
30,
32]
and the references given there. In this section, we describe an application in passive reducedorder modeling that leads to a class of NLSDPs. Roughly speaking, a system is called passive if it does not generate energy. For the special case of timeinvariant linear dynamical systems, passivity is equivalent to positive realness of the frequencydomain transfer function associated with the system. More precisely, consider transfer functions of the form
$$\begin{array}{c}{Z}_{n}\left(s\right)={B}_{2}^{T}{\left(G+sC\right)}^{1}{B}_{1},s\in \mathbb{C},\end{array}$$ 
(7)

where
$G,C\in {\mathbb{R}}^{n\times n}$
and
${B}_{1},{B}_{2}\in {\mathbb{R}}^{n\times m}$
are given data matrices. The integer
$n$
is the statespace dimension of the timeinvariant linear dynamical system, and
$m$
is the number of inputs and outputs of the system. In ( 7 ), the matrix pencil
$G+sC$
is assumed to be regular, i.e., the matrix
$G+sC$
is singular for only finitely many values of
$s\in \mathbb{C}$
. Note that
${Z}_{n}$
is an
$m\times m$
matrixvalued rational function of the complex variable
$s\in \mathbb{C}$
.
In
reducedorder modeling, one is given a largescale timeinvariant linear dynamical system of statespace dimension
$N$
, and the problem is to construct a “good” approximation of that system of statespace dimension
$n\ll N$
; see, e.g., [
13]
and the references given there. If the largescale system is passive, then for certain applications, it is crucial that the reducedorder model of statespace dimension
$n$
preserves the passivity of the original system. Unfortunately, some of the most efficient reducedorder modeling techniques do not preserve passivity. However, the reducedorder models are often “almost” passive, and passivity of the models can be enforced by perturbing the data matrices of the models. Next, we describe how the problem of constructing such perturbations leads to a class of NLSDPs. An
$m\times m$
matrixvalued rational function
$Z$
is called positive real if the following three conditions are satisfied
$:$

(i)
$Z$
is analytic in
${\mathbb{C}}_{+}:=\{s\in \mathbb{C}Re\left(s\right)>0\}$
;

(ii)
$Z\left(\overline{s}\right)=\overline{Z\left(s\right)}$
for all
$s\in \mathbb{C}$
;

(iii)
$Z\left(s\right)+{\left(Z\left(\overline{s}\right)\right)}^{T}\succcurlyeq 0$
for all
$s\in {\mathbb{C}}_{+}$
.
For functions
${Z}_{n}$
of the form ( 7 ) positive realness (and thus passivity of the system associated with
${Z}_{n}$
) can be characterized via linear SDPs; see, e.g., [
3,
14]
and the references given there. More precisely, if the linear SDP
$$\begin{array}{c}\begin{array}{cc}{P}^{T}G+{G}^{T}P& \succcurlyeq 0,\\ {P}^{T}C={C}^{T}P& \succcurlyeq 0,\\ {P}^{T}{B}_{1}& ={B}_{2},\end{array}\end{array}$$ 
(8)

has a solution
$P\in {\mathbb{R}}^{n\times n}$
, then the transfer function ( 7 ),
${Z}_{n}$
, is positive real. Conversely, under certain additional assumptions (see [
14]
), positive realness of
${Z}_{n}$
implies the solvability of the linear SDP ( 8 ).
Now assume that
${Z}_{n}$
in ( 7 ) is the transfer function of a nonpassive reducedoder model of a passive largescale system. Our goal is to perturb some of data matrices in ( 7 ) so that the perturbed transfer function is positive real. For the special case
$m=1$
, such an approach is discussed in [
5]
. In this case, there is a simple eigenvaluebased characterization [
4]
of positive realness. However, this characterization cannot be extended to the general case
$m\ge 1$
.
Another special case, which leads to linear SDPs, is described in [
9]
.
In the general case
$m\ge 1$
, we employ perturbations
${X}_{G}$
and
${X}_{C}$
of the matrices
$G$
and
$C$
in ( 7 ). The resulting perturbed transfer function is then of the form
$$\begin{array}{c}{\stackrel{~}{Z}}_{n}\left(s\right)={B}_{2}^{T}{\left(G+{X}_{G}+s(C+{X}_{C})\right)}^{1}{B}_{1},\end{array}$$ 
(9)

and the problem is to construct the perturbations
${X}_{G}$
and
${X}_{C}$
such that
${\stackrel{~}{Z}}_{n}$
is positive real.
Applying the characterization (
8 ) of positive realness to ( 9 ), we obtain the following nonconvex NLDSP:
$$\begin{array}{c}\begin{array}{cc}{P}^{T}(G+{X}_{G})+(G+{X}_{G}{)}^{T}P& \succcurlyeq 0,\\ {P}^{T}(C+{X}_{C})=(C+{X}_{C}{)}^{T}P& \succcurlyeq 0,\\ {P}^{T}{B}_{1}& ={B}_{2}.\end{array}\end{array}$$ 
(10)

Here, the unknowns are the matrices
$P,{X}_{G},{X}_{C}\in {\mathbb{R}}^{n\times n}$
. If ( 10 ) has a solution
$P,{X}_{G},{X}_{C}$
, then choosing the matrices
${X}_{G}$
and
${X}_{C}$
as the perturbations in ( 9 ) guarantees passivity of the reducedorder model given by the transfer function
${\stackrel{~}{Z}}_{n}$
.
4 Linear semidefinite programs
In this section, we briefly review the case of linear semidefinite programs.
Given a linear function
$\mathcal{A}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, a vector
$b\in {\mathbb{R}}^{n}$
, and a matrix
$C\in {\mathcal{S}}^{m}$
, a pair of primal and dual linear semidefinite programs is as follows:
$$\begin{array}{c}\begin{array}{cc}\text{maximize}C\bullet Y\text{subject to}& Y\in {\mathcal{S}}^{m},Y\succcurlyeq 0,\\ & {\mathcal{A}}^{*}\left(Y\right)+b=0,\end{array}\end{array}$$ 
(11)

and
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{b}^{T}x\text{subject to}& x\in {\mathbb{R}}^{n},\\ & \mathcal{A}\left(x\right)+C\preccurlyeq 0.\end{array}\end{array}$$ 
(12)

We remark that this formulation is a slight variation of the standard pair of primaldual programs. We chose the above version in order to facilitate the generalization of problems of the form ( 12 ) to nonlinear semidefinite programs in standard form.
If there exists a matrix
$Y\succ 0$
that is feasible for ( 11 ), then we call
$Y$
strictly feasible for ( 11 ) and say that ( 11 ) satisfies Slater's condition. Likewise, if there exists a vector
$x$
such that
$\mathcal{A}\left(x\right)+C\prec 0$
, then we call ( 12 ) strictly feasible and say that ( 12 ) satisfies Slater's condition.
The following optimality conditions for linear semidefinite programs are well known; see, e.g., [
27]
. If problem ( 11 ) or ( 12 ) satisfies Slater's condition, then the optimal values of ( 11 ) and ( 12 ) coincide. Furthermore, if in addition both problems are feasible, then optimal solutions
${Y}^{opt}$
and
${x}^{opt}$
of both problems exist and
$Y:={Y}^{opt}$
and
$x:={x}^{opt}$
satisfy the complementarity condition
$$\begin{array}{c}YS=0,\text{where}S:=C\mathcal{A}\left(x\right).\end{array}$$ 
(13)

Conversely, if
$Y$
and
$x$
are feasible points for ( 11 ) and ( 12 ), respectively, and satisfy the complementarity condition ( 13 ), then
${Y}^{opt}:=Y$
is an optimal solution of ( 11 ) and
${x}^{opt}:=x$
is an optimal solution of ( 12 ).
These optimality conditions can be summarized as follows. If problem (
12 ) satisfies Slater's condition, then for a point
$x\in {\mathbb{R}}^{n}$
to be an optimal solution of ( 12 ) it is necessary and sufficient that there exist matrices
$Y\succcurlyeq 0$
and
$S\succcurlyeq 0$
such that
$$\begin{array}{c}\begin{array}{cc}\mathcal{A}\left(x\right)+C+S& =0,\\ {\mathcal{A}}^{*}\left(Y\right)+b& =0,\\ YS& =0.\end{array}\end{array}$$ 
(14)

Note that, in view of ( 6 ), the second equation in ( 14 ) can also be written in the form
$$\begin{array}{c}{\nabla}_{x}\left((\mathcal{A}(x)+C)\bullet Y\right)+b={\mathcal{A}}^{*}\left(Y\right)+b=0.\end{array}$$ 
(15)

Furthermore, the last equation in ( 14 ) is equivalent to its symmetric form,
$YS+SY=0$
; see, e.g., [
2]
. In the case of strict complementarity, the derivatives of
$YS=0$
and
$YS+SY=0$
are also equivalent. For later use, we state these facts in the following lemma.
Lemma 1
Let
$Y,S\in {\mathcal{S}}^{m}$
.

a)
If
$Y\succcurlyeq 0$
or
$S\succcurlyeq 0$
, then
$$\begin{array}{c}YS=0\u27faYS+SY=0.\end{array}$$ 
(16)


b)
If
$Y$
and
$S$
are strictly complementary, i.e.,
$Y,S\succcurlyeq 0$
,
$YS=0$
, and
$Y+S\succ 0$
, then for any
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
,
$$\begin{array}{c}Y\dot{S}+\dot{Y}S=0\u27faY\dot{S}+\dot{Y}S+\dot{S}Y+S\dot{Y}=0.\end{array}$$ 
(17)

Moreover,
$Y,S$
have representations of the form
$$\begin{array}{c}Y=U\left[\begin{array}{cccc}{Y}_{1}& 0& 0& 0\end{array}\right]{U}^{T},S=U\left[\begin{array}{cccc}0& 0& 0& {S}_{2}\end{array}\right]{U}^{T},\end{array}$$ 
(18)

where
$U$
is an
$m\times m$
orthogonal matrix,
${Y}_{1}\succ 0$
is a
$k\times k$
diagonal matrix, and
${S}_{2}\succ 0$
is an
$(mk)\times (mk)$
diagonal matrix, and any matrices
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
satisfying 17
$\left(\right)$
are of the form
$$\begin{array}{c}\dot{Y}=U\left[\begin{array}{cccc}{\dot{Y}}_{1}& {\dot{Y}}_{3}& {\dot{Y}}_{3}^{T}& 0\end{array}\right]{U}^{T},\dot{S}=U\left[\begin{array}{cccc}0& {\dot{S}}_{3}& {\dot{S}}_{3}^{T}& {\dot{S}}_{2}\end{array}\right]{U}^{T},\text{where}{Y}_{1}{\dot{S}}_{3}+{\dot{Y}}_{3}{S}_{2}=0.\end{array}$$ 
(19)


Proof.
The equivalence ( 16 ) is well known; see, e.g., [2,Page749] .
We now turn to the proof of part b). The strict complementarity of
$Y$
and
$S$
readily implies that
$Y$
and
$S$
have representations of the form ( 18 ); see, e.g., [
18,Page62]
. Any matrices
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
can be written in the form
$$\begin{array}{c}\dot{Y}=U\left[\begin{array}{cccc}{\dot{Y}}_{1}& {\dot{Y}}_{3}& {\dot{Y}}_{3}^{T}& {\dot{Y}}_{2}\end{array}\right]{U}^{T},\dot{S}=U\left[\begin{array}{cccc}{\dot{S}}_{1}& {\dot{S}}_{3}& {\dot{S}}_{3}^{T}& {\dot{S}}_{2}\end{array}\right]{U}^{T},\end{array}$$ 
(20)

where
$U$
is the matrix from ( 18 ) and the block sizes in ( 20 ) are the same as in ( 18 ). Using ( 18 ) and ( 20 ), it follows that the equation on the lefthand side of ( 17 ) is satisfied if, and only if,
$${Y}_{1}{\dot{S}}_{1}=0,{\dot{Y}}_{2}{S}_{2}=0,{Y}_{1}{\dot{S}}_{3}+{\dot{Y}}_{3}{S}_{2}=0.$$
Since
${Y}_{1}$
and
${S}_{2}$
are in particular nonsingular, the first two relations imply
${\dot{S}}_{1}=0$
and
${\dot{Y}}_{2}=0$
.
Thus, any matrices
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
satisfying the equation on the lefthand side of ( 17 ) are of the form ( 19 ). Similarly, using ( 18 ) and ( 20 ), it follows that the equation on the righthand side of ( 17 ) is satisfied if, and only if,
$${Y}_{1}{\dot{S}}_{1}+{\dot{S}}_{1}{Y}_{1}=0,{\dot{Y}}_{2}{S}_{2}+{S}_{2}{\dot{Y}}_{2}=0,{Y}_{1}{\dot{S}}_{3}+{\dot{Y}}_{3}{S}_{2}=0.$$
Since
${Y}_{1}\succ 0$
and
${S}_{2}\succ 0$
, the first two relations imply
${\dot{S}}_{1}=0$
and
${\dot{Y}}_{2}=0$
, and so
$\dot{Y},\dot{S}$
are again of the form ( 19 ).
5 Nonlinear semidefinite programs
In this section, we consider nonlinear semidefinite programs, which are extensions of the dual linear semidefinite programs ( 12 ).
Given a vector
$b\in {\mathbb{R}}^{n}$
and a matrixvalued function
$\mathcal{\mathcal{B}}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, we consider problems of the following form:
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{b}^{T}x\text{subject to}& x\in {\mathbb{R}}^{n},\\ & \mathcal{\mathcal{B}}\left(x\right)\preccurlyeq 0.\end{array}\end{array}$$ 
(21)

Here, the function
$\mathcal{\mathcal{B}}$
is nonlinear in general, and thus ( 21 ) represents a class of nonlinear semidefinite programs. We assume that the function
$\mathcal{\mathcal{B}}$
is at least
${\mathcal{C}}^{2}$
differentiable.
For simplicity of presentation, we have chosen a simple form of problem (
21 ). We stress that problem ( 21 ) may also include additional nonlinear equality and inequality constraints.
The corresponding modifications are detailed at the end of this paper. Furthermore, the choice of the linear objective function
${b}^{T}x$
in ( 21 ) was made only to simplify notation. A nonlinear objective function can always be transformed into a linear one by adding one artificial variable and one more constraint. In particular, all statements about ( 21 ) in this paper can be modified so that they apply to additional nonlinear equality and inequality constraints and to nonlinear objective functions.
Note that the class (
21 ) reduces to linear semidefinite programs of the form ( 12 ) if
$\mathcal{\mathcal{B}}$
is an affine function.
The Lagrangian
$\mathcal{\mathcal{L}}:{\mathbb{R}}^{n}\times {\mathcal{S}}^{m}\to \mathbb{R}$
of ( 21 ) is defined as follows:
$$\begin{array}{c}\mathcal{\mathcal{L}}(x,Y):={b}^{T}x+\mathcal{\mathcal{B}}\left(x\right)\bullet Y.\end{array}$$ 
(22)

Its gradient with respect to
$x$
is given by
$$\begin{array}{c}g(x,Y):={\nabla}_{x}\mathcal{\mathcal{L}}(x,Y)=b+{\nabla}_{x}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right)\end{array}$$ 
(23)

and its Hessian by
$$\begin{array}{c}H(x,Y):={\nabla}_{x}^{2}\mathcal{\mathcal{L}}(x,Y)={\nabla}_{x}^{2}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right).\end{array}$$ 
(24)

If the problem ( 21 ) is convex and satisfies Slater's condition [
19]
, then for each optimal solution
$x$
of ( 21 ) there exists an
$m\times m$
matrix
$Y\succcurlyeq 0$
such that the pair
$(x,Y)$
is a saddle point of the Lagrangian ( 22 ),
$\mathcal{\mathcal{L}}$
.
More generally, for nonconvex problems (
21 ), let
$x\in {\mathbb{R}}^{n}$
be a feasible point of ( 21 ), and assume that the Robinson or MangasarianFromovitz constraint qualification [
19,
25,
26]
is satisfied at
$x$
, i.e., there exists a vector
$\Delta x\ne 0$
such that
$\mathcal{\mathcal{B}}\left(x\right)+{D}_{x}\mathcal{\mathcal{B}}\left(x\right)[\Delta x]\prec 0$
. Then, if
$x$
is a local minimizer of ( 21 ), the firstorder optimality condition is satisfied, i.e., there exist matrices
$Y,S\in {\mathcal{S}}^{m}$
such that
$$\begin{array}{c}\begin{array}{cc}\mathcal{\mathcal{B}}\left(x\right)+S& =0,\\ g(x,Y)& =0,\\ YS& =0,\\ Y,S& \succcurlyeq 0.\end{array}\end{array}$$ 
(25)

The system ( 25 ) is a straightforward generalization of the optimality conditions ( 14 ) and ( 15 ), with the affine function
$\mathcal{A}\left(x\right)+C$
in ( 15 ) replaced by the nonlinear function
$\mathcal{\mathcal{B}}\left(x\right)$
. Primaldual interiorpoint methods for solving ( 21 ) roughly proceed as follows. For some sequence of duality parameters
${\mu}_{k}>0$
,
${\mu}_{k}\to 0$
, the solutions of the perturbed primaldual system,
$$\begin{array}{c}\begin{array}{cc}\mathcal{\mathcal{B}}\left(x\right)+S& =0,\\ g(x,Y)& =0,\\ YS& ={\mu}_{k}{I}_{m},\\ Y,S& \succ 0,\end{array}\end{array}$$ 
(26)

are approximated by some variant of Newton's method. Since Newton's method does not preserve any inequalities, the parameters
${\mu}_{k}>0$
are used to maintain strict feasibility, i.e.,
$Y,S\succ 0$
for all iterates.
The solutions of (
26 ) coincide with the solutions of the standard logarithmicbarrier problems for ( 21 ). Moreover, the logarithmicbarrier approach for solving ( 21 ) can be interpreted as a certain choice of the `symmetrization operator' for the equation,
$YS={\mu}_{k}{I}_{m}$
, in the third row of ( 26 ); see Section 6 below. With this choice, the barrier function yields a very natural criterion for the stepsize control in trustregion algorithms. The authors have implemented various versions of predictorcorrector trustregion barrier methods for solving ( 21 ). For a number of examples, the running times of the resulting algorithms were comparable to the behavior of interiorpoint methods for convex programs. However, the authors also encountered several instances in which the number of iterations for these methods was very high compared to the typical number of iterations needed for solving linear SDPs. For such negative examples it may be more efficient to solve a sequence of linear SDPs in order to obtain an approximate solution of ( 21 ). This observation motivated the SSP method described in the next section.
6 An SSP method for nonlinear semidefinite programs
In this section, we introduce the sequential semidefinite programming (SSP) method, which is a generalization of the SQP method for standard nonlinear programs to nonlinear semidefinite programs of the form ( 21 ). For an overview of SQP methods for standard nonlinear programs, we refer the reader to [
6]
and the references given there.
In analogy to the SQP method, at each iteration of the SSP method one solves a subproblem that is slightly more difficult than the linearization of (
25 ) at the current iterate. More precisely, let
$({x}^{k},{Y}^{k})$
denote the current point at the beginning of the
$k$
th iteration. One then determines corrections
$(\Delta x,\Delta Y)$
and a matrix
$S$
such that
$$\begin{array}{c}\begin{array}{cc}\mathcal{\mathcal{B}}\left({x}^{k}\right)+{D}_{x}\mathcal{\mathcal{B}}\left({x}^{k}\right)[\Delta x]+S& =0,\\ b+{H}^{k}\Delta x+{\nabla}_{x}\left(\mathcal{\mathcal{B}}\left({x}^{k}\right)\bullet ({Y}^{k}+\Delta Y)\right)& =0,\\ ({Y}^{k}+\Delta {Y}^{k})S& =0,\\ {Y}^{k}+\Delta Y,S& \succcurlyeq 0.\end{array}\end{array}$$ 
(27)

Here and in the sequel, we use the notation
$$\begin{array}{c}{H}^{k}:=H\left({x}^{k},{Y}^{k}\right).\end{array}$$ 
(28)

Recall from ( 23 ) and ( 24 ) that
$g(x,Y)$
and
$H(x,Y)$
denote the gradient and Hessian with respect to
$x$
, respectively, of the Lagrangian ( 22 ),
$\mathcal{\mathcal{L}}(x,Y)$
, of the nonlinear semidefinite program ( 21 ). Moreover, from ( 23 ) it follows that the linearization of
$g(x,Y)$
at the point
$({x}^{k},{Y}^{k})$
is given by
$$\begin{array}{cc}g\left({x}^{k}+\Delta x,{Y}^{k}+\Delta Y\right)& \approx b+{H}^{k}\Delta x+{\nabla}_{x}\left(\mathcal{\mathcal{B}}\left({x}^{k}\right)\bullet ({Y}^{k}+\Delta Y)\right).\end{array}$$  
Thus the second equation in ( 27 ) is just the linearization of the second equation in ( 25 ).
Furthermore, the first equation of (
27 ) is a straightforward linearization of the first equation in ( 25 ). This linearization is used in the same way in primaldual interior methods.
The last two rows in (
27 ) and ( 25 ) are identical when
$Y$
in ( 25 ) is rewritten as
$Y={Y}^{k}+\Delta Y$
.
In analogy to SQP methods for standard nonlinear programs, the problem of how to guarantee the nonnegativity constraints, namely
$\mathcal{\mathcal{B}}\left(x\right)\preccurlyeq 0$
, is thus shifted to the subproblem ( 27 ). If the iterates
${x}^{k}$
generated from ( 27 ) converge, then their limit
$x$
automatically satisfies
$\mathcal{\mathcal{B}}\left(x\right)\preccurlyeq 0$
.
In contrast, interior methods use perturbations, symmetrizations, and linearizations for the last two rows in (
25 ), resulting in cheaper linear subproblems that are typically less `powerful' than the subproblems ( 27 ). An important aspect for interior methods for linear SDPs is the choice of the symmetrization procedure for the bilinear equation
$YS={\mu}_{k}{I}_{m}$
; see, e.g., [
20]
.
For convex SDPs, theoretical convergence analyses are well developed and also supported by numerical evidence of rapid convergence. However, the generalization of these convergence results to nonlinear nonconvex subproblems is far from obvious. The proposed SSP method allows to apply the symmetrization to linear SDPs, thus reducing this aspect to a wellstudied topic.
In summary, both the problem of choosing a suitable symmetrization scheme and the problem of how to guarantee the nonnegativity constraints are shifted to the subproblem (
27 ).
Note that the conditions (
27 ) are the optimality conditions for the problem
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{b}^{T}\Delta x+\frac{1}{2}(\Delta x{)}^{T}{H}^{k}\Delta x\text{subject to}& \Delta x\in {\mathbb{R}}^{n},\\ & \mathcal{\mathcal{B}}\left({x}^{k}\right)+{D}_{x}\mathcal{\mathcal{B}}\left({x}^{k}\right)[\Delta x]\preccurlyeq 0.\end{array}\end{array}$$ 
(29)

The conditions ( 27 ) and ( 29 ) have been considered in [
26,Equations(2.1)and(2.2)]
, with the remark that they have “been found to be an appropriate approximation of ” ( 21 ) “for numerical purposes”.
In order to be able to solve the subproblem (
27 ) efficiently, in practice, one replaces the matrix
${H}^{k}$
in ( 27 ), respectively ( 29 ), by a positive semidefinite approximation
${\hat{H}}^{k}$
of
${H}^{k}$
. As in the case of standard SQP methods, a BFGS update for the Hessian of the Lagrangian ( 22 ),
$\mathcal{\mathcal{L}}$
, can be used to approximate
${H}^{k}$
by some positive semidefinite matrix
${\hat{H}}^{k}$
. Given an estimate
${\hat{H}}^{k}$
of
${H}^{k}$
for the current,
$k$
th, SSP iteration, the quasiNewton condition to generate a BFGS update
${\hat{H}}^{k+1}$
approximating the matrix
${H}^{k+1}$
for the next,
$(k+1)$
st, SSP iteration can be derived as follows:
$$\begin{array}{c}\begin{array}{cc}{\hat{H}}^{k+1}\Delta x& ={\nabla}_{x}\left(\mathcal{\mathcal{B}}\left({x}^{k+1}\right)\bullet {Y}^{k+1}\right){\nabla}_{x}\left(\mathcal{\mathcal{B}}\left({x}^{k}\right)\bullet {Y}^{k+1}\right)\\ & ={\nabla}_{x}\mathcal{\mathcal{L}}\left({x}^{k+1},{Y}^{k+1}\right){\nabla}_{x}\mathcal{\mathcal{L}}\left({x}^{k},{Y}^{k+1}\right)\\ & \approx {\nabla}_{x}^{2}\mathcal{\mathcal{L}}\left({x}^{k+1},{Y}^{k+1}\right)\left({x}^{k+1}{x}^{k}\right).\end{array}\end{array}$$ 
(30)

If
${\hat{H}}^{k}$
is positive semidefinite, the BFGS update with the above condition can be suitably damped such that
${\hat{H}}^{k+1}$
is also positive semidefinite. At each iteration of the SSP method, problem ( 29 ) is solved with
${H}^{k}$
replaced by the matrix
${\hat{H}}^{k+1}$
that is obtained by the BFGS update of
${\hat{H}}^{k}$
from the previous SSP iteration. If
${\hat{H}}^{k+1}$
is positive semidefinite, problem ( 29 ) essentially reduces to a linear SDP, since the convex quadratic term in the objective function can be written as a semidefiniteness constraint or a secondordercone constraint.
While the formulation as a secondordercone constraint is more efficient, and for example, can be specified as input for the program package SeDuMi [
28]
in order to solve ( 29 ), it was pointed out by [
22]
that it may be most efficient to use a program that is designed for SDPs with linear constraints and a convex quadratic objective function.
It seems that many results for standard SQP methods carry over in a straightforward fashion to the SSP method. For example, the SSP method can be augmented by a penalty term in case that the subproblems (
29 ) become infeasible. In this case, the righthand sides, “
$0$
”, of the first three rows in ( 27 ) are replaced by weaker, penalized righthand sides. Moreover, the convergence analysis of the method proposed in [
10,
12]
yields results that are comparable to the ones for standard SQP methods.
The standard analysis of quadratic convergence of SQP methods for nonlinear programs that satisfy strict complementarity conditions proceeds by first showing that the active constraints will be identified correctly in the final stages of the algorithm and then using the equivalence of the SQP iteration and the Newton iteration for the simplified KKTsystem in which only the active constraints are used.
For nonlinear semidefinite programs the situation is slightly more complicated since it is difficult to identify active constraints. The paper [
12]
presents a proof that is based on a new approach by Bonnans et al. [
7]
and uses some general results due to Robinson [
26]
. It does not require strict complementarity and allows for approximate Hessian matrices in ( 30 ).
In the next two sections, we present a more elementary and selfcontained approach to analyze convergence of the SSP method under a strict complementarity condition.
7 Sensitivity results
In this section, we establish sensitivity results, first for the special case of quadratic semidefinite programs and then for general nonlinear semidefinite programs of the form ( 21 ).
More precisely, we show that strictly complementary solutions of such problems depend smoothly on the problem data.
We start with quadratic semidefinite programs of the form
$$\begin{array}{c}\begin{array}{cc}\text{minimize}f\left(x\right)\text{subject to}& x\in {\mathbb{R}}^{n},\\ & \mathcal{A}\left(x\right)+C\preccurlyeq 0.\end{array}\end{array}$$ 
(31)

Here,
$\mathcal{A}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
is a linear function,
$C\in {\mathcal{S}}^{m}$
, and
$f:{\mathbb{R}}^{n}\to \mathbb{R}$
is a quadratic function defined by
$f\left(x\right)={b}^{T}x+\frac{1}{2}{x}^{T}Hx$
, where
$b\in {\mathbb{R}}^{n}$
and
$H\in {\mathcal{S}}^{n}$
. Note that we make no further assumptions on the matrix
$H$
. Thus, problem ( 31 ) is a general, possibly nonconvex, quadratic semidefinite program.
The problem (
31 ) is described by the data
$$\begin{array}{c}\mathcal{D}:=[\mathcal{A},b,C,H].\end{array}$$ 
(32)

In Theorem 1 below, we present a sensitivity result for the solutions 31
$\left(\right)$
when the data
$\mathcal{D}$
is changed to
$\mathcal{D}+\Delta \mathcal{D}$
where
$$\begin{array}{c}\Delta \mathcal{D}:=[\Delta \mathcal{A},\Delta b,\Delta C,\Delta H]\end{array}$$ 
(33)

is a sufficiently small perturbation. We use the norm
$$\parallel \mathcal{D}\parallel :={\left(\parallel \mathcal{A}{\parallel}^{2}+\parallel b{\parallel}^{2}+\parallel C{\parallel}^{2}+\parallel H{\parallel}^{2}\right)}^{\frac{1}{2}}$$
for data ( 32 ) and perturbations ( 33 ). Recall that
$\parallel \mathcal{A}\parallel $
is defined in ( 2 ).
We denote by
$${\mathcal{\mathcal{L}}}^{(q)}(x,Y):=f\left(x\right)+\left(\mathcal{A}\left(x\right)+C\right)\bullet Y$$
the Lagrangian of problem ( 31 ). Note that
${\nabla}_{x}f\left(x\right)=b+Hx$
. Together with ( 6 ), it follows that
$${\nabla}_{x}{\mathcal{\mathcal{L}}}^{(q)}(x,Y)=b+Hx+{\mathcal{A}}^{*}\left(Y\right)\text{and}{\nabla}_{x}^{2}{\mathcal{\mathcal{L}}}^{(q)}(x,Y)=H.$$
Recall that problem ( 31 ) is said to satisfy Slater's condition if there exists a vector
$x\in {\mathbb{R}}^{n}$
with
$\mathcal{A}\left(x\right)+C\prec 0$
. Moreover, the triple
$(\overline{x},\overline{Y},\overline{S})$
, where
$\overline{x}\in {\mathbb{R}}^{n}$
and
$\overline{Y},\overline{S}\in {\mathcal{S}}^{m}$
, is called a stationary point of ( 31 ) if
$$\begin{array}{c}\begin{array}{cc}\mathcal{A}\left(\overline{x}\right)+C+\overline{S}& =0,\\ b+H\overline{x}+{\mathcal{A}}^{*}\left(\overline{Y}\right)& =0,\\ \overline{Y}\overline{S}+\overline{S}\overline{Y}& =0,\\ \overline{Y},\overline{S}& \succcurlyeq 0.\end{array}\end{array}$$ 
(34)

Here, we have used equivalence ( 16 ) of Lemma 1 and replaced
$\overline{Y}\overline{S}=0$
by its symmetric version, which is stated as the third equation of ( 34 ). If in addition to ( 34 ), one has
$$\begin{array}{c}\overline{Y}+\overline{S}\succ 0,\end{array}$$ 
(35)

then
$(\overline{x},\overline{Y},\overline{S})$
is said to be a strictly complementary stationary point of 31
$\left(\right)$
. Let
$\overline{x}\in {\mathbb{R}}^{n}$
be a feasible point of ( 31 ). We say that
$h\in {\mathbb{R}}^{n}$
,
$h\ne 0$
, is a feasible direction at
$\overline{x}$
if
$x=\overline{x}+\epsilon h$
is a feasible point of ( 31 ) for all sufficiently small
$\epsilon >0$
. Following [
26,Definition2.1]
, we say that the secondorder sufficient condition holds at
$\overline{x}$
with modulus
$\mu >0$
if for all feasible directions
$h\in {\mathbb{R}}^{n}$
at
$\overline{x}$
with
${h}^{T}(b+H\overline{x})={h}^{T}{\nabla}_{x}f\left(\overline{x}\right)=0$
one has
$$\begin{array}{c}{h}^{T}Hh={h}^{T}\left({\nabla}_{x}^{2}{\mathcal{\mathcal{L}}}^{(q)}(\overline{x},\overline{Y})\right)h\ge \mu \parallel h{\parallel}^{2}.\end{array}$$ 
(36)

After these preliminaries, our main result of this section can be stated as follows.
Theorem 1
Assume that problem 31
$\left(\right)$
satisfies Slater's condition. Let
$(\overline{x},\overline{Y},\overline{S})$
be a locally unique and strictly complementary stationary point of 31
$\left(\right)$
with data 32
$\left(\right)$
,
$\mathcal{D}$
, and assume that the secondorder sufficient condition holds at
$\overline{x}$
with modulus
$\mu >0$
. Then, for all sufficiently small perturbations 33
$\left(\right)$
,
$\Delta \mathcal{D}$
, there exists a locally unique stationary point
$\left(\overline{x}(\Delta \mathcal{D}),\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})\right)$
of the perturbed program 31
$\left(\right)$
with data
$\mathcal{D}+\Delta \mathcal{D}$
. Moreover, the point
$\left(\overline{x}(\Delta \mathcal{D}),\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})\right)$
is a differentiable function of the perturbation 33
$\left(\right)$
, and for
$\Delta \mathcal{D}=0$
,
$\left(\overline{x}\left(0\right),\overline{Y}\left(0\right),\overline{S}\left(0\right)\right)=(\overline{x},\overline{Y},\overline{S})$
. The derivative
${D}_{\mathcal{D}}\left(\overline{x}\left(0\right),\overline{Y}\left(0\right),\overline{S}\left(0\right)\right)$
at
$(\overline{x},\overline{Y},\overline{S})$
is characterized by the directional derivatives
$$(\dot{x},\dot{Y},\dot{S}):={D}_{\mathcal{D}}\left(\overline{x}\left(0\right),\overline{Y}\left(0\right),\overline{S}\left(0\right)\right)[\Delta \mathcal{D}]$$
for any
$\Delta \mathcal{D}$
. Here
$(\dot{x},\dot{Y},\dot{S})$
is the unique solution of the system of linear equations,
$$\begin{array}{c}\begin{array}{cc}\mathcal{A}\left(\dot{x}\right)+\dot{S}& =\Delta C\Delta \mathcal{A}\left(\overline{x}\right),\\ H\dot{x}+{\mathcal{A}}^{*}\left(\dot{Y}\right)& =\Delta b\Delta H\overline{x}\Delta {\mathcal{A}}^{*}\left(\overline{Y}\right),\\ \overline{Y}\dot{S}+\dot{Y}\overline{S}+\dot{S}\overline{Y}+\overline{S}\dot{Y}& =0,\end{array}\end{array}$$ 
(37)

for the unknowns
$\dot{x}\in {\mathbb{R}}^{n}$
,
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
. Finally, the secondorder sufficient condition holds at
$\overline{x}(\Delta \mathcal{D})$
whenever
$\Delta \mathcal{D}$
is sufficiently small.
Remark 1
Theorem 1 is an extension of the sensitivity result for linear semidefinite programs presented in [
15]
. A related sensitivity result for linear semidefinite programs for a more restricted class of perturbations, but also under weaker assumptions, is given in [
29]
.
A local Lipschitz continuity property of unique and strictly complementary solutions of linear semidefinite programs is derived in [
21]
.
Remark 2
While we did not explicitly state a linear independence constraint qualification, commonly referred to as LICQ, it is implied by our condition of uniqueness of the stationary point; see, e.g., [
15]
. Moreover, our assumptions on the stationary point
$(\overline{x},\overline{Y},\overline{S})$
imply that
$\overline{x}$
is a strict local minimizer of ( 31 ).
Remark 3
The first and third equations in ( 37 ) are symmetric
$m\times m$
matrix equations, and so only their upper triangular parts have to be considered. Thus the total number of scalar equations in ( 37 ) is
${m}^{2}+m+n$
. On the other hand, there are
${m}^{2}+m+n$
unknowns, namely the entries of
$\dot{x}\in R{e}^{n}$
and of the upper triangular parts of
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
. Hence, ( 37 ) is a square system.
Remark 4
In view of part b) of Lemma 1 , the last equation of ( 37 ) is equivalent to
$$\begin{array}{c}\overline{Y}\dot{S}+\dot{Y}\overline{S}=0.\end{array}$$ 
(38)

Thus, Theorem 1 can be stated equivalently with ( 38 ) in ( 37 ). However, the resulting system of equations ( 37 ) would then be overdetermined.

Proof of Theorem 1 .
The proof is divided into four steps.
Step 1. In this step, we establish the following result. If the perturbed program has a local solution that is a differentiable function of the perturbation, then the derivative is indeed a solution of ( 37 ).
Slater's condition is invariant under small perturbations of the problem data. Hence, if there exists a local solution
$\overline{x}+\Delta x$
,
$\overline{S}+\Delta S$
of the perturbed problem near
$\overline{x}$
,
$\overline{S}$
, then the necessary firstorder conditions of the perturbed problem apply at
$\overline{x}+\Delta x$
,
$\overline{S}+\Delta S$
, and state that there exists a matrix
$\Delta Y$
such that
$\overline{Y}+\Delta Y\succcurlyeq 0$
,
$\overline{S}+\Delta S\succcurlyeq 0$
, and
$$\begin{array}{c}\begin{array}{cc}(\mathcal{A}+\Delta \mathcal{A})(\overline{x}+\Delta x)+C+\Delta C+\overline{S}+\Delta S& =0,\\ b+\Delta b+(H+\Delta H)(\overline{x}+\Delta x)+({\mathcal{A}}^{*}+\Delta {\mathcal{A}}^{*})(\overline{Y}+\Delta Y)& =0,\\ (\overline{Y}+\Delta Y)(\overline{S}+\Delta S)+(\overline{S}+\Delta S)(\overline{Y}+\Delta Y)& =0.\end{array}\end{array}$$ 
(39)

Subtracting from these equations the first three equations of ( 34 ) yields
$$\begin{array}{c}\begin{array}{cc}(\mathcal{A}+\Delta \mathcal{A})(\Delta x)+\Delta S& =\Delta C\Delta \mathcal{A}\left(\overline{x}\right),\\ (H+\Delta H)\Delta x+({\mathcal{A}}^{*}+\Delta {\mathcal{A}}^{*})(\Delta Y)& =\Delta b\Delta H\overline{x}\Delta {\mathcal{A}}^{*}\left(\overline{Y}\right),\\ \overline{Y}\Delta S+\Delta Y\overline{S}+\Delta S\overline{Y}+\overline{S}\Delta Y& =\Delta Y\Delta S\Delta S\Delta Y.\end{array}\end{array}$$ 
(40)

Neglecting the secondorder terms in ( 40 ), and using ( 38 ), we obtain the result claimed in ( 37 ). It still remains to verify the existence and differentiability of
$\Delta x$
,
$\Delta Y$
,
$\Delta S$
.
Step 2. In this step, we prove that the system of linear equations ( 37 ) has a unique solution.
To this end, we show that the homogeneous version of (
37 ), i.e., the system
$$\begin{array}{c}\begin{array}{cc}\mathcal{A}\left(\dot{x}\right)+\dot{S}& =0,\\ H\dot{x}+{\mathcal{A}}^{*}\left(\dot{Y}\right)& =0,\\ \overline{Y}\dot{S}+\dot{Y}\overline{S}+\dot{S}\overline{Y}+\overline{S}\dot{Y}& =0,\end{array}\end{array}$$ 
(41)

only has the trivial solution
$\dot{x}=0$
,
$\dot{Y}=\dot{S}=0$
.
Let
$\dot{x}\in {\mathbb{R}}^{n}$
,
$\dot{Y},\dot{S}\in {\mathcal{S}}^{m}$
be any solution of ( 41 ). Recall that, in view of part b) by Lemma 1 we may assume that
$\overline{Y}$
and
$\overline{S}$
are given in diagonal form:
$$\begin{array}{c}\overline{Y}=\left[\begin{array}{cccc}{\overline{Y}}_{1}& 0& 0& 0\end{array}\right],\overline{S}=\left[\begin{array}{cccc}0& 0& 0& {\overline{S}}_{2}\end{array}\right],\text{where}{\overline{Y}}_{1},{\overline{S}}_{2}\succ 0\text{and}{\overline{Y}}_{1},{\overline{S}}_{2}\text{are diagonal}.\end{array}$$ 
(42)

Indeed, this can be done by replacing, in ( 41 ),
$\overline{Y}$
,
$\overline{S}$
,
$\dot{Y}$
,
$\dot{S}$
,
$\mathcal{A}\left(x\right)$
by
${U}^{T}\overline{Y}U$
,
${U}^{T}\overline{S}U$
,
${U}^{T}\dot{Y}U$
,
${U}^{T}\dot{S}U$
,
${U}^{T}\mathcal{A}\left(x\right)U$
, respectively, where
$U$
is the matrix in ( 18 ), and then multiplying the first and third rows from the left by
$U$
and from the right by
${U}^{T}$
. Furthermore, in view of part b) by Lemma 1 , any matrices
$\dot{Y}\dot{S}\in {\mathcal{S}}^{m}$
satisfying the last equation of ( 41 ) are then of the form
$$\begin{array}{c}\dot{Y}=\left[\begin{array}{cccc}{\dot{Y}}_{1}& {\dot{Y}}_{3}& {\dot{Y}}_{3}^{T}& 0\end{array}\right],\dot{S}=\left[\begin{array}{cccc}0& {\dot{S}}_{3}& {\dot{S}}_{3}^{T}& {\dot{S}}_{2}\end{array}\right],\text{where}{\dot{Y}}_{3}{\overline{S}}_{2}+{\overline{Y}}_{1}{\dot{S}}_{3}=0.\end{array}$$ 
(43)

Next, we establish the inequality
$$\begin{array}{c}{\dot{x}}^{T}H\dot{x}\ge \mu \parallel \dot{x}{\parallel}^{2},\end{array}$$ 
(44)

where
$\mu >0$
is the modulus of the secondorder sufficient condition ( 36 ). Assume that
$\dot{x}\ne 0$
.
Let
$\stackrel{\u02d8}{x}\in {\mathbb{R}}^{n}$
be a Slater point for problem ( 31 ). This guarantees that
$$\begin{array}{c}M=\left[\begin{array}{cccc}{M}_{1}& {M}_{3}& {M}_{3}^{T}& {M}_{2}\end{array}\right]:=\left(\mathcal{A}\left(\stackrel{\u02d8}{x}\right)+C\right)\succ 0,\end{array}$$ 
(45)

where the block partitioning
$M$
is conforming with ( 43 ). For
$\eta >0$
, set
$$\begin{array}{c}{h}_{\eta}^{+}:=\dot{x}+\eta (\stackrel{\u02d8}{x}\overline{x})\text{and}{h}_{\eta}^{}:=\dot{x}+\eta (\stackrel{\u02d8}{x}\overline{x}).\end{array}$$ 
(46)

Since
$\dot{x}\ne 0$
, there exists an
${\eta}_{0}>0$
such that
${h}_{\eta}^{\pm}\ne 0$
for all
$0<\eta \le {\eta}_{0}$
. Next, we prove that for all such
$\eta $
, both vectors
${h}_{\eta}^{+}$
and
${h}_{\eta}^{}$
are feasible directions for ( 31 ) at
$\overline{x}$
. Let
$0<\eta \le {\eta}_{0}$
be arbitrary, but fixed. We then need to verify that
$\mathcal{A}(\overline{x}+\epsilon {h}_{\eta}^{\pm})+C\preccurlyeq 0$
for all sufficiently small
$\epsilon >0$
. Recall that
$\mathcal{A}$
is a linear function. Using ( 46 ), ( 42 ), ( 43 ), ( 45 ), the first equation of ( 34 ), and the first equation of ( 41 ), one readily verifies that
$$\begin{array}{c}\begin{array}{cc}\mathcal{A}(\overline{x}+\epsilon {h}_{\eta}^{\pm})+C& =(1\epsilon \eta )\left(\mathcal{A}\left(\overline{x}\right)+C\right)+\epsilon \eta \left(\mathcal{A}\left(\stackrel{\u02d8}{x}\right)+C\right)\pm \epsilon \mathcal{A}\left(\dot{x}\right)\\ & =\left(\left[\begin{array}{cccc}0& 0& 0& {\overline{S}}_{2}\end{array}\right]+\epsilon \left[\begin{array}{cccc}\eta {M}_{1}& \eta {M}_{3}\pm {\dot{S}}_{3}& (\eta {M}_{3}\pm {\dot{S}}_{3}{)}^{T}& \eta ({M}_{2}{\overline{S}}_{2})\pm {\dot{S}}_{2}\end{array}\right]\right).\end{array}\end{array}$$ 
(47)

Recall that
$\eta >0$
is fixed. Since, by ( 42 ) and ( 45 ),
${\overline{S}}_{2}\succ 0$
and
${M}_{1}\succ 0$
, a standard Schurcomplement argument shows that the matrix on the righthand side of ( 47 ) is negative definite for all sufficiently small
$\epsilon >0$
. Thus the vectors ( 46 ) are feasible directions for ( 31 ) at
$\overline{x}$
for any
$\eta >0$
. This in turn implies
$$\begin{array}{c}{\dot{x}}^{T}(b+H\overline{x})={\dot{x}}^{T}{\nabla}_{x}f\left(\overline{x}\right)=0.\end{array}$$ 
(48)

Indeed, suppose that
${\dot{x}}^{T}{\nabla}_{x}f\left(\overline{x}\right)<0$
. Then, for sufficiently small
$\eta >0$
, the feasible direction
${h}_{\eta}^{+}$
also satisfies
${\left({h}_{\eta}^{+}\right)}^{T}{\nabla}_{x}f\left(\overline{x}\right)<0$
, and thus
${h}_{\eta}^{+}$
is a descent direction for the objective function
$f$
of ( 31 ) at the point
$\overline{x}$
. This contradicts the local optimality of
$\overline{x}$
. Likewise, if
${\dot{x}}^{T}{\nabla}_{x}f\left(\overline{x}\right)>0$
, then, for sufficiently small
$\eta >0$
,
${h}_{\eta}^{}$
is a descent direction for the objective function
$f$
of ( 31 ) at the point
$\overline{x}$
, leading to the same contradiction. The secondorder sufficient condition ( 36 ) also holds true on the closure of the feasible directions. Since
$\dot{x}$
is the limit of the feasible directions ( 46 ) for
$\eta \to 0$
and
$\dot{x}$
satisfies ( 48 ), the inequality ( 44 ) follows from ( 36 ).
Next recall from (
42 ) that
${\overline{Y}}_{1}$
and
${\overline{S}}_{2}$
are positive definite diagonal matrices. The last relation in ( 43 ) thus implies that corresponding entries of the matrices
${\dot{Y}}_{3}$
and
${\dot{S}}_{3}$
are either zero or of opposite sign. It follows that
$\langle {\dot{Y}}_{3},{\dot{S}}_{3}\rangle \le 0$
, and equality holds if, and only if,
${\dot{Y}}_{3}={\dot{S}}_{3}=0$
. Using this inequality, together with the first two relations in ( 43 ), the first two equations of ( 41 ), and ( 44 ), one readily verifies that
$$0\ge 2\langle {\dot{Y}}_{3},{\dot{S}}_{3}\rangle =\langle \dot{Y},\dot{S}\rangle =\langle \dot{Y},\mathcal{A}\left(\dot{x}\right)\rangle =\langle {\mathcal{A}}^{*}\left(\dot{Y}\right),\dot{x}\rangle =\langle H\dot{x},\dot{x}\rangle ={\dot{x}}^{T}H\dot{x}\ge \mu \parallel \dot{x}{\parallel}^{2}.$$
Since
$\mu >0$
, this implies
$$\begin{array}{c}\dot{x}=0\text{and}{\dot{Y}}_{3}={\dot{S}}_{3}=0.\end{array}$$ 
(49)

By the first row of ( 41 ), it further follows that
$$\begin{array}{c}\dot{S}=\mathcal{A}\left(\dot{x}\right)=\mathcal{A}\left(0\right)=0.\end{array}$$ 
(50)

Thus it only remains to show that
$\dot{Y}=0$
. In view of ( 43 ) and ( 50 ), we have
$$\begin{array}{c}\dot{Y}=\left[\begin{array}{cccc}{\dot{Y}}_{1}& 0& 0& 0\end{array}\right].\end{array}$$ 
(51)

Now suppose that
${\dot{Y}}_{1}\ne 0$
. Then, by ( 42 ) and ( 51 ), we have
$${\overline{Y}}_{\epsilon}:=\overline{Y}+\epsilon \dot{Y}\succcurlyeq 0\text{and}{\overline{Y}}_{\epsilon}\ne \overline{Y}$$
for all sufficiently small
$\left\epsilon \right$
. Moreover, using ( 34 ), ( 41 ), and ( 50 ), one readily verifies that the point
$(\overline{x},{\overline{Y}}_{\epsilon},\overline{S})$
also satisfies ( 34 ) for all sufficiently small
$\left\epsilon \right$
. This contradicts the assumption that
$(\overline{x},\overline{Y},\overline{S})$
is a locally unique stationary point. Hence
${\dot{Y}}_{1}=0$
and, by ( 51 ),
$\dot{Y}=0$
.
This concludes the proof that the square system (
41 ) is nonsingular.
Step 3. In this step, we show that the nonlinear system ( 34 ) has a local solution that depends smoothly on the perturbation
$\Delta \mathcal{D}$
. To this end, we apply the implicit function theorem to the system
$$\begin{array}{c}\mathcal{A}\left(x\right)+C+S=0,Hx+b+{\mathcal{A}}^{*}\left(Y\right)=0,YS+SY=0.\end{array}$$ 
(52)

As we have just seen, the linearization of ( 52 ) at the point
$(\overline{x},\overline{Y},\overline{S})$
is nonsingular, and hence ( 52 ) has a differentiable and locally unique solution
$\left(\overline{x}(\Delta \mathcal{D}),\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})\right)$
. Furthermore, we have
$\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})\succcurlyeq 0$
. This semidefiniteness follows with a standard continuity argument:
The optimality conditions of the nonlinear SDP coincide with the optimality conditions of the linearized SDP. Under our assumptions, the latter one has a unique optimal solution that depends continuously on small perturbations of the data; see, e.g., [
15]
. Hence the linearized problem at the data point
$\mathcal{D}+\Delta \mathcal{D}$
has an optimal solution
$\left(\stackrel{~}{x},\stackrel{~}{Y},\stackrel{~}{S}\right)$
that satisfies the same optimality conditions as
$\left(\overline{x}(\Delta \mathcal{D}),\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})\right)$
. The solution of the linearized problem also satisfies
$\stackrel{~}{Y}\succcurlyeq 0$
,
$\stackrel{~}{S}\succcurlyeq 0$
. Since
$\left(\overline{x}(\Delta \mathcal{D}),\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})\right)$
is locally unique, it must coincide with
$\left(\stackrel{~}{x},\stackrel{~}{Y},\stackrel{~}{S}\right)$
, i.e.,
$\overline{Y}(\Delta \mathcal{D}),\overline{S}(\Delta \mathcal{D})$
satisfy the semidefiniteness conditions.
Step 4. In this step, we prove that the secondorder sufficient condition is satisfied at the perturbed solution. Since feasible directions
$h$
are defined only up to a positive scalar factor, without loss of generality, one may require that
$\parallel h\parallel =1$
. For the unperturbed problem ( 31 ), the secondorder sufficient condition at
$\overline{x}$
then states that
${h}^{T}Hh\ge \mu $
for all
$h\in {\mathbb{R}}^{n}$
with
$\mathcal{A}(x+\epsilon h)+C\preccurlyeq 0$
,
$\epsilon =\epsilon \left(h\right)>0$
,
$\parallel h\parallel =1$
, and
${h}^{T}(\overline{b}+H\overline{x})=0$
. To prove that the secondorder sufficient condition is invariant under small perturbations
$\Delta \mathcal{D}$
of the problem data
$\mathcal{D}$
, we thus need to show that for some fixed
$\stackrel{~}{\mu}>0$
, we have
$$\begin{array}{c}{h}^{T}(H+\Delta H)h\ge \stackrel{~}{\mu}\end{array}$$ 
(53)

for all solutions
$h\in {\mathbb{R}}^{n}$
of the inequalities
$$\begin{array}{cc}(\mathcal{A}+\Delta \mathcal{A})\left(\overline{x}(\Delta \mathcal{D})+\epsilon h\right)+C+\Delta C\preccurlyeq 0,& \epsilon =\epsilon \left(h\right)>0,\end{array}$$  
$$\begin{array}{cc}\parallel h\parallel =1,& {h}^{T}\left(b+\Delta b+(H+\Delta H)\overline{x}(\Delta \mathcal{D})\right)=0.\end{array}$$  
In view of the first two relations in ( 39 ), the above is equivalent to
$$\begin{array}{c}\epsilon (\mathcal{A}+\Delta \mathcal{A})\left(h\right)\preccurlyeq \overline{S}(\Delta \mathcal{D}),\epsilon =\epsilon \left(h\right)>0,\parallel h\parallel =1,{h}^{T}({\mathcal{A}}^{*}+\Delta {\mathcal{A}}^{*})\left(\overline{Y}\right(\Delta \mathcal{D}\left)\right)=0.\end{array}$$ 
(54)

It remains to show that the set of solutions
$h$
of ( 54 ) varies continuously with
$\Delta D$
. Indeed, for any fixed
$\stackrel{~}{\mu}$
with
$0<\stackrel{~}{\mu}<\mu $
, the secondorder condition at
$\overline{x}$
then readily implies that ( 53 ) is satisfied for all solutions
$h$
of ( 54 ), provided
$\parallel \Delta \mathcal{D}\parallel $
is sufficiently small. In Step 2, we have shown that both
$\overline{S}(\Delta \mathcal{D})$
and
$\overline{Y}(\Delta \mathcal{D})$
are continuous functions of
$\Delta \mathcal{D}$
and that the dimension of the null space of
$\overline{S}(\Delta \mathcal{D})$
is constant, namely equal to
$k$
, for all sufficiently small
$\parallel \Delta D\parallel $
. Moreover, the null space of
$\overline{S}(\Delta \mathcal{D})$
varies continuously with
$\Delta D$
.
Let
$\Delta {\mathcal{D}}_{k}$
be a sequence of perturbations with
$\Delta {\mathcal{D}}_{k}\to 0$
. Let
${h}_{k}$
be a sequence of associated solutions of ( 54 ). It suffices to show that any accumulation point
$\overline{h}$
of the sequence
${h}_{k}$
satisfies ( 54 ) for
$\Delta \mathcal{D}=0$
and the associated values
$\Delta \mathcal{A}=0$
,
$\overline{Y}\left(0\right)=\overline{Y}$
,
$\overline{S}\left(0\right)=\overline{S}$
. Since
$\overline{Y}(\Delta \mathcal{D})$
and
$\Delta {\mathcal{A}}^{*}$
vary continuously with
$\Delta \mathcal{D}$
, it follows that
$\overline{h}$
satisfies the last two relations of ( 54 ) for
$\Delta \mathcal{D}=0$
.
We now assume by contradiction that
$\epsilon \mathcal{A}\left(\overline{h}\right)\u22e0\overline{S}$
for any
$\epsilon >0$
. Since
$\overline{S}\succcurlyeq 0$
this implies that there exists a vector
$z\in {\mathbb{R}}^{m}$
with
$\parallel z\parallel =1$
,
${z}^{T}\mathcal{A}\left(\overline{h}\right)z=\stackrel{~}{\epsilon}>0$
, and
${z}^{T}\overline{S}z=0$
. It follows that
$${z}^{T}(\mathcal{A}+\Delta {\mathcal{A}}_{k})\left({h}_{k}\right)z\ge \frac{\stackrel{~}{\epsilon}}{2}$$
if
$k$
is sufficiently large. Since the null space of
$\overline{S}(\Delta \mathcal{D})$
varies continuously with
$\Delta D$
, we have
$$(z+\Delta {z}_{k}{)}^{T}\overline{S}(\Delta {\mathcal{D}}_{k}\left)\right(z+\Delta {z}_{k})=0$$
for some small
$\Delta {z}_{k}\in {\mathbb{R}}^{m}$
whenever
$\parallel \Delta {\mathcal{D}}_{k}\parallel $
is sufficiently small. We now choose
$\parallel \Delta {\mathcal{D}}_{k}\parallel $
so small, i.e.,
$k$
so large, that
$$(z+\Delta {z}_{k}{)}^{T}(\mathcal{A}+\Delta {\mathcal{A}}_{k}\left)\right({h}_{k}\left)\right(z+\Delta {z}_{k})\ge \frac{\stackrel{~}{\epsilon}}{4}.$$
This implies that
${h}_{k}$
does not satisfy ( 54 ), and thus yields the desired contradiction. Hence
$\overline{h}$
satisfies ( 54 ) for
$\Delta \mathcal{D}=0$
.
Theorem 1 can be sharpened slightly.
Corollary 1
Under the assumptions of Theorem 1 there exists a small neighborhood
$\mathcal{N}$
of zero in the data space of 31
$\left(\right)$
such that for all perturbations
$\Delta \mathcal{D}\in \mathcal{N}$
of the problem data 32
$\left(\right)$
,
$\mathcal{D}$
, of 31
$\left(\right)$
, there exists a local solution
$({x}_{\Delta},{Y}_{\Delta},{S}_{\Delta})$
of 39
$\left(\right)$
near
$(\overline{x},\overline{Y},\overline{S})$
, at which the assumptions of Theorem 2 are also satisfied. Moreover, the second derivatives
$${\nabla}_{\mathcal{D}}^{2}\left({x}_{\Delta},{Y}_{\Delta},{S}_{\Delta}\right)[\Delta \mathcal{D}]$$
of such local solutions
$({x}_{\Delta},{Y}_{\Delta},{S}_{\Delta})$
are uniformly bounded for all
$\Delta \mathcal{D}\in \mathcal{N}$
.

Proof.
The first part of the corollary is an immediate consequence of Theorem 1 . For the second part observe that the second derivative is obtained by differentiating the system ( 37 ). For sufficiently small perturbations
$\Delta \mathcal{D}$
, the singular values of this system are uniformly bounded away from zero, and hence the second derivatives are uniformly bounded.
Theorem 1 can be generalized to the class of NLSDPs of the form ( 21 ). Recall that, by ( 22 ) and ( 24 ), the Lagrangian of ( 21 ) and its Hessian are given by
$$\begin{array}{c}\mathcal{\mathcal{L}}(x,Y)={b}^{T}x+\mathcal{\mathcal{B}}\left(x\right)\bullet Y\text{and}H(x,Y)={\nabla}_{x}^{2}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right),\end{array}$$ 
(55)

respectively. The generalization of Theorem 1 to problems ( 21 ) is then as follows.
Theorem 2
Let
${x}^{*}$
be a local solution of 21
$\left(\right)$
, and let
${Y}^{*}$
be an associated Lagrange multiplier. Assume that the Robinson constraint qualification is satisfied at
${x}^{*}$
and that the point
$({x}^{*},{Y}^{*})$
is strictly complementary and locally unique. Finally, assume that the secondorder sufficient condition holds at
${x}^{*}$
with modulus
$\mu >0$
. Then, 21
$\left(\right)$
has a locally unique solution for small perturbations of the data
$(\mathcal{\mathcal{B}},b)$
, and the solution depends smoothly on the perturbations.

Proof.
First, we define the linear function
$\mathcal{A}:={D}_{x}\mathcal{\mathcal{B}}\left({x}^{*}\right):{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, and the matrices
$C:=\mathcal{\mathcal{B}}\left({x}^{*}\right)$
and
$H:=H({x}^{*},{Y}^{*})$
. Then, the SSP approximation ( 29 ) (with
$p=q=0$
) of ( 21 ) at the point
$({x}^{*},{Y}^{*})$
is just the quadratic semidefinite problem
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{b}^{T}\Delta x+\frac{1}{2}(\Delta x{)}^{T}H\Delta x\text{subject to}& \Delta x\in {\mathbb{R}}^{n},\\ & \mathcal{A}(\Delta x)+C\preccurlyeq 0.\end{array}\end{array}$$ 
(56)

Note that ( 56 ) is a problem of the form ( 31 ) with data ( 32 ),
$\mathcal{D}$
. Moreover,
$\Delta \overline{x}:=0$
,
$\overline{Y}:={Y}^{*}$
, and
$\overline{S}:=\mathcal{A}\left(0\right)C$
satisfy the conditions ( 34 ). These conditions coincide with the firstorder conditions of ( 21 ), and thus the point
$(\Delta \overline{x},\overline{Y},\overline{S})$
is also the unique solution of ( 34 ).
Furthermore, the secondorder sufficient condition for (
56 ) and ( 21 ) coincide. This condition guarantees that
$\Delta \overline{x}$
is a locally unique solution of ( 56 ). Finally, the Robinson constraint qualification for problem ( 21 ) at
${x}^{*}$
implies that problem ( 56 ) satisfies Slater's condition. In particular, all assumptions of Theorem 1 are satisfied. Small perturbations
$\Delta \mathcal{D}$
of the data of ( 21 ) result in small changes of the corresponding SSP problem ( 56 ). Since Theorem 1 allows for arbitrary changes in all of the data of ( 56 ), the claims follow.
8 Convergence of the SSP method
In this section, we prove that the plain SSP method with step size 1 is locally quadratically convergent.
For pairs
$(x,Y)$
, where
$x\in {\mathbb{R}}^{n}$
,
$Y\in {\mathcal{S}}^{m}$
, we use the norm
$$\parallel (x,Y)\parallel :={\left(\parallel x{\parallel}^{2}+\parallel Y{\parallel}^{2}\right)}^{\frac{1}{2}}.$$
The main result of this section can then be stated as follows.
Theorem 3
Assume that the function
$\mathcal{\mathcal{B}}$
in 21
$\left(\right)$
is
${\mathcal{C}}^{3}$
differentiable and that problem 21
$\left(\right)$
has a locally unique and strictly complementary solution
$(\overline{x},\overline{Y})$
that satisfies the Robinson constraint qualification and the secondorder sufficient condition with modulus
$\mu >0$
, cf. 36
$\left(\right)$
.
Let some iterate
$({x}^{k},{Y}^{k})$
be given and let the next iterate
$$({x}^{k+1},{Y}^{k+1}):=({x}^{k},{Y}^{k})+(\Delta x,\Delta Y)$$
be defined as the local solution of 27
$\left(\right)$
, or, equivalently, 29
$\left(\right)$
, that is closest to
$({x}^{k},{Y}^{k})$
. Then there exist
$\epsilon >0$
and
$\gamma <1/\epsilon $
such that
$$\parallel ({x}^{k+1},{Y}^{k+1})(\overline{x},\overline{Y})\parallel \le \gamma {\parallel ({x}^{k},{Y}^{k})(\overline{x},\overline{Y})\parallel}^{2}$$
whenever
$\parallel ({x}^{k},{Y}^{k})(\overline{x},\overline{Y})\parallel <\epsilon $
.

Proof.
The proof is divided into three steps. In the first step, we establish the exact relation of problems ( 29 ) and ( 31 ). In a second step, we consider a point
${x}^{k}$
near
$\overline{x}$
. We show that
${x}^{k}$
is the optimal solution of an SSP subproblem the data of which is at most
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel )$
away from the data of the SSP subproblem at
$(\overline{x},\overline{Y})$
. We remark that
${x}^{k}$
is always the optimal solution of the
$(k1)$
st subproblem, but the data of this subproblem lies
$\mathcal{O}(\parallel {x}^{k1}\overline{x}\parallel )$
away from the SSP subproblem at
$(\overline{x},\overline{Y})$
. In a third step, we then show by a perturbation analysis that the correction
$\Delta x={x}^{k+1}{x}^{k}$
is of size
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel )$
and that the residual for the SSP subproblem in the
$(k+1)$
st step is of size
$\mathcal{O}\left(\right(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel {)}^{2})$
.
Step 1. We first show how the SSP subproblem ( 29 ) can be written in the form ( 31 ). To this end, we define the linear function
$\mathcal{A}:={D}_{x}\mathcal{\mathcal{B}}\left({x}^{k}\right):{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, and the matrices
$C:=\mathcal{\mathcal{B}}\left({x}^{k}\right)$
and
$H:=H({x}^{k},{Y}^{k})$
. Note that the linear constraint
$\mathcal{A}(\Delta x)+C\preccurlyeq 0$
is just the linearization of the nonlinear constraint
$\mathcal{\mathcal{B}}({x}^{k}+\Delta x)\preccurlyeq 0$
about the point
${x}^{k}$
. Finally, let
$b$
be as in ( 21 ). The SSP subproblem ( 29 ) then takes the simple form ( 56 ), and in particular, it conforms with the format ( 31 ) of Theorem 2 .
Step 2. Let any point
${x}^{k}$
close to
$\overline{x}$
be given. We show that
$\Delta x=0$
is a local solution of a problem of the form ( 56 ), where the data is `close' to the data of ( 29 ) at
$\left(\overline{x},\overline{Y}\right)$
. Let
$\Delta C:=\mathcal{\mathcal{B}}\left(\overline{x}\right)\mathcal{\mathcal{B}}\left({x}^{k}\right)$
. By continuity of
$\mathcal{\mathcal{B}}$
,
$\parallel \Delta C\parallel $
is of order
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel )$
. Let
$$\Delta b:={\nabla}_{x}\left(\mathcal{\mathcal{B}}\left({x}^{k}\right)\bullet \overline{Y}\right)b={\mathcal{A}}^{*}\left(\overline{Y}\right)b.$$
From ( 23 ) and the second row of ( 25 ), we obtain the estimate
$\parallel \Delta b\parallel =\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel )$
, and the point
$(0,\overline{Y},\overline{S})$
satisfies the firstorder conditions,
$$\mathcal{A}\left(0\right)+C+\Delta C+\overline{S}=0,b+\Delta b+H\cdot 0+{\mathcal{A}}^{*}\left(\overline{Y}\right)=0,\overline{Y}\overline{S}=0,$$
for the quadratic semidefinite program
$$\begin{array}{c}\begin{array}{cc}\text{minimize}(b+\Delta b{)}^{T}\Delta x+\frac{1}{2}(\Delta x{)}^{T}H\Delta x\text{subject to}& \Delta x\in {\mathbb{R}}^{n},\\ & \mathcal{A}(\Delta x)+C+\Delta C\preccurlyeq 0.\end{array}\end{array}$$ 
(57)

Let
$$\begin{array}{c}\overline{\mathcal{A}}:={D}_{x}\mathcal{\mathcal{B}}\left(\overline{x}\right),\overline{b}:=b,\overline{C}:=\mathcal{\mathcal{B}}\left(\overline{x}\right),\overline{H}:={\nabla}_{x}^{2}\mathcal{\mathcal{L}}(\overline{x},\overline{Y}),\end{array}$$ 
(58)

be the data of the SSP subproblem ( 29 ) at the point
$(\overline{x},\overline{Y})$
. Then, the data of ( 57 ) differs from the data ( 58 ) by a perturbation of norm
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel )$
. Here, the term
$\parallel {Y}^{k}\overline{Y}\parallel $
reflects the fact that
$H$
and
$\overline{H}$
also differ by the choice of
$Y$
. Note that the point
$(\Delta x,Y,S)=(0,\overline{Y},\overline{S})$
satisfies the firstorder optimality conditions,
$$\begin{array}{c}\overline{\mathcal{A}}(\Delta x)+\overline{C}+S=0,b+\overline{H}\Delta x+{\overline{\mathcal{A}}}^{*}\left(Y\right)=0,YS=0,\end{array}$$ 
(59)

of the quadratic problem ( 56 ) with data ( 58 ). As shown in Theorem 2 , the assumptions for the nonlinear SDP ( 21 ) at
$(\overline{x},\overline{Y})$
imply that problem ( 56 ) with data ( 58 ) satisfies all conditions of Theorem 2 at
$(0,\overline{Y},\overline{S})$
.
Since the secondorder sufficient condition depends continuously on the data of (
31 ), it follows that for ( 57 ), the secondorder condition at
$(0,\overline{Y},\overline{S})$
is satisfied, provided that
$\parallel {x}^{k}\overline{x}\parallel $
and
$\parallel {Y}^{k}\overline{Y}\parallel $
are sufficiently small. Thus problem ( 57 ) fulfills all assumptions of Theorem 2 .
Step 3. By definition,
$(\Delta x,Y)=(0,\overline{Y})$
is the optimal solution (with associated multiplier) of ( 57 ). Let
$({x}^{k},{Y}^{k})$
be close to
$(\overline{x},\overline{Y})$
. The SSP subproblem replaces the data
$\Delta b$
and
$\Delta C$
of ( 57 ) by 0 (of the respective dimension). Thus, the data of ( 57 ) is changed by a perturbation of order
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel )$
. We assume that this perturbation lies in the neighborhood
$\mathcal{N}$
about zero as guaranteed by Corollary 1 . Denote the optimal solution of the SSP subproblem by
$(\Delta x,\overline{Y}+\Delta \hat{Y})$
.
The SSP subproblem is then used to define
$({x}^{k+1},{Y}^{k+1})$
. Let
$${\mathcal{A}}^{+}:={D}_{x}\mathcal{\mathcal{B}}\left({x}^{k+1}\right),{C}^{+}:=\mathcal{\mathcal{B}}\left({x}^{k+1}\right),{H}^{+}:={\nabla}_{x}^{2}\mathcal{\mathcal{L}}({x}^{k+1},{Y}^{k+1})$$
be the data of the SSP subproblem at the next,
$(k+1)$
st iteration.
Corollary
1 states that
$(\Delta x,\Delta \hat{Y})$
are given by the tangent equations ( 37 ) plus some uniformly bounded secondorder terms. Thus,
$(\Delta x,\Delta \hat{Y})$
are of the order
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel )$
. Here,
$\Delta \hat{Y}$
is a correction of the unknown point
$\overline{Y}$
, while the correction
$\Delta Y={Y}^{k+1}{Y}^{k}$
produced by the SSP subproblem has the form
$\Delta Y=\Delta \hat{Y}+{Y}^{k}\overline{Y}$
. Obviously, also the norm
$\parallel \Delta Y\parallel $
of this correction is of the order
$\mathcal{O}(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel )$
.
Next, we compute an upper bound on the size of the residuals of the first and second equations in (
59 ) at
$({x}^{k+1},{Y}^{k+1},{S}^{k+1})$
. Note that the residual term of the third equation in ( 59 ) is zero. By definition of
$(\Delta x,{Y}^{k+1},{S}^{k+1})$
, it follows that
$$\begin{array}{c}\mathcal{A}(\Delta x)+C+{S}^{k+1}=0,b+H\Delta x+{\mathcal{A}}^{*}\left({Y}^{k+1}\right)=0,{Y}^{k+1}{S}^{k+1}=0.\end{array}$$ 
(60)

If the data of ( 21 ) is
${C}^{3}$
smooth, this implies that
$$\begin{array}{cc}\left({\mathcal{A}}^{+}{)}^{*}\right({Y}^{k+1})+b& ={\mathcal{A}}^{*}\left({Y}^{k+1}\right)+b+\Delta {\mathcal{A}}^{*}\left({Y}^{k+1}\right)\end{array}$$  
$$\begin{array}{cc}& =H\Delta x+\Delta {\mathcal{A}}^{*}\left({Y}^{k}\right)+\Delta {\mathcal{A}}^{*}(\Delta Y)\end{array}$$  
$$\begin{array}{cc}& ={\nabla}_{x}^{2}\left(\mathcal{\mathcal{B}}\left({x}^{k}\right)\bullet {Y}^{k}\right)\Delta x+\left({\nabla}_{x}\mathcal{\mathcal{B}}({x}^{k}+\Delta x){\nabla}_{x}\mathcal{\mathcal{B}}\left({x}^{k}\right)\right)\bullet {Y}^{k}+\Delta {\mathcal{A}}^{*}(\Delta Y)\end{array}$$  
$$\begin{array}{cc}& =\mathcal{O}(\parallel \Delta x{\parallel}^{2}+\parallel \Delta Y{\parallel}^{2}),\end{array}$$  
where
$\Delta \mathcal{A}:={\mathcal{A}}^{+}\mathcal{A}$
. Likewise, it follows from ( 60 ) that
$$\begin{array}{cc}{C}^{+}+{S}^{k+1}& =\Delta C+C+{S}^{k+1}=\Delta C\mathcal{A}(\Delta x)\end{array}$$  
$$\begin{array}{cc}& =\mathcal{\mathcal{B}}\left({x}^{k+1}\right)\mathcal{\mathcal{B}}\left({x}^{k}\right){D}_{x}\mathcal{\mathcal{B}}\left({x}^{k}\right)[\Delta x]=\mathcal{O}(\parallel \Delta x{\parallel}^{2}),\end{array}$$  
where
$\Delta C:={C}^{+}C$
. Hence, we can define perturbations
$\hat{b}$
and
${\hat{C}}^{+}$
of
$b$
and
${C}^{+}$
of order
$\parallel \hat{b}b\parallel +\parallel {\hat{C}}^{+}{C}^{+}\parallel =\mathcal{O}\left(\right(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel {)}^{2})$
such that
$(\Delta x,Y,S)=(0,{Y}^{k+1},{S}^{k+1})$
is an optimal solution of the problem ( 56 ) with data
${\mathcal{A}}^{+}$
,
$\hat{b}$
,
${\hat{C}}^{+}$
,
${H}^{+}$
. By the same derivation as above, the next SSP step has length
$\mathcal{O}\left(\right(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel {)}^{2})$
and generates residuals of order
$\mathcal{O}\left(\right(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel {)}^{4})$
. Repeating this process, it then follows by a standard argument that
$\parallel {x}^{k+1}\overline{x}\parallel $
and
$\parallel {Y}^{k+1}\overline{Y}\parallel $
are of order
$\mathcal{O}\left(\right(\parallel {x}^{k}\overline{x}\parallel +\parallel {Y}^{k}\overline{Y}\parallel {)}^{2})$
as well.
Remark 5
As mentioned before, one will typically choose to solve SSP subproblems with a positive semidefinite approximation
$\hat{H}$
to the Hessian of the Lagrangian. A proof of convergence for such modifications is the subject of current research; see, e.g., [
10]
. Since all the data enters in a continuous fashion in the preceding analysis, it follows that the SSP method with step size one is still locally superlinearly convergent if the matrices
${H}^{k}$
in ( 28 ) are replaced by approximations
${\hat{H}}^{k}$
with
$\parallel {H}^{k}{\hat{H}}^{k}\parallel \to 0$
.
Remark 6
The assumption of
${\mathcal{C}}^{3}$
differentiability of the function
$\mathcal{\mathcal{B}}$
in Theorem 3 can be weakened to
${\mathcal{C}}^{2}$
differentiability and a Lipschitz condition for the Hessian at
$\overline{x}$
.
The result of Theorem 3 can be extended to the following slightly more general class of NLSDPs. Given a vector
$b\in {\mathbb{R}}^{n}$
, a matrixvalued function
$\mathcal{\mathcal{B}}:{\mathbb{R}}^{n}\to {\mathcal{S}}^{m}$
, and two vectorvalued functions
$c:{\mathbb{R}}^{n}\to {\mathbb{R}}^{p}$
and
$d:{\mathbb{R}}^{n}\to {\mathbb{R}}^{q}$
, we consider problems of the following form:
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{b}^{T}x\text{subject to}& x\in {\mathbb{R}}^{n},\\ & \mathcal{\mathcal{B}}\left(x\right)\preccurlyeq 0,\\ & c\left(x\right)\le 0,\\ & d\left(x\right)=0.\end{array}\end{array}$$ 
(61)

The Lagrangian of problem ( 21 ) takes the form
$\mathcal{\mathcal{L}}:{\mathbb{R}}^{n}\times {\mathcal{S}}^{m}\times {\mathbb{R}}^{p}\times {\mathbb{R}}^{q}\to \mathbb{R}$
:
$$\begin{array}{c}\mathcal{\mathcal{L}}(x,Y,u,v):={b}^{T}x+\mathcal{\mathcal{B}}\left(x\right)\bullet Y+{u}^{T}c\left(x\right)+{v}^{T}d\left(x\right).\end{array}$$ 
(62)

Its gradient with respect to
$x$
is given by
$$\begin{array}{c}g(x,Y,u,v):={\nabla}_{x}\mathcal{\mathcal{L}}(x,Y,u,v)=b+{\nabla}_{x}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right)+{\nabla}_{x}c\left(x\right)u+{\nabla}_{x}d\left(x\right)v\end{array}$$ 
(63)

and its Hessian by
$$\begin{array}{c}H(x,Y,u,v):={\nabla}_{x}^{2}\mathcal{\mathcal{L}}(x,Y,u,v)={\nabla}_{x}^{2}\left(\mathcal{\mathcal{B}}\left(x\right)\bullet Y\right)+{\sum}_{i=1}^{p}{u}_{i}{\nabla}_{x}^{2}{c}_{i}\left(x\right)+{\sum}_{j=1}^{q}{v}_{j}{\nabla}_{x}^{2}{d}_{j}\left(x\right).\end{array}$$ 
(64)

Note that in ( 63 ), the gradients of the vectorvalued functions
$c\left(x\right)$
and
$d\left(x\right)$
are defined as
${\nabla}_{x}c\left(x\right):=\left({D}_{x}c\right(x){)}^{T}$
and
${\nabla}_{x}d\left(x\right):=\left({D}_{x}d\right(x){)}^{T}$
.
For NLSDPs (
61 ), the SSP subproblems are of the form
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{b}^{T}\Delta x+\frac{1}{2}(\Delta x{)}^{T}{H}^{k}\Delta x\text{subject to}& \Delta x\in {\mathbb{R}}^{n},\\ & \mathcal{\mathcal{B}}\left({x}^{k}\right)+{D}_{x}\mathcal{\mathcal{B}}\left({x}^{k}\right)[\Delta x]\preccurlyeq 0,\\ & c\left({x}^{k}\right)+{D}_{x}c\left({x}^{k}\right)\Delta x\le 0,\\ & d\left({x}^{k}\right)+{D}_{x}d\left({x}^{k}\right)\Delta x=0.\end{array}\end{array}$$ 
(65)

The extension of Theorem 3 to problems ( 61 ) is as follows.
Theorem 4
Assume that the functions
$\mathcal{\mathcal{B}}$
,
$c$
, and
$d$
in ( 61 ) are
${\mathcal{C}}^{3}$
differentiable, and that problem 61
$\left(\right)$
has a locally unique and strictly complementary solution
$(\overline{x},\overline{Y},\overline{u},\overline{v})$
that satisfies the Robinson constraint qualification and the secondorder sufficient condition with modulus
$\mu >0$
, cf. 36
$\left(\right)$
. Let some iterate
$({x}^{k},{Y}^{k},{u}^{k},{v}^{k})$
be given and let the next iterate
$$({x}^{k+1},{Y}^{k+1},{u}^{k+1},{v}^{k+1}):=({x}^{k},{Y}^{k},{u}^{k},{v}^{k})+(\Delta x,\Delta Y,\Delta u,\Delta v)$$
be defined as the local solution of 65
$\left(\right)$
that is closest to
$({x}^{k},{Y}^{k},{u}^{k},{v}^{k})$
. Then there exist
$\epsilon >0$
and
$\gamma <1/\epsilon $
such that
$$\parallel ({x}^{k+1},{Y}^{k+1},{u}^{k+1},{v}^{k+1})(\overline{x},\overline{Y},\overline{u},\overline{v})\parallel \le \gamma {\parallel ({x}^{k},{Y}^{k},{u}^{k},{v}^{k})(\overline{x},\overline{Y},\overline{u},\overline{v})\parallel}^{2}$$
whenever
$\parallel ({x}^{k},{Y}^{k},{u}^{k},{v}^{k})(\overline{x},\overline{Y},\overline{u},\overline{v})\parallel <\epsilon $
.

Proof.
By our assumption on strict complementarity, all entries of the vector
$\overline{v}$
of the Lagrange multipliers associated with the equality constraints
$d\left(x\right)=0$
of ( 21 ) are different from zero.
Without loss of generality, we assume that
$\overline{v}>0$
. Indeed, for any entry
${\overline{v}}_{j}<0$
we replace the corresponding constraint
${d}_{j}\left(x\right)=0$
by the equivalent constraint
${d}_{j}\left(x\right)=0$
. These sign changes do not change the iterates generated by ( 29 ). Moreover, for
$\left({x}^{k},{Y}^{k},{u}^{k},{v}^{k}\right)$
sufficiently close to
$\left(\overline{x},\overline{Y},\overline{u},\overline{v}\right)$
it follows from
$\overline{v}>0$
that the iterates do not change when the constraints
$d\left(x\right)=0$
are replaced by
$d\left(x\right)\le 0$
. We can thus assume that
$q=0$
, i.e., there are no equality constraints in ( 61 ).
We further assume that, without loss of generality, the matrix
$\mathcal{\mathcal{B}}$
is augmented to a
$2\times 2$
block diagonal matrix, where the
$(2,2)$
block is the diagonal matrix
$Diag\left(c\right(x\left)\right)$
. Thus, for the analysis of the SSP method we may assume that
$p=q=0$
in ( 21 ), i.e., we only need to consider problems of the form ( 21 ).
9 Numerical results
In this section, we present results of some numerical experiments with a Matlab implementation of the SSP method. Actually, our Matlab program is for a slightly more general class of nonlinear programs with conic constraints (NLCPs). The numerical experiments with our program illustrate the theoretical results of the preceding sections. In particular, quadratic convergence is observed for problems where the Hessian
$H$
of the Lagrangian at the optimal solution is positive semidefinite. In cases where
$H$
is not positive semidefinite, our implementation uses perturbations of the nonconvex SSP subproblems in order to obtain convex conic subproblems.
In these cases, typically, the rate of convergence of the algorithm based on such perturbed problems is only linear.
The Matlab program generates its search directions by solving conic quadratic subproblems using Version 1.05R5 of SeDuMi [
28]
. SeDuMi allows free and positive variables as well as Lorentzcone (“icecream cone”) constraints, rotated Lorentzcone constraints, and semidefinite cone constraints. The NLCPs can also be formulated in terms of these cones. In order to simplify the use of SeDuMi for the SSP subproblems, the NLCPs are rewritten in the following standard format:
$$\begin{array}{c}\begin{array}{cc}\text{minimize}{c}^{T}x\text{subject to}& x\in K,\\ & F\left(x\right)=0.\end{array}\end{array}$$ 
(66)

Here,
$K$
is a Cartesian product of free variables and several cones of the types allowed in SeDuMi. We tested the following techniques for generating positive semidefinite approximations of
$H$
: a BFGS approach, the Hessian of the augmented Lagrangian, and the orthogonal projection of
$H$
onto the cone of positive semidefinite matrices. Our experiences with these techniques are as follows.

1.
The BFGS approach can result in considerably more SSP iterations compared to the projection of the Hessian of the Lagrangian. Moreover, the BFGS approach strongly depends on the initial matrix
${H}_{0}$
. A good choice is
${H}_{0}:=Vmax(D,\epsilon I){V}^{T}$
, where
$H=VD{V}^{T}$
is the eigenvalue decomposition of
$H$
.

2.
The use of the Hessian of the augmented Lagrangian can be a good choice for some problems, but for most of our test problems the penalty parameter had to be very large to obtain a semidefinite Hessian. This, in turn, significantly reduced the precision of our computations.

3.
In spite of not being affinely invariant, the use of the projection of the Hessian of the Lagrangian resulted in the most efficient overall algorithm.
We also tested different step length strategies.

1.
The following penalty line search with a quadratic correction gave good results for all test cases. The SSP subproblem provides a search direction
$\Delta x$
for problem ( 66 ). By solving a leastsquares problem, a vector
$q$
is computed satisfying
${D}_{x}F\left(x\right)q=F(x+\Delta x)$
. For
$\lambda \in [0,1]$
, a line search along the points
$x\left(\lambda \right):=x+\lambda \Delta x+{\lambda}^{2}q$
is performed based on the penalty function
$$M\parallel F\left(x\right(\lambda \left)\right)\parallel +{c}^{T}x\left(\lambda \right),$$
where
$M>0$
is a penalty parameter.

2.
For some examples, the choice of a filter approach was slightly better. In the filter approach used here, a Euclidean trustregion radius was always set to be 1.5 times larger than the previous step, and nonacceptable steps were not discarded, but instead an Armijotype steplength reduction was used to generate an acceptable step. The motivation for this modified filter strategy lies in the fact that the computation of a solution of a subproblem is very expensive, and therefore discarding the solution of a subproblem is avoided. The above filter approach led to very fast convergence, especially for convex problems.

3.
For the examples presented here, the trustregion approach was the best choice. The SSP subproblem was restricted by an additional Euclidean trustregion constraint. For problems of the form ( 67 ) below, it was sufficient to apply the trustregion constraint only to the variables
${X}_{G},{X}_{C}$
, while
$P,S$
remain free. For these examples, an additional corrector step significantly accelerated the convergence. For this corrector step,
${X}_{G},{X}_{C}$
is kept fixed, and
$P,S$
are updated by solving an additional linear SDP. At each iteration, the ratio between predicted and actual reduction was computed. Depending on that ratio, the step was accepted and the trust region was increased or decreased, or the step was rejected and the trust region was decreased.
For our numerical examples, we use nonlinear nonconvex SDPs of the form ( 10 ), which we rewrite in the form ( 67 ) below. Recall that in ( 10 ),
$G,C\in {\mathbb{R}}^{n\times n}$
and
${B}_{1},{B}_{2}\in {\mathbb{R}}^{n\times m}$
are given data matrices. The nonconvex NLSDP used for the numerical examples is then as follows:
$$\begin{array}{c}\begin{array}{cc}\text{minimize}\parallel S\parallel \text{subject to}& P\in {\mathbb{R}}^{n\times n},S\in {\mathbb{R}}^{n\times m},\\ & {X}_{G}\in {\mathbb{R}}^{n\times n},\parallel {X}_{G}\parallel \le {r}_{G},\\ & {X}_{C}\in {\mathbb{R}}^{n\times n},\parallel {X}_{C}\parallel \le {r}_{C},\\ & {P}^{T}{B}_{1}+S={B}_{2},\\ & {P}^{T}(G+{X}_{G})+(G+{X}_{G}{)}^{T}P\succcurlyeq {\varepsilon}_{G}I,\\ & {P}^{T}(C+{X}_{C})+(C+{X}_{C}{)}^{T}P\succcurlyeq {\varepsilon}_{C}I,\\ & {P}^{T}(C+{X}_{C})(C+{X}_{C}{)}^{T}P=0.\end{array}\end{array}$$ 
(67)

Furthermore, in ( 67 ), in addition to the constraints on the norms of the perturbations
${X}_{G}$
and
${X}_{C}$
, we restrict
${X}_{G}$
and
${X}_{C}$
to have possible nonzero entries only in certain positions, which depend on the nonzero structure of the given matrices
$G$
and
$C$
, respectively. For our numerical tests, the data matrices
$G$
and
$C$
in ( 67 ) are generated as follows. First, two matrices
${C}_{org}$
and
${G}_{org}$
were constructed such that the associated transfer function is guaranteed to be positive real. Then, certain entries of
${G}_{org}$
and
${C}_{org}$
were perturbed by random perturbations of norm at most
${\varepsilon}_{G}$
and
${\varepsilon}_{C}$
respectively, to define the data matrices
$G$
and
$C$
. In all our examples, the transfer functions of the systems given by the resulting matrices
$C$
and
$G$
were not positive real.
All our computations were run on a Xeon with a clock rate of 2.8 GHz and 3 GB RAM. All solutions were computed to a precision of 12 decimal digits. In the following table, we list the problem dimension
$n$
, the total number
$M\left(n\right)$
of equality constraints, the total number
$N\left(n\right)$
of scalar unknowns, the number of iterations, and the cpu time (in seconds).
$n$

$M\left(n\right)$

$N\left(n\right)$

iterations

cpu time

8

118

285

5

3.71

9

146

348

5

4.43

10

177

417

7

8.06

11

211

492

5

7.18

12

248

573

8

16.05

13

288

660

4

10.88

14

331

753

6

20.77

15

377

852

7

30.12

16

426

957

6

34.38

17

478

1068

5

37.40

18

533

1185

10

91.17

19

591

1308

4

47.61

20

652

1437

5

83.66

21

716

1572

4

289.48

22

783

1713

4

416.31

23

853

1862

5

151.33

24

926

2013

6

683.54

25

1002

2176

3

145.60

26

1081

2337

5

612.22

27

1163

2508

7

518.92

28

1248

2685

5

789.41

29

1336

2868

4

475.52

30

1427

3057

7

4213.50

31

1521

3252

4

784.34

32

1618

3455

6

4659.64

33

1718

3660

5

1130.44

34

1821

3877

2

630.53

35

1927

4092

6

1799.36


Table 9 shows that the number of iterations is nearly independent of the dimension
$n$
of the problem, while—as expected—the cpu time increases with
$n$
. The total number of constraints is approximately
$M\left(n\right)\approx \frac{3}{2}{n}^{2}$
, and the total number of scalar variables is approximately
$N\left(n\right)\approx 3{n}^{2}$
. The number of iterations to solve the linear semidefinite subproblems not only depends on the dimension, but also on other properties of the problem as, for example, a comparison of the problems of dimension 32 and 33 shows. In this case, the iteration counts differ only by one, yet the cpu time quadruples, since SeDuMi needs more iterations to solve the subproblems. Some of the linear semidefinite subproblems are nearly infeasible, a situation for which SeDuMi (and other solvers) needs a higher number of interiorpoint iterations.
10 Concluding remarks
We have discussed the SSP method, which is a generalization of the SQP method for standard nonlinear programs to nonlinear semidefinite programming problems. For the derivation of this generalization, we have chosen a motivation that contrasts the SSP method with primaldual interior methods. For interior methods that are applied directly to nonlinear semidefinite programs, the choice of the symmetrization procedure is considerably more complicated than in the linear case since the system matrix is no longer positive semidefinite. In the proposed method, the choice of the symmetrization scheme is shifted in a very natural way to the subproblems, and is thus reduced to a wellstudied problem. Our convergence analysis differs from the convergence analyses of standard SQP methods in that it is based on a sensitivity result for certain optimal solutions of quadratic semidefinite programs. The derivation of this sensitivity result is also of independent interest.
References

Alizadeh, F. (1995): Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim. 5, 13–51

Alizadeh, F., Haeberly, J.P.A., Overton, M.L. (1998): Primaldual interiorpoint methods for semidefinite programming: convergence rates, stability and numerical results, SIAM J. Optim. 8, 746–768

Anderson, B.D.O., Vongpanitlerd, S. (1973): Network Analysis and Synthesis. PrenticeHall, Englewood Cliffs, New Jersey

Bai, Z., Freund, R.W. (2000): Eigenvaluebased characterization and test for positive realness of scalar transfer functions, IEEE Trans. Automat. Control 45, 2396–2402

Bai, Z., Freund, R.W. (2001): A partial PadéviaLanczos method for reducedorder modeling, Linear Algebra Appl. 332–334, 139–164

Boggs, P.T., Tolle, J.W. (1995): Sequential quadratic programming. Acta Numer. 4, 1–51

Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C. (1997): Optimisation Numérique. SpringerVerlag, Berlin

Boyd, S., El Ghaoui, L., Feron, E., Balakrishnan, V. (1994): Linear Matrix Inequalities in System and Control Theory. SIAM Publications, Philadelphia

Coelho, C.P., Phillips, J.R., Silveira, L.M. (2004): A convex programming approach for generating guaranteed passive approximations to tabulated frequencydata. IEEE Trans. ComputerAided Design 23, 293–301

Correa, R., Ramírez C., H. (2004): A global algorithm for nonlinear semidefinite programming. SIAM J. Optim. 15, 303–318

Fares, B., Apkarian, P., Noll, D. (2001): An augmented Lagrangian method for a class of LMIconstrained problems in robust control theory. Internat. J. Control 74, 348–360

Fares, B., Noll, D., Apkarian, P. (2002): Robust control via sequential semidefinite programming. SIAM J. Control Optim. 40, 1791–1820

Freund, R.W. (2003): Model reduction methods based on Krylov subspaces. Acta Numer. 12, 267–319

Freund, R.W., Jarre, F. (2004): An extension of the positive real lemma to descriptor systems. Optim. Methods Softw. 19, 69–87

Freund, R.W., Jarre, F. (2004): A sensitivity result for semidefinite programs. Oper. Res. Lett. 32, 126–132

Hörmander, L. (1973): An Introduction to Complex Analysis in Several Variables, Second Edition. NorthHolland Publishing Company, Amsterdam

Kocvara, M., Stingl, M. (2003): PENNON — A code for convex nonlinear and semidefinite programming. Optim. Methods Softw. 18, 317–333

Luo, Z.Q., Sturm, J.F., Zhang, S. (1998): Superlinear convergence of a symmetric primaldual path following algorithm for semidefinite programming, SIAM J. Optim. 8, 59–81

Mangasarian, O.L. (1969): Nonlinear Programming. MvGrawHill, New York

Monteiro, R.D.C., Zhang, Y. (1998): A unified analysis for a class of longstep primaldual pathfollowing interiorpoint algorithms for semidefinite programming. Math. Program. 81, 281–299

Nayakkankuppam, M.V., Overton, M.L. (1999): Conditioning of semidefinite programs. Math. Program. 85, 525–540

D. Noll, personal communication, Sept 2004.

Overton, M.L. (1988): On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. Matrix Anal. Appl. 9, 256–268

Overton, M.L. (1992): Largescale optimization of eigenvalues. SIAM J. Optim. 2, 88–120

Robinson, S.M. (1976): First order conditions for general nonlinear optimization. SIAM J. Appl. Math. 30, 597–607

Robinson, S.M. (1982): Generalized equations and their solutions, part II: applications to nonlinear programming. Math. Prog. Study 19, 200–221

Shapiro, A., Scheinberg, K. (2000): Duality and optimality conditions. In: Wolkowicz, H., Saigal, R., Vandenberghe, L. (eds.) Handbook of Semidefinite Programming: Theory, Algorithms and Applications. Kluwer Academic Publishers, Boston, pp. 67–110

Sturm, J.F. (1999): Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653

Sturm, J.F., Zhang, S. (2001): On sensitivity of central solutions in semidefinite programming. Math. Program. 90, 205–227

Todd, M.J. (2001): Semidefinite optimization. Acta Numer. 10, 515–560

Tuan, H.D., Apkarian, P., Nakashima, Y. (2000): A new Lagrangian dual global optimization algorithm for solving bilinear matrix inequalities. Internat. J. Robust Nonlinear Control 10, 561–578

Vandenberghe, L., Boyd, S. (1996): Semidefinite programming. SIAM Rev. 38, 49–95

Vanderbei, R.J. (1997): LOQO user's manual—version 3.10. Report SOR 9708, Princeton University, Princeton, New Jersey

Vanderbei, R.J., Benson, H.Y., Shanno, D.F. (2002): Interiorpoint methods for nonconvex nonlinear programming: filter methods and merit functions. Comput. Optim. Appl. 23, 257–272

Wolkowicz, H., Saigal, R., Vandenberghe, L., eds. (2000): Handbook of Semidefinite Programming: Theory, Algorithms and Applications. Kluwer Academic Publishers, Boston