Grant RFBR 02-01-00093, NSh.2251.2003.1.
KANTOROVICH METRIC: INITIAL HISTORY AND LITTLE-KNOWN APPLICATIONS

### A.Vershik

Mathematical Institute of the Russian Academy of Sciences, St. Petersburg Branch, Fontanka 27, St. Petersburg 191023, Russia. E-mail address: vershik@pdmi.ras.ru
Abstract. We recall the history of the transportation (Kantorovich) metric and the Monge–Kantorovich problem. We also describe several little-known applications: the first concerns the theory of decreasing sequences of partitions (the tower of measures and the iterated metric), the second relates to Ornstein's theory of Bernoulli automorphisms (the $\overline{d}$-metric), and the third is a formulation of the strong Monge–Kantorovich problem in terms of matrix distributions.
Bibliography: 30 titles.

1 Introduction: the first papers on the transportation problem

The studies on the transportation problem could be called a true pearl in the extremely rich scientific legacy of L. V. Kantorovich. The beauty and naturalness of the formulation, the fundamental character of the main theorem (the optimality criterion), and, finally, the wealth of applications (some have already been realized, while new ones keep arising in areas that are only now emerging) – all this allows us to place these studies among the classic mathematical works of the 20th century.
Undoubtedly, the same words apply to the whole series of papers on linear programming (from which the transportation problem cannot be separated), which became the starting point for further studies in mathematical economics; here, however, we will dwell only on the remarkable role of what was later called the “Monge–Kantorovich problem” and the “transportation metric.”$^{1}$ In this introduction we do not intend to survey this huge subject; we mention only the very first papers of L.V. and his co-authors.
Apparently, L.V. conceived the formulation of the transportation problem soon after he defined the general model of the production planning problem, i.e., in the late 1930s (the booklet [8]). However, judging by the date of the first publication, the transportation problem was born in 1942, with the publication of the note [10], which later became famous. The year itself predetermined the long road this paper had to travel before it became known to specialists. The paper contains an explicit formulation of the general continuous transportation problem on a compact metric space, the dual problem, and the optimality criterion. Later, in the short note [11] published in Uspekhi Mat. Nauk, Kantorovich established a relation to Monge's problem of excavations and embankments, i.e., to the transportation problem on the Euclidean plane. Since then, the general Kantorovich problem has sometimes been called the Monge–Kantorovich problem (MK-problem for short). The next paper, [6], written jointly with M. K. Gavurin, a pupil of L.V., was addressed rather to applied mathematicians and economists; it developed the method of potentials (a version of the method of resolving multipliers suggested by L.V. in 1939) for solving the finite-dimensional transportation problem. Written long before publication, it appeared only in 1949. The delay was caused not by wartime conditions but by the Soviet practice of that time, when every scientific paper that even slightly touched on economic (not to mention socio-economic) problems had to go through long and absurd censorship; besides, the paper was published not in a journal but in a special, hard-to-reach volume.
Until 1956, i.e., during the 18 years of existence of the new mathematical economic theory, L.V. and his co-authors published fewer than ten papers on this subject (I remember G. Sh. Rubinshtein compiling, at my request, the complete list of these papers in autumn 1956). This was certainly not because no texts on these problems were written. L.V. had already prepared a whole book on economics, whose fate is an exact and gloomy illustration of the system's attitude to scientific studies that do not fit the obligatory schemes, which were rigid and hence fruitless. A revised version of the book was not published until almost twenty years later ([13]).
In 1955–56, L.V. decided to “open” this topic; he began to give public and special lectures, to popularize his theory. The moment was chosen quite well. However, the wide distribution and acknowledgment of these studies were still a long way off. One can read about all these events in the book [16] (in particular, in my paper [26]), but a detailed account of the whole story is still to be written.
Let us return to the transportation problem. The third important paper on this subject was the paper [15] by L.V. and his pupil and co-author G. Sh. Rubinshtein. It was this paper that contained an explicit definition of the norm in the space of measures related to the transportation metric. The main observation was that the space conjugate to the space of measures with this norm is the space of Lipschitz functions, and that the optimality criterion is nothing but the dual definition of the norm as a supremum over the unit sphere of the conjugate space. Before this paper it was not known whether the space of Lipschitz functions is conjugate to some Banach space. At that time (1956–57) I was interested in mathematical economics and maintained close contacts with L.V. and G. Sh. Rubinshtein, and G. Sh. described to me in detail the stages of their work; in particular, he said that L.V. was very pleased with this interpretation of the transportation problem. After this paper, the metric has often been called the Kantorovich–Rubinshtein metric.
Here it is worthwhile to make two remarks. Of course, the idea of duality was contained from the very beginning both in the booklet of 1939 (the method of resolving multipliers) and in the note by L.V. in Doklady Akad. Nauk SSSR [9] – the first paper devoted to comprehending the relations between functional analysis and nonclassical linear extremal problems (calculation of norms and extrema); it is worth noting that this was one more example showing the utility of functional analysis for applications; see the paper [14], devoted to applications of linear programming to computational mathematics, and the classical work by L.V. on the Newton method [12]. On the other hand, the technique that consists in taking the objective function as the norm in the space of right-hand sides of an extremal problem (exactly this was suggested in [15]) can be successfully applied to many extremal problems (see, for example, [19, 28]). It was noted more than once that both classics of mathematical economics of the 20th century – von Neumann and Kantorovich – came from functional analysis.
We cannot but mention that eventually, in the course of development of the theory of nonclassical extremal problems, other relations became obvious: to the theory of linear inequalities and separability theory, Chebyshev approximations and Krein's $L$-moment problem, Weyl's studies on convex polytopes and convex geometry as a whole, Bourbaki's theory of polars, combinatorics, etc.$^{2}$ Today we would include in this list “tropical” mathematics, or max-plus algebra, and the impetuous development of applications to differential equations, in particular to the Monge–Ampère equation, hydrodynamics, and so on (see the references). We will not discuss these well-covered applications here.

1 This metric is known under a dozen names (the most widely used is the Wasserstein metric), because it has been rediscovered more than once and keeps being rediscovered. For many years I had to explain that many metrics introduced in the 1950s–80s in measure theory, ergodic theory, functional analysis, statistics, etc., are special cases of the general definition of Kantorovich's transportation metric. Many papers and books have appeared since then (see, for example, [18]), but maybe it is only now (2004) that we can say that the publicity of the main facts discovered by L.V. and his co-authors matches their importance.

2 The lecture course “Extremal problems,” which I taught for many years at the Department of Mathematics and Mechanics of the Leningrad State University, was compiled taking into account all these relations; in fact, it was a synthesis of functional analysis and the theory of extremal problems. The textbook based on this course was not finished, but part of the material was included in the textbook [1] written by my pupil A. I. Barvinok.

2 Basic definitions

The transportation problem has always held a prominent position among problems of linear programming, owing to the generality of its formulation and to its methods of solution. In what follows, I would like to present several little-known applications of the transportation metric; but first let us recall the formulation of the transportation problem.
Definition 1. Let $(X,r)$ be a compact metric space, and let $\mu_1$ and $\mu_2$ be two Borel probability measures on $X$. Consider the Monge–Kantorovich variational problem (MK-problem for short):
set $k_r(\mu_1,\mu_2)=\inf_L\int_{X\times X} r(x_1,x_2)\,dL(x_1,x_2)$, where $L$ runs over all Borel measures on $X\times X$ with marginal measures $\mu_1$ and $\mu_2$.
The quantity ${k}_{r}\left({\mu }_{1},{\mu }_{2}\right)$  determines a metric on the simplex $V\left(X\right)$  of all probability measures on the compact space $X$  ; it is called the Kantorovich (or transportation) metric ([10]).
Remark. The measure $L$  is a “plan of transportation” of the distribution ${\mu }_{1}$  to the distribution ${\mu }_{2}$  ; the integral means the cost of a given transportation plan, and the infimum (the Kantorovich metric) is achieved at the optimal plan.
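For finitely supported measures the MK-problem is an ordinary finite transportation problem. The following Python sketch is only an illustration under a strong simplifying assumption of ours: both measures are uniform on the same number $n$ of atoms. Then every plan is $\frac1n$ times a bistochastic matrix, and by the Birkhoff–von Neumann theorem an optimal plan can be taken to be a permutation, so brute force over permutations suffices for very small $n$. The function name `kantorovich_uniform` is hypothetical.

```python
from itertools import permutations
from typing import Any, Callable, Sequence

def kantorovich_uniform(xs: Sequence[Any], ys: Sequence[Any],
                        r: Callable[[Any, Any], float]) -> float:
    """k_r between mu1 = (1/n) sum_i delta_{xs[i]} and mu2 = (1/n) sum_j delta_{ys[j]}.

    Assumption: both measures have n equally weighted atoms. A plan is then
    (1/n) times a bistochastic matrix, and an optimal one can be chosen among
    permutation matrices (Birkhoff-von Neumann). Cost: O(n! * n), tiny n only.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty samples of equal size")
    best = min(sum(r(xs[i], ys[s[i]]) for i in range(n))
               for s in permutations(range(n)))
    return best / n
```

For example, `kantorovich_uniform([0.0, 1.0], [1.0, 2.0], lambda a, b: abs(a - b))` returns `1.0`: both permutations cost $2$ here, so the optimal plan is not unique.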
Theorem 1. (Kantorovich–Rubinshtein [15]) (1) Consider the vector space $V_0(X)$ of all (not necessarily positive) Borel measures $\nu$ with zero total charge and finite variation (i.e., the positive part $\nu_+$ and the negative part $\nu_-$ of $\nu$ have the same finite total mass), and define the Kantorovich–Rubinshtein norm $\|\nu\|_k$ of an element $\nu\in V_0(X)$ as the Kantorovich distance between the positive and negative parts of $\nu$:
$\|\nu\|_k=k_r(\nu_+,\nu_-)$. Then the space of Lipschitz functions (modulo additive constants) with the Lipschitz norm is the conjugate normed space to the space $V_0(X)$ with the norm $\|\cdot\|_k$.
(2) A transportation plan $L$ for the pair $(\mu_1,\mu_2)$ (see Definition 1) is optimal if and only if there exists a Lipschitz function $U$ with Lipschitz constant $1$ such that $U(x)-U(y)=r(x,y)$ almost everywhere with respect to $L$.
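A one-line check of Theorem 1 on the simplest element of $V_0(X)$ (our illustration, not taken from [15]): for $\nu=\delta_x-\delta_y$ the only plan with marginals $\delta_x$ and $\delta_y$ is $\delta_{(x,y)}$, and the function $U=r(\cdot,y)$, which is $1$-Lipschitz by the triangle inequality, attains the supremum in the dual description of the norm and satisfies the optimality condition of part (2).

```latex
\|\delta_x-\delta_y\|_k=k_r(\delta_x,\delta_y)=r(x,y)
=\sup_{\|U\|_{\mathrm{Lip}}\le 1}\int U\,d(\delta_x-\delta_y)
=\sup_{\|U\|_{\mathrm{Lip}}\le 1}\bigl(U(x)-U(y)\bigr),
\qquad U(z)=r(z,y):\ U(x)-U(y)=r(x,y).
```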
We will omit the index $r$ in the notation $k_r$ if the metric $r$ is fixed, as well as the index $k$ in the notation $\|\cdot\|_k$.
Remark 1. The Kantorovich metric induces the weak topology on the simplex of probability measures on the compact space $X$  ([15]).
Remark 2. In the framework of solution of the finite-dimensional transportation problem, the optimal Lipschitz function $U$  is nothing more than the Kantorovich–Gavurin potential from [6].
Explicit calculation of the Kantorovich metric for a given compact space leads to a huge number of difficult problems. For $\mathbb{R}^{2}$, this is the classical Monge problem of transporting sand.
For ${\mathbb{R}}^{1}$  , there is a good answer: let ${\nu }_{1}$  and ${\nu }_{2}$  be two probability measures on $\left[0,1\right]$  , and let $r$  be the ordinary (Euclidean) metric; then ${k}_{r}\left({\nu }_{1},{\nu }_{2}\right)={\int }_{0}^{1}|{\nu }_{1}\left(\left[0,t\right]\right)-{\nu }_{2}\left(\left[0,t\right]\right)|dt$  , i.e., the Kantorovich metric is just the ${L}^{1}$  -metric for distribution functions. Apparently, there are no explicit formulas for ${\mathbb{R}}^{n}$  , $n\ge 2$  . Many papers are devoted to this problem; we will mention only the recent surveys [29, 30, 5].
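The one-dimensional formula is easy to evaluate exactly for finitely supported measures, because both distribution functions are step functions. A minimal Python sketch, assuming the atoms lie in $[0,1]$ and the weights are nonnegative and sum to $1$ (the name `cdf_l1_distance` is hypothetical):

```python
from bisect import bisect_right
from typing import Callable, Sequence

def cdf_l1_distance(a: Sequence[float], wa: Sequence[float],
                    b: Sequence[float], wb: Sequence[float]) -> float:
    """k_r(nu1, nu2) = int_0^1 |nu1([0,t]) - nu2([0,t])| dt for r(x, y) = |x - y|.

    nu1 = sum_i wa[i] * delta_{a[i]} and nu2 = sum_j wb[j] * delta_{b[j]};
    atoms are assumed to lie in [0, 1], weights to be >= 0 and to sum to 1.
    """
    def distribution_function(points: Sequence[float],
                              weights: Sequence[float]) -> Callable[[float], float]:
        pairs = sorted(zip(points, weights))
        xs = [p for p, _ in pairs]
        cum, acc = [], 0.0
        for _, w in pairs:
            acc += w
            cum.append(acc)

        def F(t: float) -> float:
            k = bisect_right(xs, t)  # number of atoms <= t
            return cum[k - 1] if k else 0.0
        return F

    F1 = distribution_function(a, wa)
    F2 = distribution_function(b, wb)
    # Both step functions are constant on each [s, t) between consecutive grid
    # points, so the integral is an exact finite sum (evaluate at the left end s).
    grid = sorted(set(a) | set(b) | {0.0, 1.0})
    return sum(abs(F1(s) - F2(s)) * (t - s) for s, t in zip(grid, grid[1:]))
```

For the uniform measures on $\{0.2, 0.6\}$ and $\{0.4, 0.8\}$ it gives $0.2$ up to rounding, which agrees with matching the points in increasing order.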
However, it makes sense to mention an essential idea which has appeared recently and which plays a very important role in modern applications to hydrodynamics, differential equations, and other areas (see [4, 5, 30] and the references therein); I mean the $p$-Kantorovich norms (see [2]). Namely, the original definition of the Kantorovich metric (and Kantorovich norm) resembles the definition of the $L^1$-norm; but we can also define an analog of the $L^p$-norm, $k_p(\nu_1,\nu_2)=\inf_L\left[\int r(x_1,x_2)^p\,dL\right]^{1/p}$, where the infimum is taken, as before, over all transportation plans $L$ for the pair of probability measures $(\nu_1,\nu_2)$, and the corresponding norm $\|\nu\|_p=k_p(\nu_+,\nu_-)$, for all $p\ge 1$. Of course, the original Kantorovich metric (the case $p=1$) has more physical significance, but the case $p=2$ is much more convenient from the technical and geometric points of view. The corresponding variational problem and Euler equation are simpler than in the case $p=1$, and the results of [2] show that for a certain geometric transportation problem, the Euler equation is the well-known Monge–Ampère equation (which a priori has nothing to do with the Monge–Kantorovich problem).
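On the real line the $p$-Kantorovich metric is explicit in a special case: for $p\ge1$ the cost $|x-y|^p$ is convex, and for two uniform measures with the same number of atoms the monotone matching (sort both samples and pair them in order) is an optimal plan. The sketch below relies on that fact; the function name is hypothetical.

```python
from typing import Sequence

def kantorovich_p_line(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """k_p between (1/n) sum_i delta_{xs[i]} and (1/n) sum_i delta_{ys[i]} on the real line.

    For p >= 1 the cost |x - y|**p is convex, so pairing the sorted samples
    (the monotone plan) is optimal and no optimization is needed.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if len(xs) == 0 or len(xs) != len(ys):
        raise ValueError("need two non-empty samples of equal size")
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)
```

For $\delta_0$ versus $\frac34\delta_0+\frac14\delta_1$, encoded as the samples `[0.0] * 4` and `[0.0, 0.0, 0.0, 1.0]`, one gets $k_1=0.25$ but $k_2=0.5$, an instance of the inequality $k_1\le k_p$, which follows from Jensen's inequality applied to each plan.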
Let us mention another important special case, which is sometimes also called the MK-problem; we will call it the strong MK-problem.
Namely, with the above notation, it is formulated as follows: to find $\overline{k}(\mu_1,\mu_2)\equiv\inf_T\int r(x,Tx)\,d\mu_1(x)$, where the infimum is taken over all measurable mappings $T$ such that $T\mu_1=\mu_2$.
The existence of a minimum in the strong MK-problem is a very subtle question. Of course, $\overline{k}(\mu_1,\mu_2)\ge k(\mu_1,\mu_2)$, since every admissible mapping $T$ defines the plan $L_T=(\mathrm{id}\times T)\mu_1$ (the image of $\mu_1$ under $x\mapsto(x,Tx)$) with the same cost; the question of when this inequality becomes an equality is difficult and very important. In the last section, we will present a new approach to both problems.
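A standard example (added here for illustration) shows that admissible mappings $T$ need not exist at all, so that $\overline{k}$ can be infinite while $k$ is finite: a mapping cannot split an atom, whereas a plan can. If one marginal is a Dirac measure, the only plan is the product measure.

```latex
\mu_1=\delta_a,\quad \mu_2=\tfrac12(\delta_b+\delta_c),\ b\neq c:\qquad
T\mu_1=\delta_{T(a)}\neq\mu_2\ \text{for every } T
\;\Longrightarrow\; \overline{k}(\mu_1,\mu_2)=\inf\varnothing=+\infty,
\qquad
k(\mu_1,\mu_2)=\tfrac12\bigl(r(a,b)+r(a,c)\bigr).
```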
Among a huge number of applications of the Kantorovich metric, I would like to mention only three examples, which are little known to specialists in applications of this metric, yet are very important in dynamical systems and functional analysis.

3 The iterated Kantorovich metric and the tower of measures

We begin with the notion of a tower of measures, which was defined in [20] and considered in more detail in [23], [39]. Let $(X,r)$ be an arbitrary compact metric space (say, the unit interval with the Euclidean metric). We can consider a new compact space $V(X)$, the space of all Borel probability measures on $X$, and supply it with the Kantorovich metric. Thus we have defined a functor $F$ from the category of compact metric spaces to itself: $F: X\mapsto V(X)$, $r\mapsto k_r$; it is clear that $F$ sends each homeomorphism of a compact space $X_1$ onto a compact space $X_2$ to a homeomorphism of $V(X_1)$ onto $V(X_2)$.
Obviously, $\left(X,r\right)$  can be isometrically embedded into $\left(V\left(X\right),{k}_{r}\right)$  via the mapping $x↦{\delta }_{x}$  .
Let us iterate this procedure:
$(X,r)\longrightarrow(V(X),k_r)\longrightarrow(V(V(X)),k_{k_r})\longrightarrow\cdots$. Set $V^n=V(V^{n-1}(X))$ and $k_r^n=k_{k_r^{n-1}}$, and denote by $F_n$ the isometric embedding $(V^{n-1},k_r^{n-1})\longrightarrow(V^n,k_r^n)$, $\mu\mapsto\delta_\mu$.
We can consider the inductive limit of this sequence of metric spaces with isometric embeddings:
$\left({V}^{\infty },{k}_{r}^{\infty }\right)\equiv {\text{indlim}}_{n}\left(\left({V}^{n},{k}_{r}^{n}\right),{F}_{n}\right).$  This inductive limit (a metric space) is called the infinite tower of measures; it plays a crucial role in the theory of filtrations of $\sigma$  -fields generated by random processes and its various applications.
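For finitely supported measures on $X=[0,1]$ the first two floors of the tower can be computed directly. The sketch below makes simplifying assumptions of our own: elements of $V(X)$ are uniform measures on $n$ points (stored as lists of atoms), elements of $V^2(X)$ are uniform measures on $m$ such lists, the inner metric $k_r$ is computed by monotone matching on the line, and the outer metric $k_r^2=k_{k_r}$ by brute force over permutations. The names `k1` and `k2` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence

Measure = List[float]  # uniform measure (1/n) sum_i delta_{x_i} on [0, 1], stored as its atoms

def k1(mu: Sequence[float], nu: Sequence[float]) -> float:
    """k_r on V([0, 1]) for r(x, y) = |x - y| and uniform measures with equally many atoms.

    On the line the monotone (sorted) matching is an optimal plan for this cost.
    """
    return sum(abs(x - y) for x, y in zip(sorted(mu), sorted(nu))) / len(mu)

def k2(M: Sequence[Measure], N: Sequence[Measure]) -> float:
    """k_r^2 = k_{k_r} on V(V([0, 1])): the Kantorovich metric with ground metric k1.

    M and N are uniform measures on equally many elements of V([0, 1]); an optimal
    plan can then be taken to be a permutation (brute force, tiny sizes only).
    """
    m = len(M)
    return min(sum(k1(M[i], N[s[i]]) for i in range(m))
               for s in permutations(range(m))) / m
```

The identity `k2([mu], [nu]) == k1(mu, nu)` is the isometric embedding $\mu\mapsto\delta_\mu$ in this toy model: $\delta_\mu$ and $\delta_\nu$ admit only one plan.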
On the other hand, for $n\ge 2$  there is a natural projection ${P}_{n}:{V}^{n}⟶{V}^{n-1},{P}_{n}\left(\mu \right)=\overline{\mu },$  where $\overline{\mu }$  is the barycenter of the measure $\mu$  , which is well defined for measures on affine compact spaces (thus the projection is defined for ${V}^{n}$  , $n\ge 2$  ), and we have the sequence $\left({V}^{1}\left(X\right),{k}_{r}\right)⟵\left({V}^{2}\left(X\right),{k}_{r}^{2}\right)⟵....$  Thus we obtain the projective limit ${\overline{V}}^{\infty }\equiv {\text{projlim}}_{n}\left({V}^{n}\left(X\right),{P}_{n}\right).$  Since ${P}_{n}{F}_{n}={I}_{n-1}$  , the inductive limit ${V}^{\infty }$  is naturally embedded into the projective limit:
$V^\infty\subset\overline{V}^\infty$; but, in contrast to the case of the inductive limit, there is no natural metric on the projective limit.$^{3}$
The main application of this tower of measures is as follows. Assume that we have a “metric triple” $(X,r,\mu)$, i.e., a measure space with a metric or semimetric, and a decreasing sequence of measurable partitions of this space (a discrete filtration) $\{\xi_n\}$, $n=0,1,\dots$; here $\xi_0$ is trivial and $\xi_n>\xi_{n+1}$.
First consider one partition $\xi$  ; for almost all points $a\in X/\xi$  of the quotient space with respect to this partition, there is a well-defined conditional measure on the element of $\xi$  corresponding to $a$  . We regard it as a measure on $\left(X,r\right)$  ; thus we have a mapping ${f}_{\xi }:X/\xi \to V\left(X,r\right)$  , which sends almost every point $a\in X/\xi$  to a (conditional) measure on $\left(X,r\right)$  . It is convenient to regard this mapping as a function from $\left(X,\mu \right)$  to $V\left(X\right)$  .
Now define a metric (or semimetric) on $X/\xi$  as follows: for almost all pairs of points $a,b\in X/\xi$  , define the distance between them as the Kantorovich distance between the corresponding conditional measures.
Thus we have defined a metric (or semimetric) on a subset of full measure in the quotient space $X/\xi$  ; it can also be regarded as a semimetric on the original space $\left(X,\mu \right)$  .
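A toy finite instance of this construction (entirely hypothetical, chosen only to make the steps concrete): $X=\mathbb{Z}/8$ with the cyclic distance and the uniform measure, and a partition $\xi$ into four two-point blocks. Since $\mu$ is uniform, each conditional measure is uniform on its block, and the distance on $X/\xi$ is the Kantorovich distance between these conditional measures; lifted to $X$ it is only a semimetric, vanishing inside each block.

```python
from itertools import permutations
from typing import Dict, List

N = 8  # hypothetical example: X = Z/8 with the cyclic distance and uniform measure

def r(x: int, y: int) -> int:
    """Cyclic distance on Z/N."""
    d = abs(x - y) % N
    return min(d, N - d)

def kantorovich_uniform(A: List[int], B: List[int]) -> float:
    """k_r between the uniform measures on A and on B (|A| == |B|), via permutations."""
    n = len(A)
    return min(sum(r(A[i], B[s[i]]) for i in range(n))
               for s in permutations(range(n))) / n

# A partition xi of X into blocks of equal size; since mu is uniform, the conditional
# measure on each block is the uniform measure on that block.
xi: Dict[str, List[int]] = {"a": [0, 4], "b": [1, 5], "c": [2, 6], "d": [3, 7]}
block_of = {x: name for name, block in xi.items() for x in block}

def quotient_distance(a: str, b: str) -> float:
    """Distance on X/xi: the Kantorovich distance between conditional measures."""
    return kantorovich_uniform(xi[a], xi[b])

def lifted_semimetric(x: int, y: int) -> float:
    """The same distance viewed as a semimetric on X (zero inside each block)."""
    return quotient_distance(block_of[x], block_of[y])
```

Here `quotient_distance("a", "b")` is `1.0` and `quotient_distance("a", "c")` is `2.0`, while `lifted_semimetric(0, 4)` is `0.0` although $r(0,4)=4$.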
Apply this process to the decreasing sequence of partitions $\left\{{\xi }_{n}\right\}$  :
start from $\xi_1$, then define a metric on $X/\xi_1$, a mapping $f_1: X\to V(X,r)$, and a partition $\xi_2/\xi_1$; now we have a mapping from $X/\xi_2$ to $V^2(X)$, a new metric on $X/\xi_2$, and a map $f_2: X\to V^2(X,r)$.
Continuing this process, we obtain mappings ${f}_{n}$  from $\left(X,\mu \right)$  to the iterated spaces ${V}^{n}\left(X,r\right)$  , or to the inductive limit $\left({V}^{\infty },{k}_{r}^{\infty }\right)$  .
One of the main results of the theory of decreasing sequences ([20], [23]) is the following theorem.
Theorem 2. A decreasing homogeneous sequence of measurable partitions is standard (see [20, 23] for definitions) if and only if the sequence of measures $f_{n*}\mu$ (in other words, the sequence of the distributions of the mappings $f_n$ with respect to the measure $\mu$), regarded as a sequence of measures on the inductive limit $(V^\infty,k_r^\infty)$, tends to a $\delta$-measure.
A discussion of these subjects can be found in [23] and in forthcoming papers.

3 Inductive systems whose projections are left inverses of the embeddings (here $P_nF_n=I_{n-1}$) can be called indo-projective systems; they appear quite often.

4 The Kantorovich metric in Ornstein's theory

In the early 70s, Donald Ornstein solved a long-standing problem in ergodic theory: he gave necessary and sufficient conditions on a discrete-time stationary random process under which the shift in the space of trajectories of this process is isomorphic to a Bernoulli shift; using this result, he proved that the Kolmogorov entropy is a complete invariant of Bernoulli shifts ([17]). We will formulate the main theorem of Ornstein's theory in order to illustrate the role of the Kantorovich metric, which was rediscovered by Ornstein (he called it the $\overline{d}$  -metric).
Assume that the state space $S$ of a stationary process is finite and $\mu$ is the stationary measure on $S^{\mathbb{Z}}$ generated by this process. The question is as follows: when does there exist an isomorphism (in the measure-theoretic sense) between the Bernoulli space $(S')^{\mathbb{Z}}$ with a product measure and the space $(S^{\mathbb{Z}},\mu)$ that commutes with the shift? This is the well-known isomorphism problem in ergodic theory. Clearly, a criterion for the existence of such an isomorphism must be expressed in terms of the rate at which the correlation between the past and the future of the process decreases. Many conditions of this type are known; they are sometimes called “mixing conditions.” Most such conditions known in the theory of stationary processes (Kolmogorov's, Rosenblatt's, Ibragimov's conditions, etc.) are too strong. It turned out that the right notion is related to the Kantorovich metric on the space of words with the Hamming metric; this was discovered by D. Ornstein. Our interpretation differs slightly from the original one, but is closer to the previous context (see [29]).
Let $\left\{{\xi }_{n}\right\}$  , $n\in \mathbb{Z}$  , be a stationary random process with finite state space $S$  and shift-invariant measure $\mu$  on ${S}^{\mathbb{Z}}$  . Consider the “past” of the process: $\mathcal{P}={\prod }_{-\infty }^{0}S$  ; the projection of $\mu$  to $\mathcal{P}$  will be denoted by ${\mu }^{-}$  . Fix a point ${x}^{-}=\left({x}_{0},{x}_{-1},{x}_{-2},...\right)\in \mathcal{P}$  and consider the conditional distribution on the $n$  -future given a fixed past ${x}^{-}$  :
${P}_{n}\left({x}_{1},{x}_{2},...,{x}_{n}|{x}^{-}\right);$  this is a measure on the $n$  -future ${S}^{n}$  defined for almost all points ${x}^{-}\in \mathcal{P}$  ; it is an element of $V\left({S}^{n}\right)$  , thus we have a mapping ${F}_{n}:\mathcal{P}\to V\left({S}^{n}\right)$  defined almost everywhere.
Consider the Hamming metric on $S^n$: $h_n(x,y)=\frac{1}{n}\#\{i\in\{1,\dots,n\}: x_i\neq y_i\}$, where $x=(x_1,\dots,x_n)$, $y=(y_1,\dots,y_n)\in S^n$ and $\#$ stands for the number of elements of a set; and let $k_{h_n}$ be the Kantorovich metric on the space $V(S^n,h_n)$ of measures on the $n$-future.
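For empirical distributions of words, $k_{h_n}$ can be computed by brute force in very small cases. The sketch assumes two uniform measures on equally many words of the same length $n$, so that an optimal plan can be taken to be a permutation; the names `hamming` and `dbar` are hypothetical. It also shows why the Hamming metric matters: two words that differ in one letter out of $n$ are at distance $1/n$, whereas the variation distance between the corresponding Dirac measures is maximal.

```python
from itertools import permutations
from typing import Sequence, Tuple

Word = Tuple[str, ...]

def hamming(x: Word, y: Word) -> float:
    """Normalized Hamming metric h_n(x, y) = (1/n) #{i : x_i != y_i} on S^n."""
    if len(x) != len(y):
        raise ValueError("words must have the same length n")
    return sum(a != b for a, b in zip(x, y)) / len(x)

def dbar(P: Sequence[Word], Q: Sequence[Word]) -> float:
    """k_{h_n} between the uniform measures on two equal-size lists of words.

    Equal weights and equal sizes let us restrict plans to permutations
    (Birkhoff-von Neumann); brute force, so keep the lists tiny.
    """
    m = len(P)
    return min(sum(hamming(P[i], Q[s[i]]) for i in range(m))
               for s in permutations(range(m))) / m
```

For instance, `dbar([("0", "0", "0", "0")], [("1", "0", "0", "0")])` equals `0.25`, i.e. $1/n$ with $n=4$.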
Theorem 3. [17, 24] Consider a stationary process $\{\xi_n\}$, $n\in\mathbb{Z}$, and the right shift in the space of realizations generated by this process. An invertible encoding of this shift into a Bernoulli shift (in other words, a measure-preserving isomorphism of the shift in the space of realizations of the process and a Bernoulli shift) exists if and only if $\lim_{n\to\infty}\iint_{x^-\in\mathcal{P},\,y^-\in\mathcal{P}}k_{h_n}\bigl(P_n(\cdot\,|\,x^-),P_n(\cdot\,|\,y^-)\bigr)\,d\mu^-(x^-)\,d\mu^-(y^-)=0$ (the integral of the Kantorovich distance between the pair of conditional measures corresponding to a pair of points from $\mathcal{P}\times\mathcal{P}$ with respect to the product measure $\mu^-\times\mu^-$).
The literal meaning of the above condition is very transparent: it means that the conditional distribution on the future given a fixed past asymptotically does not depend on the past; roughly speaking, there is only one type of distribution on the future; but a more precise sense of these words essentially depends on the choice of a metric on the space of realizations of the process (we should take the Hamming metric) and a metric on the spaces of measures (here we should use the Kantorovich metric); in general, the conclusion of the theorem will be false if we replace the Kantorovich metric by some other one (for example, by the variation metric).
The last formulation also motivates the definition of the so-called secondary entropy of a stationary process (see [24]). Define $M_n^+$ as the image of the measure $\mu^-$ (see above) under the mapping $F_n:\mathcal{P}\to V(S^n,h_n)$; this is a measure on $V(S^n,h_n)$. In the case of Bernoulli automorphisms, by Ornstein's theorem, the measure $M_n^+$ tends to a $\delta$-measure as $n\to\infty$. But for a general Kolmogorov stationary process (K-automorphism), this is not the case. More precisely, if the automorphism is not a Bernoulli automorphism, then the limit exists, but is not a $\delta$-measure. Thus it is natural to introduce a characteristic of the limiting measure. Namely, we may consider the so-called $\varepsilon$-entropy of the measure $M_n^+$. This notion also uses the Kantorovich metric. For an arbitrary Borel probability measure $\nu$ on a metric space $(X,d)$, the $\varepsilon$-entropy $h_\varepsilon(\nu)$ (as a function of $\varepsilon$) is defined as follows: $h_\varepsilon(\nu)=\inf\{H(l): k_d(l,\nu)<\varepsilon\}$, where the infimum is taken over all discrete measures $l$ on $(X,d)$ and $H(l)$ is the ordinary entropy of a discrete measure: $H(l)=-\sum_i l_i\log l_i$, $l=(l_1,\dots,l_n)$, $\sum_i l_i=1$, $l_i\ge 0$, $i=1,\dots,n$.
The asymptotic behavior of $h_\varepsilon(M_n^+)$ as $n\to\infty$ is called the secondary entropy of the process. An open problem: what kinds of asymptotic behavior can occur? Presumably, the secondary entropy is a metric invariant of K-automorphisms.
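To make the definition of $h_\varepsilon$ concrete, here is an elementary upper bound for a simple measure (our illustration; it concerns Lebesgue measure $\lambda$ on $[0,1]$ with the usual metric, not the measures $M_n^+$). Take the uniform measure $\ell_m$ on the midpoints of the $m$ intervals of length $1/m$. By the one-dimensional formula for the Kantorovich metric, each interval contributes two triangles with legs $1/(2m)$:

```latex
\ell_m=\frac1m\sum_{j=1}^{m}\delta_{(2j-1)/(2m)},\qquad
k(\lambda,\ell_m)=\int_0^1\bigl|t-\ell_m([0,t])\bigr|\,dt
=m\cdot 2\cdot\frac12\Bigl(\frac{1}{2m}\Bigr)^{2}=\frac{1}{4m},\qquad
H(\ell_m)=\log m,
\\
\text{so } k(\lambda,\ell_m)<\varepsilon \text{ for } m=\Bigl\lfloor\frac{1}{4\varepsilon}\Bigr\rfloor+1
\quad\text{and hence}\quad
h_\varepsilon(\lambda)\le\log\Bigl(\Bigl\lfloor\frac{1}{4\varepsilon}\Bigr\rfloor+1\Bigr).
```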

5 Application to the classification of metric spaces

Consider a Polish (complete separable metric) space with a Borel probability measure. We call such a space a metric triple (another term is an $mm$-space [7]). Two triples $(X,\rho,\mu)$ and $(X',\rho',\mu')$ are isomorphic if there exists a mapping $T: X\to X'$ that is an isometry and preserves the measures: $\rho'(Tx,Ty)=\rho(x,y)$ and $T\mu=\mu'$.
We regard the metric as a measurable function of two variables:
$\rho: X\times X\longrightarrow\mathbb{R}$. (The theorem below is true for an arbitrary symmetric measurable function $\rho$, not necessarily a metric.) Let $X^\infty$ be the product of infinitely many copies of the space $X$.
Define a mapping $F: X^\infty\longrightarrow M_\infty(\mathbb{R})$ from $X^\infty$ to the set of symmetric matrices as follows: $F(x)=\{r_{i,j}\}_{i,j=1}^\infty$, where $x=(x_1,x_2,\dots)$ and $r_{i,j}=\rho(x_i,x_j)$.
Let us denote the image of the measure $\mu^\infty$ under the mapping $F$ by $F(\mu^\infty)\equiv D_\rho$; the measure $D_\rho$ on $M_\infty(\mathbb{R})$ will be called the matrix distribution of the function $\rho$.
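A finite corner of a random matrix with distribution $D_\rho$ is easy to sample: draw $x_1,\dots,x_n$ independently from $\mu$ (the first $n$ coordinates of a $\mu^\infty$-random point) and tabulate $\rho(x_i,x_j)$. The sketch below is a hypothetical illustration with $X=[0,1]$, Lebesgue measure, and $\rho(x,y)=|x-y|$; the function name `sample_fragment` is ours.

```python
import random
from typing import Callable, List

def sample_fragment(sample_point: Callable[[random.Random], float],
                    rho: Callable[[float, float], float],
                    n: int, rng: random.Random) -> List[List[float]]:
    """One draw of the n x n corner {rho(x_i, x_j)} of a D_rho-distributed matrix.

    The points x_1, ..., x_n are i.i.d. with law mu, i.e. the first n
    coordinates of a mu^infinity-random point of X^infinity.
    """
    xs = [sample_point(rng) for _ in range(n)]
    return [[rho(xi, xj) for xj in xs] for xi in xs]

# Hypothetical example: X = [0, 1], mu = Lebesgue measure, rho(x, y) = |x - y|.
R = sample_fragment(lambda g: g.random(), lambda x, y: abs(x - y), 4, random.Random(0))
```

Every such fragment is symmetric with zero diagonal and satisfies the triangle inequality; Theorem 4 asserts that the law of the whole infinite matrix already determines the metric triple up to isomorphism.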
In [25], we considered and classified general (nonsymmetric) measurable functions $f\left(x,y\right)$  of two variables on the space $\left(X×X,\mu ×\mu \right)$  up to mappings of the form ${T}_{1}×{T}_{2}$  , where ${T}_{1}$  and ${T}_{2}$  are measure-preserving automorphisms of $\left(X,\mu \right)$  . We also defined the notion of matrix distribution for this case; it is a complete invariant for so-called pure functions.
But now we need another classification. We again consider arbitrary measurable (not necessarily symmetric) functions $f$ on the space $(X\times X,\mu\times\mu)$, where $(X,\mu)$ is a Lebesgue space with continuous measure, but we classify them up to mappings of the form $T\times T$, where $T$ is an automorphism of $(X,\mu)$ (in other words, $T_1=T_2$). Namely, define a mapping $F_f: X^\infty\longrightarrow M_\infty(\mathbb{R})$, where $F_f(x)=\{f(x_i,x_j)\}_{i,j=1}^\infty$ and $x=(x_1,x_2,\dots)\in X^\infty$; here $M_\infty(\mathbb{R})$ is the set of arbitrary (not necessarily symmetric) matrices.
The $F_f$-image of the measure $\mu^\infty$, which is a measure on $M_\infty(\mathbb{R})$, is called the symmetric matrix distribution of the function $f$ and denoted by $D_f^s$.
Theorem 4. (Gromov [7], Vershik [25]) (1) Two metric triples $\left(X,\rho ,\mu \right)$  and $\left({X}^{\prime },{\rho }^{\prime },{\mu }^{\prime }\right)$  are isomorphic if and only if their matrix distributions coincide:
${D}_{\rho }^{s}={D}_{{\rho }^{\prime }}^{s}.$  In other words, the matrix distribution of the metric is a complete invariant of a metric triple.
(2) (Vershik [25]). The symmetric matrix distribution ${D}_{f}^{s}$  of a measurable function $f\left(\cdot ,\cdot \right)$  of two variables is a complete metric invariant of the function regarded up to automorphisms of the form $T×T$  , where $T$  is an automorphism of $\left(X,\mu \right)$  .
Now we apply this classification to MK-problems. Let $X$ be a compact metric space with metric $\rho$; we want to “transport” a Borel probability measure $\mu_1$ to another Borel probability measure $\mu_2$. Thus we have two metric triples, $(X,\rho,\mu_1)$ and $(X,\rho,\mu_2)$. It is more convenient to reduce the problem to a more symmetric form with a single metric triple. Let us consider only continuous measures; then we can choose a measure-preserving isomorphism $S:(X,\mu_1)\to(X,\mu_2)$, so that $Sy$ is distributed according to $\mu_2$ when $y$ is distributed according to $\mu_1$. Let $f(x,y)=\rho(x,Sy)$, so that $f$ is a nonnegative measurable (in general, nonsymmetric) function of two variables, the “shifted metric.” Now we can consider only one measure $\mu_1\equiv\mu$ and the function $f$ on the space $(X\times X,\mu\times\mu)$.
In terms of the shifted metric, the MK-problem can be formulated as follows: to find $k\equiv\inf_L\int f(x_1,x_2)\,dL$, where $L$ runs over all Borel measures on the product $X\times X$ with both marginal measures equal to $\mu$; thus $L$ belongs to the set of bistochastic measures, or, in other words, $L$ is an element of the semigroup of polymorphisms with invariant continuous measure $\mu$ (see [22] for definitions). Thus the MK-problem turns into a variational problem on the convex set of bistochastic measures (or on the semigroup of polymorphisms).
The strong MK-problem reads as follows: to find $\overline{k}\equiv\inf_T\int f(x,Tx)\,d\mu(x)$, where $T$ runs over all $\mu$-preserving transformations of $(X,\mu)$. In this case, we have a variational problem on the group of measure-preserving transformations.
Now we can apply the above-defined symmetric matrix distribution ${D}_{f}^{s}$  of the function $f$  regarded as a measurable function (shifted metric) on the space $\left(X×X,\mu ×\mu \right)$  . Since ${D}_{f}^{s}$  is a complete invariant of the triple $\left(X,f,\mu \right)$  , all properties of the (ordinary and strong) MK-problem can be expressed in terms of ${D}_{f}^{s}$  as a measure on the space of matrices ${M}_{\infty }\left(\mathbb{R}\right)$  . But this means that we have a random matrix with distribution ${D}_{f}^{s}$  , which we can use for analysis of the problem. Here we describe only one example of applying this approach.
Let $r=\{r_{i,j}\}_{i,j=1}^\infty$ be a random matrix with distribution $D_f^s$. The new version of the MK-problem reads as follows. Choose a random matrix $r$, for each $n$ consider the ordinary finite transportation problem, and define $k_n(r)\equiv\frac{1}{n}\inf_l\sum_{i,j=1}^n l_{i,j}r_{i,j}$, where $l=\{l_{i,j}\}_{i,j=1}^n$ is a bistochastic matrix (i.e., $\sum_{i=1}^n l_{i,j}=\sum_{j=1}^n l_{i,j}=1$, $l_{i,j}\ge 0$ for all $i,j=1,\dots,n$) and $r_n=\{r_{i,j}\}_{i,j=1}^n$ is the $n$-fragment of $r$ (the random matrix constructed from the shifted metric as described above); the factor $\frac{1}{n}$ makes $\frac{1}{n}l$ a probability measure on $\{1,\dots,n\}^2$, comparable with the plans $L$. Thus $k_n(r)$ is a random variable that depends on the random matrix $r$.
Theorem 5. In the notation above, $\lim_{n\to\infty}k_n(r)=k$ in measure $D_f^s$, where $k$ is the optimal value of the original MK-problem; i.e., the sequence of random variables $k_n(r)$ converges in measure $D_f^s$ to the solution of the MK-problem.
A natural conjecture is that the same assertion holds for almost every choice of the matrix $r=\{r_{i,j}\}$ with respect to the measure $D_f^s$:
$D_f^s\{r:\lim_{n\to\infty}k_n(r)=k\}=1,$ which means that $k_n(r)$ converges to $k$ with probability one with respect to the choice of the matrix $r$ according to the measure $D_f^s$.
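Theorem 5 and the conjecture invite a simple numerical experiment. The Python sketch below is only an illustration under our own assumptions: we read the construction of $D_f^s$ as $r_{i,j}=f(x_i,x_j)$ with $x_1,x_2,\dots$ independent and $\mu$-distributed, and the helper names `min_cost_assignment` and `k_n`, as well as the test cost $f(x,y)=|x-y^2|$ on $X=[0,1]$ with Lebesgue measure $\mu$, are hypothetical. The finite problem is solved by the standard Hungarian algorithm, whose row and column potentials `u`, `v` play the role of dual variables. For this particular cost the optimal coupling on the line is monotone, so $k=\int_0^1(x-x^2)\,dx=1/6$, and the identity is an optimal permutation for every fragment; hence $k_n(r)$ is just the sample mean of $x_i-x_i^2$, which makes the output easy to check. Reusing one sequence for increasing $n$ gives nested fragments $r_n$ of a single matrix $r$, which is the setting of the conjecture.

```python
import random


def min_cost_assignment(cost):
    """Minimum of sum_i cost[i][sigma(i)] over permutations sigma (Hungarian algorithm).

    By the Birkhoff-von Neumann theorem this also equals the minimum of
    sum_{i,j} l[i][j] * cost[i][j] over bistochastic matrices l.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (dual variables)
    v = [0.0] * (n + 1)      # column potentials (dual variables)
    p = [0] * (n + 1)        # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (n + 1)      # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    sigma = [0] * n
    for j in range(1, n + 1):
        sigma[p[j] - 1] = j - 1
    return sum(cost[i][sigma[i]] for i in range(n)), sigma


def k_n(xs, f):
    """(1/n) * min over bistochastic l of sum_{i,j} l_ij r_ij, where r_ij = f(x_i, x_j)."""
    n = len(xs)
    r = [[f(xi, xj) for xj in xs] for xi in xs]    # the n-fragment r_n
    total, _ = min_cost_assignment(r)
    return total / n


def f(x, y):
    # Hypothetical example cost on X = [0, 1]; for uniform mu its MK value is k = 1/6.
    return abs(x - y * y)


rng = random.Random(0)
xs = [rng.random() for _ in range(120)]            # x_1, x_2, ... i.i.d. uniform on [0, 1]
for n in (10, 40, 120):                           # nested fragments of one matrix r
    print(n, k_n(xs[:n], f))
```

The printed values should approach $1/6\approx 0.1667$ as $n$ grows; for a cost without such a closed form, the same code gives a Monte Carlo estimate of $k$.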
Note that we approximate the MK-problem by the simplest finite-dimensional problem of linear programming, the allocation (assignment) problem. By the Birkhoff–von Neumann theorem, the minimum in this problem is attained at a permutation matrix, i.e., at an extreme point of the convex set of bistochastic matrices (the Birkhoff polytope), so the solution can be regarded as an element of the symmetric group.
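The Birkhoff–von Neumann theorem can be made concrete. A bistochastic matrix $l$ decomposes as $l=\sum_k\theta_k P_{\sigma_k}$ with permutation matrices $P_{\sigma_k}$ and weights $\theta_k>0$, $\sum_k\theta_k=1$, so $\sum_{i,j}l_{i,j}r_{i,j}=\sum_k\theta_k\sum_i r_{i,\sigma_k(i)}$ can never be smaller than the cost of the best permutation. The Python sketch below implements the usual greedy decomposition (find a permutation inside the support via augmenting paths, subtract its smallest entry, repeat) in exact rational arithmetic; the function names and the $3\times 3$ example are hypothetical.

```python
from fractions import Fraction


def perfect_matching(support):
    """Kuhn's augmenting-path algorithm; support[i] lists the columns j with l[i][j] > 0.

    Returns sigma with sigma[i] in support[i] for every i, or None if none exists.
    """
    n = len(support)
    row_of = [-1] * n                    # row_of[j] = row currently matched to column j

    def augment(i, seen):
        for j in support[i]:
            if not seen[j]:
                seen[j] = True
                if row_of[j] == -1 or augment(row_of[j], seen):
                    row_of[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            return None
    sigma = [0] * n
    for j, i in enumerate(row_of):
        sigma[i] = j
    return sigma


def birkhoff_decomposition(l):
    """Write a bistochastic l as sum_k theta_k * P_{sigma_k} with theta_k > 0, sum theta_k = 1."""
    n = len(l)
    rest = [[Fraction(x) for x in row] for row in l]
    terms = []
    while any(x != 0 for row in rest for x in row):
        support = [[j for j in range(n) if rest[i][j] > 0] for i in range(n)]
        sigma = perfect_matching(support)
        # rest is a positive multiple of a bistochastic matrix, so by Hall's theorem
        # its support contains a permutation; this fails only for non-bistochastic input.
        assert sigma is not None, "input is not bistochastic"
        theta = min(rest[i][sigma[i]] for i in range(n))
        for i in range(n):
            rest[i][sigma[i]] -= theta   # zeroes at least one more entry
        terms.append((theta, sigma))
    return terms


half = Fraction(1, 2)
l = [[half, half, 0], [0, half, half], [half, 0, half]]   # a bistochastic matrix
r = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]                    # an arbitrary cost matrix
terms = birkhoff_decomposition(l)
objective = sum(l[i][j] * r[i][j] for i in range(3) for j in range(3))
split = sum(t * sum(r[i][s[i]] for i in range(3)) for t, s in terms)
print(terms, objective, split)
```

In the example, the linear objective equals the weighted average of the costs of the permutations found, which is how the minimum over bistochastic matrices reduces to a minimum over the symmetric group.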
However, the questions of when the strong MK-problem has a solution and how it can be approximated by permutations are more involved.
The theorem and the conjecture above are typical of applications of our method to various problems with integral kernels: we obtain a probabilistic approximation of a functional or variational problem using a random choice of values of the function. We will return to this elsewhere.
Partially supported by the RFBR, project 02-01-00093, and by the grant of the President of the Russian Federation for the support of leading scientific schools NSh-2251.2003.1. Translated by A. M. Vershik and N. V. Tsilevich.

References

1. A. Barvinok, A Course in Convexity, Amer. Math. Soc., Providence, Rhode Island (2002).
2. Y. Brenier, “Extended Monge–Kantorovich theory,” Lecture Notes in Math., 1813, 91–122 (2003).
3. M. Émery, “Espaces probabilisés filtrés: de la théorie de Vershik au mouvement brownien, via les idées de Tsirelson,” Séminaire Bourbaki, No. 882 (2000).
4. U. Frisch, Turbulence. The Legacy of A. N. Kolmogorov, Cambridge Univ. Press, Cambridge (1995).
5. W. Gangbo and R. J. McCann, “The geometry of optimal transportation,” Acta Math., 177, No. 2, 113–161 (1996).
6. M. K. Gavurin and L. V. Kantorovich, “Application of mathematical methods to problems of analysis of freight flows,” in: Problems of Raising the Efficiency of Transport Performance [in Russian], Moscow–Leningrad (1949), pp. 110–138.
7. M. Gromov, Metric Structures for Riemannian and Non-Riemannian Spaces, Birkhäuser, Boston (1999).
8. L. V. Kantorovich, Mathematical Methods in the Organization and Planning of Production [in Russian], Leningrad (1939).
9. L. V. Kantorovich, “On an efficient method of solving some classes of extremal problems,” Dokl. Akad. Nauk SSSR, 28, No. 3, 212–215 (1940).
10. L. V. Kantorovich, “On the translocation of masses,” Dokl. Akad. Nauk SSSR, 37, Nos. 7–8, 227–229 (1942).
11. L. V. Kantorovich, “On a problem of Monge,” Uspekhi Mat. Nauk, 3, No. 2, 225–226 (1948).
12. L. V. Kantorovich, “Functional analysis and applied mathematics,” Uspekhi Mat. Nauk, 3, No. 6, 89–185 (1948).
13. L. V. Kantorovich, Economical Calculation of the Best Use of Resources [in Russian], Moscow (1960).
14. L. V. Kantorovich, “On new approaches to computational methods and processing of observations,” Sib. Mat. Zhurn., 3, No. 5, 701–709 (1962).
15. L. V. Kantorovich and G. Sh. Rubinshtein, “On a space of totally additive functions,” Vestn. Leningr. Univ., 13, No. 7, 52–59 (1958).
16. Leonid Vitalievich Kantorovich: Man and Scientist, vol. 1, Novosibirsk (2002).
17. D. Ornstein, Ergodic Theory, Randomness, and Dynamical Systems, Yale Univ. Press, New Haven–London (1974).
18. S. T. Rachev, Probability Metrics and the Stability of Stochastic Models, Wiley, Chichester (1991).
19. A. M. Vershik, “Some remarks on infinite-dimensional problems of linear programming,” Uspekhi Mat. Nauk, 25, No. 5, 117–124 (1970).
20. A. M. Vershik, “Decreasing sequences of measurable partitions and their applications,” Sov. Math. Dokl., 11, No. 4, 1007–1011 (1970).
21. A. M. Vershik, “On D. Ornstein's papers, weak dependence conditions and classes of stationary measures,” Theory Probab. Appl., 21, 655–657 (1977).
22. A. M. Vershik, “Multivalued mappings with invariant measure (polymorphisms) and Markov operators,” J. Sov. Math., 23, 2243–2266 (1983).
23. A. M. Vershik, “Theory of decreasing sequences of measurable partitions,” St. Petersburg Math. J., 6, No. 4, 705–761 (1994).
24. A. M. Vershik, “Dynamic theory of growth in groups: entropy, boundaries, examples,” Russian Math. Surveys, 55, No. 4, 667–733 (2000).
25. A. M. Vershik, “Classification of measurable functions of several arguments, and invariantly distributed random matrices,” Funct. Anal. Appl., 36, No. 2, 93–105 (2002).
26. A. M. Vershik, “About L. V. Kantorovich and linear programming,” in: Leonid Vitalievich Kantorovich: Man and Scientist, vol. 1, Novosibirsk (2002), pp. 130–152.
27. A. Vershik, “Polymorphisms, Markov processes, quasi-similarity of K-automorphisms,” to appear in Discrete Contin. Dyn. Syst.
28. A. M. Vershik and M. M. Rubinov, “General duality theorem in linear programming,” in: Mathematical Economics and Functional Analysis [in Russian], Nauka, Moscow (1974), pp. 35–55.
29. C. Villani, Topics in Optimal Transportation, Amer. Math. Soc., Providence, Rhode Island (2003).
30. L. A. Caffarelli and S. Salsa (eds.), Optimal Transportation and Applications, Lecture Notes in Math., 1813, Springer (2003).
