
1 Free Independence and Free Harmonic Analysis.

Free probability theory was developed by Voiculescu as a way to deal with von Neumann algebras of free groups. In addition to the view of von Neumann algebras as “non-commutative measure spaces”, which was already presented in this conference, free probability theory considers von Neumann algebras as “non-commutative probability spaces”.

There are by now several standard references on free probability theory, of which we mention two: [VDN92, Voi00].

1.1 Probability spaces.

Recall that a classical probability space is a measure space $(X, \mathcal{B}, μ)$. Here $\mathcal{B}$ is a sigma-algebra of subsets of $X$, and $μ$ is a probability measure, i.e., $μ(X) = 1$. One thinks of the points of $X$ as the possible outcomes and of the sets $Y \in \mathcal{B}$ as events. The measure $μ(Y)$ is then the probability that the event $Y$ occurs, i.e., that the outcome lies in $Y$.

1.1.1 Random variables; laws.

An alternative point of view on probability theory involves considering random variables, i.e., measurable functions $f : X \to \mathbb{C}$. One can think of a random variable as a measurement, which assigns to each outcome $x \in X$ a value $f(x)$. Note that the probability of the value of $f$ lying in a (Borel) set $A \subset \mathbb{C}$ is exactly
$$μ(f^{-1}(A)) = (f_{*} μ)(A).$$
Thus the law $μ_{f}$ of $f$, defined to be the push-forward measure $μ_{f} = f_{*} μ$ on $\mathbb{C}$, measures the probabilities that $f$ assumes various values.
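As a small illustration, here is a sketch of the push-forward construction on a finite probability space, where integrals become finite sums. Everything in it is hypothetical and chosen only for the example: the space (a fair die with outcomes labelled 0 to 5), the random variable `f` (the parity of the outcome) and the helper `push_forward`.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: X = {0, ..., 5} with the uniform measure.
mu = {x: Fraction(1, 6) for x in range(6)}

def f(x):
    """A random variable on X: the parity of the outcome."""
    return x % 2

def push_forward(f, mu):
    """Return the law mu_f = f_* mu as a dict {value: probability}."""
    law = defaultdict(Fraction)
    for x, p in mu.items():
        law[f(x)] += p
    return dict(law)

mu_f = push_forward(f, mu)
print(mu_f)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}

# mu(f^{-1}(A)) = (f_* mu)(A) for a set of values A.
A = {1}
assert sum(p for x, p in mu.items() if f(x) in A) == sum(mu_f.get(a, 0) for a in A)
# The expectation can be computed either on X or against the law of f.
assert sum(f(x) * p for x, p in mu.items()) == sum(t * q for t, q in mu_f.items())
```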

1.1.2 The expectation $E$.

Suppose that $f \in L^{\infty}(X, μ)$ is an essentially bounded random variable. Then the integral
$$E(f) = \int f(x)\, dμ(x)$$
has the meaning of the expected value of $f$. For this reason, the linear functional $E : L^{\infty}(X, μ) \to \mathbb{C}$ given by integration against $μ$ is called an expectation. We note that $E$ satisfies:

$\circ$ $E(1) = 1$ (normalization);
$\circ$ $E(f) \geq 0$ whenever $f \geq 0$ (positivity).

Note that knowing $(X, \mathcal{B}, μ)$ is equivalent (up to isomorphism and up to null sets) to knowing $L^{\infty}(X, μ)$ and $E$. Thus the notion of a classical probability space can be phrased entirely in terms of commutative (von Neumann) algebras.

1.2 Non-commutative probability spaces.

We now play the usual game of dropping the word “commutative” in a definition:

Definition 1.1. An algebraic non-commutative probability space is a pair $(A, φ)$ consisting of a unital algebra $A$ and a linear functional $φ : A \to \mathbb{C}$ such that $φ(1) = 1$.

Thus we think of $a \in A$ as a “non-commutative random variable”, of $φ(a)$ as its “expected value”, and so on. Of course, any classical probability space is also a non-commutative probability space (take $A = L^{\infty}(X, μ)$ and $φ = E$). But there are many interesting genuinely non-commutative probability spaces. For example, if $Γ$ is a discrete group, we could set $A = \mathbb{C}Γ$ (the group algebra) and $φ = τ_{Γ}$ (the group trace). Here, for $g \in Γ \subset \mathbb{C}Γ$, we set $φ(g) = 0$ if $g \neq 1$ and $φ(1) = 1$, and extend $φ$ to $\mathbb{C}Γ$ by linearity. The same construction works with $A$ replaced by the reduced group $C^{*}$-algebra of $Γ$, or by the von Neumann algebra of $Γ$.

1.2.1 Positivity.

Operator algebras give one a “test” of which algebraic non-commutative probability spaces “exist in nature”. These are precisely those non-commutative probability spaces that can be represented by (possibly unbounded) operators on a Hilbert space $H$ so that $φ$ is given by a vector state,
$$φ(a) = \langle h, a h \rangle$$
for some unit vector $h \in H$. If $A$ is a $*$-algebra, these spaces are easy to characterize (via the GNS construction) in terms of the properties of $φ$: $φ$ must be positive, i.e., $φ(a^{*} a) \geq 0$ for all $a \in A$.

1.2.2 The law of a random variable.

Recall that we assigned to a classical random variable $f$ its law $μ_{f}$. Now suppose that $A$ is an algebra of operators on a Hilbert space $H$, that $φ(\cdot) = \langle h, \cdot\, h \rangle$, and that $a \in A$ is self-adjoint. Then the spectral theorem gives us a measure $ν_{a}$ on $\mathbb{R}$, valued in the set of projections on $H$, so that
$$a = \int t \, dν_{a}(t).$$
If we let
$$μ_{a} = φ \circ ν_{a},$$
then $μ_{a}$ is a (probability) measure on $\mathbb{R}$. It is not hard to check that this construction gives us precisely the law of $a$ in the classical situation, that is, when $a \in L^{\infty}(X, μ)$ is real-valued, $H = L^{2}(X, μ)$ and $h = 1$.
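Here is a finite-dimensional sketch of this construction, assuming NumPy. For a Hermitian matrix $a$ on $\mathbb{C}^{n}$ and a unit vector $h$, the projection-valued measure is $\sum_{i} P_{i}\, δ_{λ_{i}}$, where $P_{i}$ is the projection onto the $i$-th eigenvector. So $μ_{a}$ puts mass $\langle h, P_{i} h\rangle = |\langle v_{i}, h\rangle|^{2}$ at each eigenvalue $λ_{i}$. The random matrix and vector are arbitrary test data. The check confirms that the moments of $μ_{a}$ agree with $φ(a^{p})$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary test data: a 5x5 Hermitian matrix a on H = C^5 and a unit vector h.
n = 5
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
a = (b + b.conj().T) / 2
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h /= np.linalg.norm(h)

def phi(x):
    """The vector state phi(x) = <h, x h> (np.vdot conjugates its first argument)."""
    return np.vdot(h, x @ h)

# Spectral theorem: a = sum_i lam_i P_i with P_i = v_i v_i^*.
lam, v = np.linalg.eigh(a)
# mu_a = phi o nu_a puts mass phi(P_i) = |<v_i, h>|^2 at lam_i.
weights = np.abs(v.conj().T @ h) ** 2

assert np.isclose(weights.sum(), 1.0)  # mu_a is a probability measure
for p in range(6):
    moment_from_state = phi(np.linalg.matrix_power(a, p))
    moment_from_measure = np.sum(weights * lam ** p)
    assert np.isclose(moment_from_state, moment_from_measure)
```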

1.2.3 Moments.

However, if $a$ is not self-adjoint (more precisely, not normal), or if we are dealing with a $k$-tuple of non-commuting random variables, there is in general no description of the law of $a$ in terms of a measure.

Fortunately, for $f \in L^{\infty}(X, μ)$ the moments of $f$, i.e., the expected values $E(f^{p})$, $p = 1, 2, \dots$, are exactly the same as the moments of the law $μ_{f}$ of $f$. Indeed,
$$E(f^{p}) = \int t^{p}\, dμ_{f}(t)$$
is exactly the $p$-th moment of $μ_{f}$. For essentially bounded real-valued $f$, the measure $μ_{f}$ is compactly supported in $\mathbb{R}$, and so it is determined by its moments.

Thus, given a family $F$ of variables $a_{1}, \dots, a_{n} \in A$, we say that an expression of the form $φ(a_{i_{1}} \cdots a_{i_{p}})$ is the $(i_{1}, \dots, i_{p})$-th moment of the family $F$. The collection of all moments can be thought of as a linear functional $μ_{F}$ defined on the algebra of polynomials in $n$ non-commuting indeterminates $t_{1}, \dots, t_{n}$:
$$μ_{F}(p) = φ(p(a_{1}, \dots, a_{n})).$$
This functional $μ_{F}$ is called the joint law, or joint distribution, of the family $F$.

1.3 Classical independence.

Definition 1.2. Two real-valued random variables $f$ and $g$ in $L^{\infty}(X, μ)$ are called independent if
$$E(f^{n} g^{m}) = E(f^{n}) E(g^{m})$$
for all $n, m \geq 0$. Equivalently, $E(FG) = 0$ whenever $E(F) = E(G) = 0$, $F$ is in the von Neumann algebra $W^{*}(f)$ generated by $f$, and $G \in W^{*}(g)$.

The equality $E(fg) = E(f)E(g)$ is a consequence of the statement that “the probability that the value of $f$ lies in a set $A$ and the value of $g$ lies in a set $B$ is the product of the probabilities that the value of $f$ lies in $A$ and that the value of $g$ lies in $B$”. This is the more familiar way of phrasing independence.

For example, if $X = X_{1} \times X_{2}$ and $μ = μ_{1} \times μ_{2}$, then any functions $f, g$ such that $f$ depends only on the $X_{1}$ coordinate and $g$ only on the $X_{2}$ coordinate are independent. Another way of saying this is that the random variables $f_{1} \otimes 1$ and $1 \otimes g_{1}$ in $L^{\infty}(X_{1}, μ_{1}) \bar{\otimes} L^{\infty}(X_{2}, μ_{2})$ are independent for any $f_{1} \in L^{\infty}(X_{1}, μ_{1})$ and $g_{1} \in L^{\infty}(X_{2}, μ_{2})$.

Thus independence has to do with the operation of taking tensor products of probability spaces.
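The tensor-product picture can be checked numerically on finite spaces. This sketch assumes NumPy. The two factor spaces, their measures and the functions `f1`, `g1` are arbitrary illustrative data. `np.kron` realizes functions and measures on $X_{1} \times X_{2}$ as flat arrays indexed by the pairs $(x_{1}, x_{2})$.

```python
import numpy as np

# Two hypothetical finite probability spaces X1 (3 points) and X2 (4 points).
mu1 = np.array([0.2, 0.5, 0.3])
mu2 = np.array([0.1, 0.4, 0.4, 0.1])
f1 = np.array([1.0, -2.0, 0.5])        # a function on X1
g1 = np.array([3.0, 0.0, -1.0, 2.0])   # a function on X2

# On X = X1 x X2 with mu = mu1 x mu2, np.kron flattens (x1, x2) into one index.
mu = np.kron(mu1, mu2)
f = np.kron(f1, np.ones_like(g1))      # f1 (x) 1: depends only on x1
g = np.kron(np.ones_like(f1), g1)      # 1 (x) g1: depends only on x2

def E(u):
    """Expectation with respect to the product measure."""
    return np.sum(u * mu)

# Definition 1.2: E(f^n g^m) = E(f^n) E(g^m) for all n, m >= 0.
for n in range(4):
    for m in range(4):
        assert np.isclose(E(f ** n * g ** m), E(f ** n) * E(g ** m))
```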

1.4 Free products of non-commutative probability spaces.

There is “more room” in the non-commutative universe to accommodate a different way of combining two non-commutative probability spaces: free products. Just like the notion of a tensor product can be used to recover the notion of independence, free products have led Voiculescu to discover the notion of free independence.

1.4.1 Free products of groups.

We start with a motivating example. Let $Γ_{1}$ and $Γ_{2}$ be two discrete groups. View the group algebra $\mathbb{C}(Γ_{1} * Γ_{2})$ of the free product as a non-commutative probability space by letting $φ$ be the group trace: for $g \in Γ_{1} * Γ_{2}$, $φ(g) = 0$ unless $g = 1$, and $φ(1) = 1$.

Let us understand the relative positions of $\mathbb{C}Γ_{1}$ and $\mathbb{C}Γ_{2}$ inside the group algebra $\mathbb{C}(Γ_{1} * Γ_{2})$ of the free product. Every $g \in Γ_{1} * Γ_{2}$ can be written as a word $g = g_{1} \cdots g_{n}$ with $g_{j} \in Γ_{i(j)}$. We may carry out multiplications and cancellations until we reduce the word so that consecutive letters lie in different groups, i.e., $i(1) \neq i(2)$, $i(2) \neq i(3)$, and so on. A reduced word whose letters $g_{1}, \dots, g_{n}$ are all non-trivial represents a non-trivial element of $Γ_{1} * Γ_{2}$. Thus:
$$φ(g_{1} \cdots g_{n}) = 0$$
provided that $g_{j} \in Γ_{i(j)}$, $i(1) \neq i(2)$, $i(2) \neq i(3)$, $\dots$, and $φ(g_{1}) = φ(g_{2}) = \dots = 0$ (that is, $g_{j} \neq 1$ for all $j$).

By linearity we get:

Proposition 1.3. If $a \in \mathbb{C}(Γ_{1} * Γ_{2})$ has the form
$$a = a_{1} \cdots a_{n},$$
with $a_{j} \in \mathbb{C}Γ_{i(j)}$, $i(1) \neq i(2)$, $i(2) \neq i(3)$, $\dots$, and $φ(a_{1}) = φ(a_{2}) = \dots = 0$, then
$$φ(a) = 0.$$

This proposition allows one to compute $φ$ on $\mathbb{C}(Γ_{1} * Γ_{2}) = \mathbb{C}Γ_{1} * \mathbb{C}Γ_{2}$ in terms of its restrictions to $\mathbb{C}Γ_{1}$ and $\mathbb{C}Γ_{2}$. Indeed, an arbitrary element of $\mathbb{C}Γ_{1} * \mathbb{C}Γ_{2}$ is a linear combination of $1$ and of terms of the form
$$a_{1} \cdots a_{n}, \quad a_{j} \in \mathbb{C}Γ_{i(j)}, \quad i(1) \neq i(2),\ i(2) \neq i(3),\ \dots.$$
But then the equation
$$0 = φ\big((a_{1} - φ(a_{1}))(a_{2} - φ(a_{2})) \cdots (a_{n} - φ(a_{n}))\big)$$
allows one to express $φ(a_{1} \cdots a_{n})$ in terms of values of $φ$ on shorter words. When the product is expanded, $φ(a_{1} \cdots a_{n})$ appears once. Every other term is a product of some of the scalars $φ(a_{j})$ with the value of $φ$ on a word with fewer than $n$ letters (after merging adjacent letters from the same algebra). By induction, this allows one to express $φ$ in terms of $φ|_{\mathbb{C}Γ_{1}}$ and $φ|_{\mathbb{C}Γ_{2}}$.
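To make Proposition 1.3 concrete, here is a sketch for $Γ_{1} = Γ_{2} = \mathbb{Z}$, so that $Γ_{1} * Γ_{2}$ is the free group on two generators $a$ (index 0) and $b$ (index 1). Elements of the group algebra are stored as dictionaries from reduced words to coefficients, and $φ$ reads off the coefficient of the empty word. The particular centered elements are arbitrary choices made for illustration.

```python
from collections import defaultdict

def reduce_concat(w1, w2):
    """Concatenate two reduced words and reduce the result.

    A reduced word is a tuple of (generator, exponent) pairs with exponent != 0
    and consecutive generators distinct.
    """
    out = list(w1)
    for gen, k in w2:
        if out and out[-1][0] == gen:
            k += out.pop()[1]
            if k == 0:
                continue  # full cancellation; the next letter may merge again
        out.append((gen, k))
    return tuple(out)

def mul(x, y):
    """Product in the group algebra; elements are dicts {reduced word: coefficient}."""
    z = defaultdict(int)
    for w1, c1 in x.items():
        for w2, c2 in y.items():
            z[reduce_concat(w1, w2)] += c1 * c2
    return dict(z)

def lin(*terms):
    """Linear combination of group elements given as (coefficient, generator, exponent)."""
    z = defaultdict(int)
    for c, gen, k in terms:
        z[((gen, k),)] += c
    return dict(z)

def phi(x):
    """Group trace: the coefficient of the identity (the empty word)."""
    return x.get((), 0)

# Centered elements (no identity component) of C<a> (generator 0) and C<b> (generator 1).
a1 = lin((1, 0, 1), (1, 0, -1))   # a + a^{-1}
a2 = lin((2, 0, 2), (-1, 0, 1))   # 2a^2 - a
b1 = lin((1, 1, 1), (3, 1, -3))   # b + 3b^{-3}
b2 = lin((1, 1, 1), (1, 1, -1))   # b + b^{-1}

# Proposition 1.3: an alternating product of centered elements has trace 0.
x = mul(mul(mul(a1, b1), a2), b2)
assert phi(x) == 0

# Without centering the trace need not vanish; here it factorizes:
# phi(a1^2 b2^2) = phi(a1^2) phi(b2^2) = 2 * 2.
assert phi(mul(mul(a1, a1), mul(b2, b2))) == phi(mul(a1, a1)) * phi(mul(b2, b2)) == 4
```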

1.4.2 Free products of algebras.

Such an expression is universal and works in any free product of two algebras (not necessarily group algebras). We thus say:

Definition 1.4. [Voi85] Let $(A_{1}, φ_{1})$ and $(A_{2}, φ_{2})$ be two non-commutative probability spaces. Consider the unique linear functional $φ$ on the (unital) free product algebra $A_{1} * A_{2}$ whose restriction to $A_{i}$ is $φ_{i}$, $i = 1, 2$, and which satisfies
$$φ(a_{1} \cdots a_{n}) = 0 \quad \text{whenever } a_{j} \in A_{i(j)},\ i(1) \neq i(2),\ i(2) \neq i(3),\ \dots, \text{ and } φ_{i(j)}(a_{j}) = 0\ \forall j.$$
We call $φ$ the free product of $φ_{1}$ and $φ_{2}$, and denote it $φ_{1} * φ_{2}$.

One can check that the free product of two positive linear functionals is positive. This is easiest to see by realizing the free product functional as a vector state on the free product of the underlying GNS representations. Thus one can talk about (reduced) free products of $C^{*}$-algebras or von Neumann algebras by passing to the appropriate closure in the GNS representation associated to the free product functional.

1.4.3 Free independence.

By analogy with the relationship between classical independence and tensor products, Voiculescu gave the following definition:

Definition 1.5. [Voi85] Let $F_{1}, F_{2} \subset (A, φ)$ be two families of non-commutative random variables. We say that $F_{1}$ and $F_{2}$ are freely independent if
$$φ(a_{1} \cdots a_{n}) = 0$$
whenever $a_{j} \in \mathrm{Alg}(1, F_{i(j)})$, $i(1) \neq i(2)$, $i(2) \neq i(3)$, $\dots$, and $φ(a_{1}) = φ(a_{2}) = \dots = 0$. Here $\mathrm{Alg}(S)$ denotes the algebra generated by a set $S$.

We should point out a certain similarity between this definition and classical independence. There the requirement was that $E(FG) = 0$ whenever $E(F) = E(G) = 0$, $F \in W^{*}(f)$ and $G \in W^{*}(g)$.

1.5 Free Fock space.

We give an example of freely independent random variables that does not come from groups.

1.5.1 Free Fock space.

Let $H$ be a Hilbert space, let $Ω$ be a unit vector, and let
$$\mathcal{F}(H) = \mathbb{C}Ω \oplus H \oplus (H \otimes H) \oplus \cdots$$
be the Hilbert space direct sum of the tensor powers of $H$ (the one-dimensional space $\mathbb{C}Ω$ is thought of as the zeroth tensor power of $H$). This space is called the free (or full) Fock space, by analogy with the symmetric and anti-symmetric Fock spaces (which use the symmetric or anti-symmetric tensor product instead).

1.5.2 Free creation operators.

For $h \in H$ consider the left creation operator $ℓ(h) : \mathcal{F}(H) \to \mathcal{F}(H)$ given by
$$ℓ(h)\, h_{1} \otimes \cdots \otimes h_{n} = h \otimes h_{1} \otimes \cdots \otimes h_{n}$$
(here $h \otimes Ω = h$ by convention, so that $ℓ(h)Ω = h$). Then $ℓ(h)$ is bounded, and its adjoint $ℓ(h)^{*}$ is given by
$$ℓ(h)^{*}\, h_{1} \otimes \cdots \otimes h_{n} = \langle h, h_{1} \rangle\, h_{2} \otimes \cdots \otimes h_{n}$$
(for $n = 1$ the right-hand side is $\langle h, h_{1} \rangle Ω$) and $ℓ(h)^{*} Ω = 0$. The operator $ℓ(h)^{*}$ is also called the annihilation operator.

These operators satisfy
$$ℓ(h)^{*} ℓ(g) = \langle h, g \rangle 1.$$
In particular, $\|ℓ(h)\|^{2} = \|ℓ(h)^{*} ℓ(h)\| = \|h\|^{2}$. So the map $h \mapsto ℓ(h)$ is a linear isometry between $H$ (with its Hilbert space norm) and the closed linear span of $\{ℓ(h) : h \in H\}$, taken with the operator norm.

1.5.3 Relation with non-crossing diagrams.

Let $h_{1}, \dots, h_{n} \in H$ be an orthonormal family, and let $ℓ_{j} = ℓ(h_{j})$. Thus $ℓ_{i}^{*} ℓ_{j} = δ_{ij} 1$. Consider the joint distribution of the family $\{ℓ_{1}, ℓ_{1}^{*}, \dots, ℓ_{n}, ℓ_{n}^{*}\}$ with respect to the vacuum state $φ(a) = \langle Ω, aΩ \rangle$ (also known as the $*$-distribution of $\{ℓ_{1}, \dots, ℓ_{n}\}$). It has a nice combinatorial description.

Suppose that we are interested in
$$φ(ℓ_{i(1)}^{g(1)} \cdots ℓ_{i(k)}^{g(k)}),$$
where $i(j) \in \{1, \dots, n\}$ and $g(j) \in \{\cdot, *\}$, $j = 1, \dots, k$ (by $ℓ_{j}^{g}$ we mean $ℓ_{j}^{*}$ if $g = *$ and $ℓ_{j}$ if $g = \cdot$). Mark $k$ points on the $x$-axis in the half-plane $\{(x, y) : y \geq 0\}$ at positions $(1, 0), \dots, (k, 0)$. Color them with $n$ colors so that the $j$-th point is colored with the $i(j)$-th color. Attach to the $j$-th point the line segment from $(j, 0)$ to $(j, 1)$. Orient this segment upwards (away from the $x$-axis) if $g(j) = \cdot$, and downwards (toward the $x$-axis) if $g(j) = *$. Give the segment the same color as the $j$-th point, from which it is drawn.

Then there exists at most one way of drawing a diagram so that:

$\circ$ The upper end of every segment is connected to the upper end of exactly one other segment, and all segments connected together have the same color;
$\circ$ Orient each line connecting two segments counter-clockwise. Then the orientation of the line is compatible with the orientation of the segments;
$\circ$ The lines do not cross.

It is not hard to prove that $φ(ℓ_{i(1)}^{g(1)} \cdots ℓ_{i(k)}^{g(k)}) = 1$ if and only if such a diagram exists, while $φ(ℓ_{i(1)}^{g(1)} \cdots ℓ_{i(k)}^{g(k)}) = 0$ otherwise.

1.5.4 Moments of $ℓ_{1} + ℓ_{1}^{*}$.

Using this description one can prove, for example, that
$$φ((ℓ_{1} + ℓ_{1}^{*})^{k}) = C_{k},$$
where $C_{k}$ is the number of non-crossing pairings of the set $\{1, \dots, k\}$. Thus $C_{k} = 0$ for odd $k$, and for $k = 2m$ it is the Catalan number $\frac{1}{m+1}\binom{2m}{m}$. Recall that a pairing of $\{1, \dots, k\}$ is an equivalence relation on this set such that each equivalence class has exactly two elements. A pairing is non-crossing if one can draw lines above the real axis $\mathbb{R} \supset \{1, \dots, k\}$ connecting the two elements of each equivalence class so that no two lines intersect. (More generally, one can define non-crossing partitions of the set $\{1, \dots, k\}$ in a similar way.) We shall later see that the moments of $X = ℓ_{1} + ℓ_{1}^{*}$ are those of the semicircle law.
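The identity $φ((ℓ_{1} + ℓ_{1}^{*})^{k}) = C_{k}$ can be checked for small $k$ with a short script. This sketch represents vectors in $\mathcal{F}(\mathbb{C})$ as dictionaries indexed by the tensor degree, where degree 0 is $Ω$. It applies $ℓ_{1} + ℓ_{1}^{*}$ exactly, with no truncation needed, and compares the vacuum moments with a brute-force count of non-crossing pairings. The helper names are hypothetical.

```python
from math import comb

def step(v):
    """Apply X = l + l* to v, a dict {tensor degree m: coefficient} in F(C)."""
    out = {}
    for m, c in v.items():
        out[m + 1] = out.get(m + 1, 0) + c        # l raises the degree
        if m > 0:                                  # l* lowers it and kills Omega
            out[m - 1] = out.get(m - 1, 0) + c
    return out

def vacuum_moment(k):
    """phi((l + l*)^k) = <Omega, (l + l*)^k Omega>, computed exactly."""
    v = {0: 1}
    for _ in range(k):
        v = step(v)
    return v.get(0, 0)

def pairings(points):
    """Yield every pairing of the sorted list `points` as a list of pairs (a, b), a < b."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for j, partner in enumerate(rest):
        for p in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, partner)] + p

def count_noncrossing(k):
    def crossing(p):
        # the pairs (a, b) and (c, d) cross iff a < c < b < d
        return any(a < c < b < d for a, b in p for c, d in p)
    return sum(not crossing(p) for p in pairings(list(range(1, k + 1))))

for k in range(11):
    assert vacuum_moment(k) == count_noncrossing(k)
    if k % 2 == 0:
        m = k // 2
        assert vacuum_moment(k) == comb(2 * m, m) // (m + 1)   # Catalan number

print([vacuum_moment(k) for k in range(9)])  # [1, 0, 1, 0, 2, 0, 5, 0, 14]
```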

Non-crossing diagrams and non-crossing partitions have a very deep connection with free probability; this connection is beyond the scope of these notes (see e.g. [Spe98] ). We will point out later, however, how this connection explains the relationship between freeness and large random matrices.

1.5.5 Free independence.

Let $A = C^{*}(ℓ(h) : h \in H)$ and let $φ : A \to \mathbb{C}$ be given by
$$φ(a) = \langle Ω, aΩ \rangle.$$
The $C^{*}$-algebra $A$ is an extension of the Cuntz algebra $O_{n}$ if $n = \dim H < \infty$, and is isomorphic to $O_{\infty}$ if $\dim H = \infty$.

It is not hard to prove that if $H_{1} \perp H_{2}$ are two subspaces of $H$, then the algebras
$$C^{*}(ℓ(h) : h \in H_{1}) \quad \text{and} \quad C^{*}(ℓ(h) : h \in H_{2})$$
are freely independent in $(A, φ)$.
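The following sketch checks some consequences of this freeness for $H = \mathbb{C}^{2}$ with orthonormal basis $h_{1}, h_{2}$. Basis vectors of $\mathcal{F}(\mathbb{C}^{2})$ are encoded as words in the letters 1 and 2, with the empty word standing for $Ω$, and $X_{i} = ℓ(h_{i}) + ℓ(h_{i})^{*}$. Definition 1.5 predicts that alternating products of centered elements of the two algebras have vanishing vacuum expectation. The encoding and the function names are illustrative choices.

```python
def l(i, v):
    """Creation operator l(h_i): prepend the letter i to every word."""
    return {(i,) + w: c for w, c in v.items()}

def l_star(i, v):
    """Annihilation operator l(h_i)^*: strip a leading i; other words (and Omega) die."""
    return {w[1:]: c for w, c in v.items() if w and w[0] == i}

def add(*vs):
    out = {}
    for v in vs:
        for w, c in v.items():
            out[w] = out.get(w, 0) + c
    return out

def X(i):
    """X_i = l(h_i) + l(h_i)^*, as a function acting on vectors."""
    return lambda v: add(l(i, v), l_star(i, v))

def centered_square(i):
    """X_i^2 - 1, which is centered because phi(X_i^2) = 1."""
    return lambda v: add(X(i)(X(i)(v)), {w: -c for w, c in v.items()})

def phi(ops):
    """phi(T_1 ... T_k) = <Omega, T_1 ... T_k Omega>, applying T_k first."""
    v = {(): 1}
    for op in reversed(ops):
        v = op(v)
    return v.get((), 0)

X1, X2 = X(1), X(2)
A1, A2 = centered_square(1), centered_square(2)

assert phi([X1]) == 0 and phi([X1, X1]) == 1   # X_1 is centered, phi(X_1^2) = 1
assert phi([A1]) == 0                          # so X_1^2 - 1 is centered too
# Alternating products of centered elements from the two algebras vanish:
assert phi([X1, X2, X1, X2]) == 0
assert phi([A1, X2, A1, X2]) == 0
assert phi([A1, A2, A1, A2]) == 0
# Consequently phi(X1^2 X2^2) = phi(X1^2) phi(X2^2) = 1.
assert phi([X1, X1, X2, X2]) == 1
```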

1.6 Free Central Limit Theorem.

1.6.1 Convergence in moments.

We say that a sequence $X_{n}$ of random variables converges in moments to the law of a random variable $X$ if $μ_{X_{n}} \to μ_{X}$ in moments; that is to say, for any $p \geq 0$,
$$E(X_{n}^{p}) \to E(X^{p}).$$
This definition makes sense verbatim (with $E$ replaced by $φ$) in the setting of a non-commutative probability space.

1.6.2 Classical CLT

Let $X_{1}, \dots, X_{n}, \dots$ be independent real-valued random variables such that $E(X_{j}) = 0$ and $E(X_{j}^{2}) = 1$ for all $j$, and such that for any $p \geq 0$,
$$\sup_{n} E(|X_{n}|^{p}) \leq C_{p}$$
for some constants $C_{p} < \infty$. The classical central limit theorem states:

Theorem 1.6. Let
$$Y_{n} = \frac{1}{\sqrt{n}}(X_{1} + \dots + X_{n}).$$
Then the laws of the random variables $Y_{n}$ converge in moments to the Gaussian law $μ_{Gauss}$ given by
$$dμ_{Gauss}(t) = \frac{1}{\sqrt{2π}} \exp(-t^{2}/2)\, dt.$$

The main tool used in the proof of this theorem is the fact that if $Z_{1}$ and $Z_{2}$ are independent random variables, then the law of their sum is given by a convolution formula:
$$μ_{Z_{1} + Z_{2}} = μ_{Z_{1}} * μ_{Z_{2}}.$$
One then uses the fact that the Fourier transform $\hat{\cdot}$ satisfies
$$\widehat{μ * ν} = \hat{μ} \cdot \hat{ν}.$$
Thus if we write
$$L_{μ} = \log \hat{μ},$$
then
$$L_{μ_{Z_{1} + Z_{2}}} = L_{μ_{Z_{1}}} + L_{μ_{Z_{2}}}.$$
Using this one can compute $L_{μ_{Y_{n}}}$ and show that, as $n \to \infty$, it becomes quadratic in $t$: its Taylor coefficients converge to those of $-t^{2}/2$. This implies that the measures $μ_{Y_{n}}$ converge in moments to the measure whose Fourier transform is $\exp(-t^{2}/2)$, so that $μ_{Y_{n}} \to μ_{Gauss}$.
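As a concrete instance of Theorem 1.6 (not of its Fourier-analytic proof), take the $X_{j}$ to be independent fair $\pm 1$ signs, a hypothetical choice that satisfies the assumptions. The law of $X_{1} + \dots + X_{n}$ is then an explicit binomial law, so the fourth moment of $Y_{n}$ can be computed exactly in rational arithmetic. It equals $3 - 2/n$, which tends to $3$, the fourth moment of $μ_{Gauss}$.

```python
from fractions import Fraction
from math import comb

def fourth_moment_normalized_sum(n):
    """E(Y_n^4) for Y_n = (X_1 + ... + X_n) / sqrt(n), X_j independent fair +-1 signs.

    S_n = X_1 + ... + X_n equals 2k - n with probability C(n, k) / 2^n.
    """
    e_s4 = sum(Fraction(comb(n, k), 2 ** n) * (2 * k - n) ** 4 for k in range(n + 1))
    return e_s4 / n ** 2  # E(Y_n^4) = E(S_n^4) / n^2

for n in (1, 2, 10, 100, 1000):
    assert fourth_moment_normalized_sum(n) == 3 - Fraction(2, n)
```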

1.6.3 Free CLT

Amazingly, the statement of the free central limit theorem is essentially the same as that of the classical one. The only difference is the replacement of the requirement of independence by that of free independence. This is only a single example of a surprising number of parallels between the behavior of independent and freely independent random variables.

Let $X_{1}, \dots, X_{n}, \dots$ be freely independent self-adjoint random variables such that $φ(X_{j}) = 0$ and $φ(X_{j}^{2}) = 1$ for all $j$, and such that for any $p \geq 0$,
$$\sup_{n} φ(|X_{n}|^{p}) \leq C_{p}$$
for some constants $C_{p} < \infty$. The free central limit theorem then states:

Theorem 1.7. [Voi85] Let
$$Y_{n} = \frac{1}{\sqrt{n}}(X_{1} + \dots + X_{n}).$$
Then the laws of the random variables $Y_{n}$ converge in moments to the semicircle law $μ_{semicirc}$ given by
$$dμ_{semicirc}(t) = \frac{1}{2π}\sqrt{4 - t^{2}}\, dt, \quad t \in [-2, 2]$$
(with $μ_{semicirc}$ vanishing outside $[-2, 2]$).
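The moments of $μ_{semicirc}$ can be checked numerically. This sketch uses NumPy's Gauss-Legendre rule `np.polynomial.legendre.leggauss`. It substitutes $t = 2 \sin θ$ to remove the square-root singularities at $\pm 2$. It then compares the resulting moments with the Catalan numbers $\frac{1}{m+1}\binom{2m}{m}$ at even order $2m$; odd moments vanish by symmetry. These are the same numbers that count non-crossing pairings.

```python
import numpy as np
from math import comb

# With t = 2 sin(theta) and theta = pi x / 2, the moment
#   int_{-2}^{2} t^p sqrt(4 - t^2) dt / (2 pi)
# becomes int_{-1}^{1} (2 sin(pi x / 2))^p cos^2(pi x / 2) dx, a smooth integrand.
x, w = np.polynomial.legendre.leggauss(60)
theta = np.pi * x / 2

def semicircle_moment(p):
    return np.sum(w * (2 * np.sin(theta)) ** p * np.cos(theta) ** 2)

for p in range(12):
    expected = comb(p, p // 2) // (p // 2 + 1) if p % 2 == 0 else 0  # Catalan or 0
    assert abs(semicircle_moment(p) - expected) < 1e-9
```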

We will postpone the proof of this theorem until we get to talk about the $R$-transform. For now let us just note that we need a tool to compute the distribution of $Z_{1} + Z_{2}$ in terms of the distributions of $Z_{1}$ and $Z_{2}$ when $Z_{1}$ and $Z_{2}$ are freely independent.

1.7 Free Harmonic Analysis.

The corresponding classical problem involved computing the convolution of two measures, which was done via the Fourier transform.

1.7.1 Free additive convolution.

By analogy with the classical situation, Voiculescu gave the following definition:

Definition 1.8. [Voi85] Let $μ_{1}$ and $μ_{2}$ be two probability measures on $\mathbb{R}$. We define their free additive convolution $μ_{1} ⊞ μ_{2}$ to be the law of the random variable $Z_{1} + Z_{2}$, where $Z_{1}$ and $Z_{2}$ are freely independent and $μ_{Z_{j}} = μ_{j}$, $j = 1, 2$.

Since $Z_{1}, Z_{2}$ are free in $(A, φ)$, the freeness condition determines the restriction of $φ$ to $\mathrm{Alg}(Z_{1}, Z_{2})$ in terms of the restrictions of $φ$ to $\mathrm{Alg}(Z_{j})$, $j = 1, 2$. Thus the joint distribution of $Z_{1}$ and $Z_{2}$ depends only on $μ_{Z_{1}}$ and $μ_{Z_{2}}$. Hence the distribution of $Z_{1} + Z_{2}$ (which depends only on the joint distribution of $Z_{1}$ and $Z_{2}$) depends only on $μ_{Z_{1}} = μ_{1}$ and $μ_{Z_{2}} = μ_{2}$. It follows that $μ_{1} ⊞ μ_{2}$ is well-defined.

Note that $⊞$ is an operation on the space of probability measures on $\mathbb{R}$.

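Example 1.9. Let $μ$ be a probability measure and let $δ_{x}$ be the point mass at $x$. Then $μ ⊞ δ_{x} = μ_{x}$, the translate of $μ$ by $x$. Indeed, a random variable with law $δ_{x}$ is the scalar $x1$, which is freely independent from any family. In particular, $μ ⊞ δ_{x}$ is the same as the classical convolution $μ * δ_{x}$.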

1.7.2 $R$-transform.

There is a free analog of the logarithm of the Fourier transform, which linearizes free additive convolution.

Let $μ$ be a probability measure on $\mathbb{R}$, and let
$$G_{μ}(ζ) = \int_{\mathbb{R}} \frac{dμ(t)}{ζ - t}, \quad \Im ζ > 0,$$
be a function defined in the upper half-plane. This function is sometimes called the Cauchy transform of $μ$. If $μ$ is compactly supported, say in $[-r, r]$, then for $|ζ| > r$, $G_{μ}$ is given by a convergent power series in $1/ζ$. If $μ$ merely has moments of all orders, the same series should be regarded as a formal power series. In either case
$$G_{μ}(ζ) = \frac{1}{ζ} \sum_{p \geq 0} μ_{p} ζ^{-p},$$
where
$$μ_{p} = \int_{\mathbb{R}} t^{p}\, dμ(t)$$
are the moments of $μ$. Thus $G_{μ}$ is the generating function for the moments of $μ$.

Define $R_{μ}(z)$ by the equation
$$G_{μ}\Big(\frac{1}{z} + R_{μ}(z)\Big) = z.$$
It turns out that $R_{μ}(z)$ is analytic in a certain region in $\mathbb{C}$. However, one can simply understand it as a formal power series in $z$ and regard the equation above as an equation involving composition of formal power series.
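Here is a minimal sketch of this formal-power-series viewpoint. It computes the first coefficients of $R_{μ}$ from the moments of $μ$ in exact rational arithmetic. Put $g(u) = \sum_{p} μ_{p} u^{p+1} = G_{μ}(1/u)$ and $w(z) = 1/(\frac{1}{z} + R_{μ}(z))$. The defining equation then says that $w$ is the compositional inverse of $g$, and $z/w(z) = 1 + zR_{μ}(z)$. The helper names and the fixed-point method for series reversion are choices made for this illustration.

```python
from fractions import Fraction

def mul(a, b, n):
    """Product of two power series (coefficient lists), truncated to n terms."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):
            c[i + j] += ai * bj
    return c

def compose(g, w, n):
    """g(w(z)) truncated to n terms; requires w[0] == 0."""
    result = [Fraction(0)] * n
    power = [Fraction(1)] + [Fraction(0)] * (n - 1)   # w(z)^0
    for gk in g[:n]:
        result = [r + gk * p for r, p in zip(result, power)]
        power = mul(power, w, n)
    return result

def reverse(g, n):
    """Compositional inverse of g(u) = u + g_2 u^2 + ..., truncated to n terms."""
    w = [Fraction(0), Fraction(1)] + [Fraction(0)] * (n - 2)
    for _ in range(n):
        # Fixed-point step w <- w - (g(w) - z); each pass fixes one more coefficient.
        gw = compose(g, w, n)
        w = [wi - gi for wi, gi in zip(w, gw)]
        w[1] += 1
    return w

def reciprocal(a, n):
    """1 / a(z) for a power series with a[0] != 0, truncated to n terms."""
    inv = [Fraction(0)] * n
    inv[0] = 1 / Fraction(a[0])
    for k in range(1, n):
        s = sum(a[j] * inv[k - j] for j in range(1, min(k, len(a) - 1) + 1))
        inv[k] = -s * inv[0]
    return inv

def r_transform(moments, n):
    """[alpha_1, ..., alpha_n] for R(z) = sum alpha_{k+1} z^k, from moments m_0, ..., m_n."""
    N = n + 2
    g = [Fraction(0)] + [Fraction(m) for m in moments[: n + 1]]
    g += [Fraction(0)] * (N - len(g))
    w = reverse(g, N)
    inv_u = reciprocal(w[1:], N - 1)   # w(z) = z u(z), so z / w(z) = 1 / u(z)
    return inv_u[1 : n + 1]

# Semicircle moments (Catalan numbers at even orders): R(z) = z.
assert r_transform([1, 0, 1, 0, 2, 0, 5, 0, 14], 8) == [0, 1, 0, 0, 0, 0, 0, 0]
# Point mass at 3 (moments 3^p): R(z) = 3.
assert r_transform([3 ** p for p in range(7)], 6) == [3, 0, 0, 0, 0, 0]
# Moments 1, 1, 2, 5, 14, ... (free Poisson law with parameter 1): R(z) = 1/(1 - z).
assert r_transform([1, 1, 2, 5, 14, 42, 132], 6) == [1] * 6
```

The semicircle case anticipates part (b) of the theorem below. The point-mass case matches Example 1.9.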

Voiculescu proved the following linearization theorem, which shows that the map $μ \mapsto R_{μ}$ is a free analog of the logarithm of the Fourier transform.

Theorem 1.10. [Voi85] Let
$$R_{μ}(z) = \sum_{n \geq 0} α_{n+1} z^{n}$$
be the $R$-transform of $μ$. Then: (a) $α_{n}$ is a universal polynomial expression in the first $n$ moments of $μ$; (b) $R_{μ}(z) = z$ if and only if $μ = μ_{semicirc}$, i.e., $dμ(t) = \frac{1}{2π}\sqrt{4 - t^{2}}\, dt$ on $[-2, 2]$; (c) $R_{μ_{1} ⊞ μ_{2}}(z) = R_{μ_{1}}(z) + R_{μ_{2}}(z)$; (d) if $Y$ has law $μ$ and $λ \in \mathbb{R}$, then $R_{μ_{λY}}(z) = λ R_{μ}(λ z)$.

1.7.3 Proof of additivity of $R$-transform.

We will sketch a proof of (a), (b) and (c). We start with a lemma.

Lemma 1.11. Let $X \in (M, ψ)$ be a non-commutative random variable. Fix $h \in \mathbb{C}$ with $|h| = 1$, and let $ℓ_{1} = ℓ(h)$ be the corresponding free creation operator on the full Fock space $\mathcal{F}(\mathbb{C})$. For a sequence of numbers $a_{1}, a_{2}, \dots$, let
$$Y_{N} = Y_{N}^{\{a_{j}\}_{j=1}^{\infty}} = ℓ_{1}^{*} + \sum_{j=0}^{N} a_{j+1} ℓ_{1}^{j} \in (C^{*}(ℓ(\mathbb{C})), φ),$$
where $ℓ_{1}^{0} = 1$ and $φ$ is the vacuum state. Then there exists a unique sequence of numbers $a_{1}, a_{2}, \dots$ such that for each $N$,
$$ψ(X^{j}) = φ(Y_{N}^{j}), \quad 0 \leq j \leq N + 1.$$
Moreover, each $a_{k+1}$ is a polynomial in $\{ψ(X^{j}) : 0 \leq j \leq k + 1\}$, and this polynomial is universal, i.e., it does not depend on $X$.

The proof is based on an inductive argument and the combinatorial formula for moments of free creation operators.
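The inductive step can be made explicit. In $φ(Y_{N}^{k+1})$, the coefficient $a_{k+1}$ enters exactly once, through applying $ℓ_{1}^{k}$ to $Ω$ and then $ℓ_{1}^{*}$ $k$ times. Every other term involves only $a_{1}, \dots, a_{k}$, so the $a_{j}$ can be solved for one at a time. The sketch below does this on $\mathcal{F}(\mathbb{C})$, with vectors stored as dictionaries indexed by tensor degree. The function names and test moment sequences are illustrative. For the semicircle it returns $a_{2} = 1$ and all other $a_{j} = 0$. For the moments $1, 1, 2, 5, 14, \dots$ it returns all $a_{j} = 1$. Both results are consistent with $α_{n} = a_{n}$, which is proved in Section 1.7.6.

```python
def apply_Y(a, v):
    """Apply Y = l* + sum_j a[j] l^j to v, a dict {tensor degree: coefficient}.

    a[j] plays the role of a_{j+1}; l raises the degree by one, and l* lowers
    it by one and kills Omega (degree 0).
    """
    out = {}
    for m, c in v.items():
        if m > 0:
            out[m - 1] = out.get(m - 1, 0) + c
        for j, aj in enumerate(a):
            if aj:
                out[m + j] = out.get(m + j, 0) + aj * c
    return out

def fock_moment(a, p):
    """phi(Y^p) = <Omega, Y^p Omega>."""
    v = {0: 1}
    for _ in range(p):
        v = apply_Y(a, v)
    return v.get(0, 0)

def coefficients_from_moments(moments):
    """Solve phi(Y^{k+1}) = moments[k + 1] for a_1, a_2, ... inductively."""
    a = []
    for k in range(len(moments) - 1):
        rest = fock_moment(a + [0], k + 1)   # phi(Y^{k+1}) computed with a_{k+1} = 0
        a.append(moments[k + 1] - rest)      # a_{k+1} enters with coefficient 1
    return a

semicircle = [1, 0, 1, 0, 2, 0, 5]
assert coefficients_from_moments(semicircle) == [0, 1, 0, 0, 0, 0]
assert coefficients_from_moments([1, 1, 2, 5, 14, 42]) == [1, 1, 1, 1, 1]
assert coefficients_from_moments([1, 3, 9, 27]) == [3, 0, 0]

# With these coefficients, Y_N reproduces the prescribed moments.
a = coefficients_from_moments(semicircle)
assert [fock_moment(a, p) for p in range(7)] == semicircle
```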

1.7.4 Combinatorial definition of $R$-transform.

Given $X$ with law $μ$, let $a_{1}, a_{2}, \dots$ be as in the lemma above. By the lemma, these numbers depend only on the moments of $X$, i.e., only on $μ$. Consider the formal power series
$$R_{μ}(z) = \sum_{n \geq 0} a_{n+1} z^{n}.$$
For now we will take $R_{μ}$ to be given by this new definition, and call it the “combinatorial $R$-transform”. We shall later prove that $R_{μ}(z)$ satisfies our old analytic definition in terms of $G_{μ}$ given above; in particular, it will follow that $α_{n} = a_{n}$.

1.7.5 Additivity of combinatorial $R$-transform.

Proposition 1.12. $R_{μ_{1} ⊞ μ_{2}} = R_{μ_{1}} + R_{μ_{2}}$.

Proof. Let $ℓ_{1}, ℓ_{2}$ be two free creation operators on the full Fock space $\mathcal{F}(\mathbb{C}^{2})$, associated to a pair of orthonormal vectors.
Given $μ_{1}$ and $μ_{2}$, let $a_{1}, a_{2}, \dots$ and $b_{1}, b_{2}, \dots$ be the sequences provided by Lemma 1.11 for $μ_{1}$ and $μ_{2}$. Let $Y_{1}(n) = ℓ_{1}^{*} + \sum_{k \leq n} a_{k+1} ℓ_{1}^{k}$ and $Y_{2}(n) = ℓ_{2}^{*} + \sum_{k \leq n} b_{k+1} ℓ_{2}^{k}$ be the corresponding random variables in $C^{*}(ℓ_{1})$ and $C^{*}(ℓ_{2})$, respectively, so that their moments of order up to $n + 1$ are the same as those of $μ_{1}$ and $μ_{2}$, respectively.
Since $C^{*}(ℓ_{1})$ and $C^{*}(ℓ_{2})$ are freely independent, $Y_{1}(n)$ and $Y_{2}(n)$ are freely independent. The moments of order up to $n + 1$ of $Y_{1}(n) + Y_{2}(n)$ depend only on the moments of order up to $n + 1$ of $Y_{1}(n)$ and $Y_{2}(n)$. Hence the moments of order up to $n + 1$ of $μ_{1} ⊞ μ_{2}$ and of $Y_{1}(n) + Y_{2}(n)$ are the same.
We leave to the reader the combinatorial exercise of checking that the moments of $Y_{1}(n) + Y_{2}(n)$ are the same as the moments of $Y_{3}(n) = ℓ_{3}^{*} + \sum_{k \leq n} (a_{k+1} + b_{k+1}) ℓ_{3}^{k}$, where $ℓ_{3}$ is a free creation operator on $\mathcal{F}(\mathbb{C})$. By the uniqueness statement in Lemma 1.11, it follows that $R_{μ_{1} ⊞ μ_{2}} = R_{μ_{1}} + R_{μ_{2}}$, as claimed. □
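The combinatorial exercise can at least be sanity-checked numerically. The sketch below encodes basis vectors of the full Fock space as words, using letters 1 and 2 for $ℓ_{1}, ℓ_{2}$ and a separate letter 3 for $ℓ_{3}$. It uses arbitrary integer sequences in place of the $a_{k+1}$ and $b_{k+1}$, and compares the vacuum moments of $Y_{1}(n) + Y_{2}(n)$ with those of $Y_{3}(n)$ for $n = 2$. This is a numerical check for one example, not a proof.

```python
def l(i, v):
    """Creation operator for letter i: prepend i to every word."""
    return {(i,) + w: c for w, c in v.items()}

def l_star(i, v):
    """Annihilation operator for letter i: strip a leading i, kill other words."""
    return {w[1:]: c for w, c in v.items() if w and w[0] == i}

def add_into(out, v, scale=1):
    for w, c in v.items():
        out[w] = out.get(w, 0) + scale * c

def apply_Y(i, coeffs, v):
    """Apply l_i^* + sum_k coeffs[k] l_i^k to the vector v."""
    out = {}
    add_into(out, l_star(i, v))
    power = v                      # l_i^k v, starting with k = 0
    for ck in coeffs:
        add_into(out, power, ck)
        power = l(i, power)
    return out

def vacuum_moment(op, p):
    v = {(): 1}
    for _ in range(p):
        v = op(v)
    return v.get((), 0)

# Hypothetical integer coefficients in the roles of a_{k+1} and b_{k+1}, k = 0, 1, 2.
a = [1, -2, 3]
b = [2, 1, -1]

def Y1_plus_Y2(v):
    out = {}
    add_into(out, apply_Y(1, a, v))
    add_into(out, apply_Y(2, b, v))
    return out

def Y3(v):
    # l_3 acts on its own copy of F(C): words in the single letter 3.
    return apply_Y(3, [x + y for x, y in zip(a, b)], v)

moments_sum = [vacuum_moment(Y1_plus_Y2, p) for p in range(8)]
moments_Y3 = [vacuum_moment(Y3, p) for p in range(8)]
assert moments_sum == moments_Y3
```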

1.7.6 Analytic and combinatorial $R$-transforms are the same.

It now remains to prove that the combinatorial $R$-transform $R_{μ}(z) = \sum a_{n+1} z^{n}$ satisfies the formula relating it to the Cauchy transform $G_{μ}$ (and so $α_{n} = a_{n}$). The proof of the following proposition is due to Haagerup [Haa97].

Proposition 1.13. Let $K_{μ}(z) = \frac{1}{z} + R_{μ}(z) = z^{-1} + \sum a_{k+1} z^{k}$. With the above notation, one has
$$G_{μ}(K_{μ}(z)) = z \quad \text{and} \quad K_{μ}(G_{μ}(ζ)) = ζ,$$
both equalities interpreted in terms of composition of formal power series.

Proof. Let $ℓ$ be a free creation operator corresponding to a unit vector $e$ and acting on the full Fock space $\mathcal{F}(\mathbb{C})$. Let $x = ℓ^{*} + f(ℓ)$, where $f$ is a polynomial with real coefficients. Thus by definition, $R_{μ_{x}}(z) = f(z)$; write $μ = μ_{x}$.

For $z \in \mathbb{C}$ with $|z| < 1$, consider the vector
$$ω_{z} = (1 - zℓ)^{-1} Ω = Ω + \sum_{n=1}^{\infty} z^{n} e^{\otimes n}.$$
Then
$$ℓ ω_{z} = \sum_{n=0}^{\infty} z^{n} e^{\otimes (n+1)} = \frac{1}{z}(ω_{z} - Ω), \quad 0 < |z| < 1.$$
Similarly,
$$ℓ^{*} ω_{z} = \sum_{n=1}^{\infty} z^{n} e^{\otimes (n-1)} = z ω_{z}, \quad |z| < 1.$$
Thus $ω_{z}$ is an eigenvector for $ℓ^{*}$ with eigenvalue $z$. Since $f$ has real coefficients, $f(ℓ)^{*} = f(ℓ^{*})$, and hence
$$\begin{aligned} x^{*} ω_{z} &= (ℓ + f(ℓ^{*})) ω_{z} = ℓ ω_{z} + f(z) ω_{z} \\ &= \frac{1}{z}(ω_{z} - Ω) + f(z) ω_{z} \\ &= \Big(\frac{1}{z} + f(z)\Big) ω_{z} - \frac{1}{z} Ω, \quad 0 < |z| < 1. \end{aligned}$$
It follows that
$$\frac{1}{z} Ω = \Big(\Big(\frac{1}{z} + f(z)\Big) 1 - x^{*}\Big) ω_{z}.$$
Now choose $0 < δ < 1$ so that $(\frac{1}{z} + f(z)) 1 - x^{*}$ is invertible for $0 < |z| < δ$. This is possible because $x^{*}$ is a bounded operator while $\lim_{z \to 0} |\frac{1}{z} + f(z)| = \infty$ (as $f$ is a polynomial). Hence
$$\Big(\Big(\frac{1}{z} + f(z)\Big) 1 - x^{*}\Big)^{-1} Ω = z ω_{z};$$
thus
$$\begin{aligned} φ\Big(\Big(\Big(\frac{1}{z} + f(z)\Big) 1 - x^{*}\Big)^{-1}\Big) &= \Big\langle Ω, \Big(\Big(\frac{1}{z} + f(z)\Big) 1 - x^{*}\Big)^{-1} Ω \Big\rangle \\ &= z \langle Ω, ω_{z} \rangle = z. \end{aligned}$$
By definition of $G_{μ}$,
$$G_{μ}(λ) = φ((λ 1 - x)^{-1}) = \overline{φ((\bar{λ} 1 - x^{*})^{-1})}.$$
All of the coefficients of the power series $G_{μ}(λ)$ (the moments $φ(x^{p})$) are real, so $\overline{G_{μ}(λ)} = G_{μ}(\bar{λ})$, and therefore
$$G_{μ}(\bar{λ}) = φ((\bar{λ} 1 - x^{*})^{-1}).$$
We now substitute $\bar{λ} = \frac{1}{z} + f(z)$ to get
$$G_{μ}\Big(\frac{1}{z} + f(z)\Big) = z.$$
We also see that $G_{μ}$ is invertible with respect to composition on some neighborhood. Applying its inverse to both sides, and remembering that $f(z) = R_{μ}(z)$, we get that
$$K_{μ}(z) = R_{μ}(z) + \frac{1}{z} = G_{μ}^{-1}(z),$$
as claimed.

This concludes the proof in the case that $R_{μ}$ is a polynomial. The general statement can be deduced from this special case by taking limits, since each coefficient of the formal power series involved depends on only finitely many of the $a_{j}$. □

We have thus proved (a) and (c) of Theorem 1.10.

1.7.7 Semicircular variables.

Let us prove (b). Assume that $R_{μ}(z) = z$. Then $K_{μ}(z) = \frac{1}{z} + z$ and
$$\frac{1}{G_{μ}(ζ)} + G_{μ}(ζ) = ζ.$$
Solving this quadratic equation for $G_{μ}(ζ)$ gives
$$G_{μ}(ζ) = \frac{ζ - \sqrt{ζ^{2} - 4}}{2}.$$
One can recover $μ$ from $G_{μ}$ by the Stieltjes inversion formula
$$dμ(t) = -\frac{1}{π} \lim_{s \downarrow 0} \Im G_{μ}(t + is)\, dt.$$
This holds as written when $μ$ has a continuous density, as here; in general the limit is taken in the weak sense. Since $G_{μ}(ζ) \to 0$ as $ζ \to \infty$ (as is apparent from the integral formula for the Cauchy transform), the branch of the square root must be chosen so that $\sqrt{ζ^{2} - 4} > 0$ for $ζ$ real and large. It follows that
$$dμ(t) = \frac{1}{2π}\sqrt{4 - t^{2}}\, dt, \quad t \in [-2, 2],$$
and $dμ(t) = 0$ outside of this interval.
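Here is a numerical sketch of this inversion, assuming NumPy. The branch with $\sqrt{ζ^{2} - 4} \sim ζ$ at infinity is implemented as the product $\sqrt{ζ - 2}\,\sqrt{ζ + 2}$ of principal square roots, which is analytic off $[-2, 2]$. The sketch compares $-\frac{1}{π}\Im G_{μ}(t + is)$ at a small $s$ with the semicircle density. The sample points and the value of $s$ are arbitrary choices.

```python
import numpy as np

def G_semicircle(zeta):
    """(zeta - sqrt(zeta^2 - 4)) / 2, with the branch sqrt(zeta^2 - 4) ~ zeta at infinity.

    sqrt(zeta - 2) * sqrt(zeta + 2) (principal roots) is analytic off [-2, 2]
    and behaves like zeta at infinity, so it selects that branch.
    """
    zeta = np.asarray(zeta, dtype=complex)
    return (zeta - np.sqrt(zeta - 2) * np.sqrt(zeta + 2)) / 2

# Sample points inside and outside [-2, 2] (arbitrary), and a small s > 0.
t = np.array([-3.0, -1.5, -0.5, 0.0, 0.7, 1.9, 2.4])
s = 1e-9
recovered = -np.imag(G_semicircle(t + 1j * s)) / np.pi
density = np.where(np.abs(t) < 2, np.sqrt(np.clip(4 - t ** 2, 0, None)) / (2 * np.pi), 0.0)
assert np.allclose(recovered, density, atol=1e-6)

# G satisfies 1/G + G = zeta (i.e. R(z) = z), and G(zeta) ~ 1/zeta at infinity.
zeta = 3.0 + 4.0j
assert np.isclose(1 / G_semicircle(zeta) + G_semicircle(zeta), zeta)
assert abs(G_semicircle(1e3j) * 1e3j - 1) < 1e-5
```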

Note that if $R_{μ}(z) = z$, then $α_{j} = a_{j} = 0$ unless $j = 2$, and $a_{2} = 1$. It follows that the variable $ℓ_{1} + ℓ_{1}^{*}$ on the Fock space $\mathcal{F}(\mathbb{C})$, which is $Y_{N}$ with $a_{2} = 1$ and all other $a_{j} = 0$, has semicircular distribution.

1.7.8 Proof of free CLT

We are now ready to give a proof of the free central limit theorem.

Let $X_{1}, \dots, X_{n}, \dots$ be freely independent random variables satisfying the assumptions of the free central limit theorem, and let
$$Z_{n} = X_{1} + \dots + X_{n}, \qquad Y_{n} = \frac{1}{\sqrt{n}}(X_{1} + \dots + X_{n}) = \frac{1}{\sqrt{n}} Z_{n}.$$
Let $ν_{n}$ be the law of $X_{n}$, let $μ_{n}$ be the law of $Y_{n}$ and let $λ_{n}$ be the law of $Z_{n}$. Thus
$$λ_{n} = ν_{1} ⊞ \dots ⊞ ν_{n}$$
and, because of the additivity of the $R$-transform,
$$R_{λ_{n}}(z) = R_{ν_{1}}(z) + \dots + R_{ν_{n}}(z).$$
Write $R_{λ_{n}}(z) = \sum_{p} α_{p+1}^{(n)} z^{p}$. The coefficient of $z^{p}$ in $R_{ν_{j}}(z)$ is a universal polynomial in the moments of $X_{j}$ up to order $p + 1$, and $\sup_{j} |φ(X_{j}^{q})| < \infty$ for every $q$. It follows that $|α_{p+1}^{(n)}| \leq n K_{p}$, where the $K_{p}$ are constants independent of $n$. Thus, by part (d) of Theorem 1.10 applied with the scaling factor $1/\sqrt{n}$,
$$R_{μ_{n}}(z) = \frac{1}{\sqrt{n}} R_{λ_{n}}\Big(\frac{z}{\sqrt{n}}\Big) = \sum_{p} \frac{α_{p+1}^{(n)}}{n^{\frac{p+1}{2}}} z^{p}.$$
If $p > 1$ is fixed, the estimate $|α_{p+1}^{(n)}| \leq n K_{p}$ implies that
$$\frac{α_{p+1}^{(n)}}{n^{\frac{p+1}{2}}} \to 0.$$
If $p = 0$, the fact that $φ(X_{n}) = 0$ for all $n$, so that $φ(Z_{n}) = 0$, implies that $α_{1}^{(n)} = 0$ for all $n$. Finally, let $p = 1$. Since $φ(X_{j}^{2}) = 1$ and $φ(X_{i} X_{j}) = φ(X_{i}) φ(X_{j}) = 0$ for $i \neq j$, we have $φ(Z_{n}^{2}) = n$. Hence $α_{2}^{(n)} = φ(Z_{n}^{2}) - φ(Z_{n})^{2} = n$, and the coefficient of $z$ in $R_{μ_{n}}(z)$ is $α_{2}^{(n)}/n = 1$ for all $n$ (equivalently, $φ(Y_{n}^{2}) = 1$).

We conclude that $R_{μ_{n}}(z) \to z$ as $n \to \infty$ in the sense of coefficient-wise convergence of formal power series. The $p$-th moment of $μ_{n}$ is a universal polynomial in the first $p$ coefficients of the power series $R_{μ_{n}}(z)$. It follows that the $p$-th moment of $μ_{n}$ converges to the $p$-th moment of the unique measure $μ$ for which $R_{μ}(z) = z$. We saw above that this $μ$ is the semicircle law, and so $μ_{n} \to μ_{semicirc}$ in moments.
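The proof can be followed numerically for a concrete choice of summands. In this sketch the $X_{j}$ are assumed to be freely independent with the symmetric Bernoulli law $\frac{1}{2}(δ_{-1} + δ_{1})$, a hypothetical choice that satisfies the hypotheses. The helper functions convert between moments and $R$-transform coefficients by solving $G_{μ}(K_{μ}(z)) = z$ as formal power series in exact rational arithmetic. The $R$-transform of $μ_{n}$ is then formed by rescaling the coefficients as in the proof. The fourth moment of $μ_{n}$ comes out as $2 - 1/n$, which approaches the semicircle value $2$.

```python
from fractions import Fraction

def mul(a, b, n):
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):
            c[i + j] += ai * bj
    return c

def compose(g, w, n):
    """g(w(z)) truncated to n terms, assuming w[0] == 0."""
    out, power = [Fraction(0)] * n, [Fraction(1)] + [Fraction(0)] * (n - 1)
    for gk in g[:n]:
        out = [o + gk * p for o, p in zip(out, power)]
        power = mul(power, w, n)
    return out

def reverse(g, n):
    """Compositional inverse of g = z + O(z^2), by the fixed point w <- w - (g(w) - z)."""
    w = [Fraction(0), Fraction(1)] + [Fraction(0)] * (n - 2)
    for _ in range(n):
        w = [wi - gi for wi, gi in zip(w, compose(g, w, n))]
        w[1] += 1
    return w

def reciprocal(a, n):
    """1 / a(z), assuming a[0] != 0."""
    inv = [1 / Fraction(a[0])] + [Fraction(0)] * (n - 1)
    for k in range(1, n):
        inv[k] = -sum(a[j] * inv[k - j] for j in range(1, min(k, len(a) - 1) + 1)) * inv[0]
    return inv

def r_from_moments(m, n):
    """[alpha_1..alpha_n] from m_0..m_n: 1/K(z) is the inverse of g(u) = sum m_p u^{p+1}."""
    N = n + 2
    g = [Fraction(0)] + [Fraction(x) for x in m[: n + 1]]
    g += [Fraction(0)] * (N - len(g))
    return reciprocal(reverse(g, N)[1:], N - 1)[1 : n + 1]

def moments_from_r(r, n):
    """[m_0..m_n] from [alpha_1, ...], by inverting w(z) = 1/K(z) = z / (1 + z R(z))."""
    N = n + 2
    one_plus_zR = [Fraction(1)] + [Fraction(x) for x in r[: N - 2]]
    one_plus_zR += [Fraction(0)] * (N - 1 - len(one_plus_zR))
    w = [Fraction(0)] + reciprocal(one_plus_zR, N - 1)
    return reverse(w, N)[1 : n + 2]

P = 8
bernoulli = [Fraction(1 - p % 2) for p in range(P + 1)]   # moments 1, 0, 1, 0, ...
alpha = r_from_moments(bernoulli, P)                      # alpha[k] = alpha_{k+1}

def free_clt_moments(n):
    """Moments m_0..m_P of mu_n, the law of (X_1 + ... + X_n) / sqrt(n)."""
    # The coefficient of z^p in n^{-1/2} * n * R_nu(z / sqrt(n)) is
    # alpha_{p+1} * n^{(1-p)/2}; for this symmetric law only odd p contribute.
    r = []
    for p, c in enumerate(alpha):
        assert c == 0 or p % 2 == 1
        r.append(c / Fraction(n) ** ((p - 1) // 2) if c else Fraction(0))
    return moments_from_r(r, P)

assert free_clt_moments(1) == bernoulli         # n = 1 recovers the summand law
for n in (2, 10, 1000):
    m = free_clt_moments(n)
    assert m[:4] == [1, 0, 1, 0]
    assert m[4] == 2 - Fraction(1, n)           # tends to 2, the semicircle value
print(float(free_clt_moments(10 ** 6)[6]))     # close to 5, the semicircle value
```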

1.8 Further topics.

We already briefly touched upon the amazing correspondence between various theorems in the classical and free contexts. There are several other instances of this. For example, one can consider the free analog of infinite divisibility. A measure $μ$ is called infinitely divisible if for any $n$ there is a measure $μ_{n}$ such that $μ$ is the $n$-fold convolution $μ_{n} * \dots * μ_{n}$. One can say that $μ$ is freely infinitely divisible if for each $n$ there is a measure $μ_{n}$ such that $μ$ is the $n$-fold free convolution $μ_{n} ⊞ \dots ⊞ μ_{n}$. Remarkably, there is a one-to-one correspondence between the classically infinitely divisible measures and the free ones. A similar situation occurs when considering stable and freely stable laws.

There is also a notion of multiplicative free convolution, based on taking products of non-commutative random variables.

The reader is encouraged to consult [Voi00] for more details.

2 Random Matrices and Free Probability.

One of the most important advances in free probability theory was Voiculescu’s discovery that free probability theory describes the asymptotic distribution of certain large random matrices.

This has led to a number of applications of free probability theory, both to spectral computations for random matrices, and to von Neumann algebras. The latter applications rely on the somewhat unexpected presence of a “matricial” structure in free probability theory: if one takes several square arrays of certain free random variables and creates several matrices out of these arrays, then the resulting matrices have surprising freeness properties (for example, the resulting matrices may be freely independent).

2.1 Random matrices.

A random matrix is a matrix, whose entries are random variables. One can also think of a random matrix as a matrix-valued random variable, i.e., as a randomly chosen matrix. Any Borel function of a random matrix becomes then a random variable. For example, the eigenvalues of a random matrix (being functions of its entries) are themselves random variables.

2.1.1 Expected distributions.

Let

X_{N}

be a self-adjoint random matrix of size

N \times N

. We think of

X_{N}

as a function

X_{N} : Σ \to M_{N} (C)

on some probability space

(Σ, σ)

. Integration with respect to

σ

has the meaning of taking the expected value and will be denoted by

E

One is frequently interested in the expected proportion of the eigenvalues of

X_{N}

that lie in a given interval

[a, b]

\begin{matrix} Λ_{N} ([a, b]) & = & \frac{1}{N} Expected # {eigenvalues of X_{N} in [a, b]} \end{matrix}

\begin{matrix} = & E (\frac{1}{N} # {eigenvalues of X_{N} (t) in [a, b]}) \end{matrix}

Let

λ_{1} (t), \dots, λ_{N} (t)

be the eigenvalues of

X (t)

, listed with multiplicity, and viewed as random variables. Let

ν_{N}^{t} = \frac{1}{N} \sum_{j = 1}^{N} δ_{λ_{j} (t)}

be a random measure associated with this list of eigenvalues (we say that

ν_{N}^{t}

is random to emphasize that it depends on

t

, i.e., is a measure-valued random variable). Then

Λ_{N} ([a, b]) = E (ν_{N}^{t} ([a, b]))

is the expected value of

ν_{N}

. Thus if we set

μ_{N} = E (ν_{N}^{t})

we obtain that

Λ_{N} ([a, b]) = μ_{N} ([a, b]) .

Note that

ν_{N}^{t} = \frac{1}{N} Tr \circ σ_{N}^{t},

where $σ_{N}^{t}$ is the spectral measure of $X_{N} (t)$. In other words, $ν_{N}^{t}$ is the distribution of $X_{N} (t)$ when viewed as a random variable in $(M_{N} (C), \frac{1}{N} Tr)$. Thus $μ_{N}$ is the “expected value of the distribution of $X_{N}$”.
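The definition of $μ_{N}$ can be made concrete with a short Monte Carlo sketch (Python with NumPy). It estimates $Λ_{N} ([a, b]) = E (ν_{N}^{t} ([a, b]))$ by averaging the proportion of eigenvalues in $[a, b]$ over independent samples $t$. The sampler `sample_gaussian_selfadjoint` is our own illustrative choice (a self-adjoint matrix with complex Gaussian entries of variance $\frac{1}{N}$, of the kind used in § 2.2.1); any other sampler with the same signature could be plugged in.

```python
import numpy as np

def sample_gaussian_selfadjoint(N, rng):
    """Illustrative sampler: self-adjoint N x N matrix with E|X_ij|^2 = 1/N."""
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2 * N)

def expected_proportion(sampler, N, a, b, samples=200, seed=0):
    """Monte Carlo estimate of Lambda_N([a, b]) = E(nu_N^t([a, b]))."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        eigs = np.linalg.eigvalsh(sampler(N, rng))    # eigenvalues of X_N(t) for one sample t
        total += np.mean((eigs >= a) & (eigs <= b))   # nu_N^t([a, b])
    return total / samples

estimate = expected_proportion(sample_gaussian_selfadjoint, N=100, a=-1.0, b=1.0)
print(estimate)
```

For large $N$ the printed estimate should be close to $(\sqrt{3} + 2 π / 3) / (2 π) \approx 0.609$, the mass that the semicircle law gives to $[- 1, 1]$.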

2.2 Asymptotics of random matrices.

We are mainly interested in the asymptotics of the expected number of eigenvalues of a random matrix in a given interval. In other words, we are interested in studying the asymptotics of the measure $μ_{N}$ as $N \to \infty$.

It should be mentioned that the eigenvalue distributions of random matrices have been studied in several other ways. Instead of looking at the expected numbers of eigenvalues, one can study the spacings between eigenvalues (normalized so that the average spacing is $1$). One is also interested in the behavior of the largest and smallest eigenvalues (this translates into considering the expected value of the spectral radius, or the operator norm, of the matrix $X_{N}$). We have already heard in this conference of the significant progress recently made by Haagerup and Thorbjørnsen on the latter problem, in the case that $X_{N}$ is an arbitrary polynomial in a $k$-tuple of Gaussian random matrices.

2.2.1 Wigner’s theorem for Gaussian random matrices.

Let $X_{N}$ be a self-adjoint random matrix with entries $g_{i j}$, $1 \leq i, j \leq N$, determined as follows. The variables $\{g_{i j} : i \leq j\}$ are independent; if $i < j$, then $g_{i j}$ is a centered complex Gaussian random variable of variance $\frac{1}{N}$ (i.e., $E (| g_{i j} |^{2}) = \frac{1}{N}$); if $i = j$, then $g_{i j}$ is a centered real Gaussian random variable of variance $\frac{1}{N}$. Finally, if $i > j$, then $g_{i j} = \bar{g_{j i}}$.

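One can think of the random matrix $X_{N}$ as a map

X_{N} : (Σ, σ) \to M_{N} (C) .

Here $Σ = M_{N} (C)$ is the space of complex $N \times N$ matrices, $X_{N}$ is the map

A \mapsto \frac{A + A^{*}}{2},

and $σ$ is the Gaussian measure on $Σ$ given by

d σ (A) = α_{N} e^{- \frac{N}{2} Tr (A^{*} A)} d A,

for a suitable normalizing constant $α_{N}$. With this normalization the entries of $A$ are independent complex Gaussians with $E (| A_{i j} |^{2}) = \frac{2}{N}$, which gives the off-diagonal entries $g_{i j}$ of $X_{N}$ variance $\frac{1}{N}$.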

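Let $μ_{N}$ be, as before, the expected value of the distribution of $X_{N}$. Then

μ_{N} \to μ_{semicirc}

weakly as $N \to \infty$, where $d μ_{semicirc} (s) = \frac{1}{2 π} \sqrt{4 - s^{2}} d s$ on $[- 2, 2]$ is the semicircle law of variance $1$. This is a very old result, going back to the work of Wigner in the 1950s [Wig55].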

It turns out that the semicircle law is fairly universal for matrices with independent identically distributed entries. In fact, Wigner’s original work involved matrices $X_{N}$ whose entries were not Gaussian but (suitably normalized) random signs.
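A quick numerical illustration of Wigner’s theorem and of the universality remark, written as a sketch under our own assumptions (Python with NumPy; function names are hypothetical). We compare the normalized trace moments $\frac{1}{N} Tr (X_{N}^{k})$ of one large Gaussian matrix and of one random-sign matrix (entries $\pm 1 / \sqrt{N}$) with the even moments of the semicircle law of variance $1$, which are the Catalan numbers $\frac{1}{m + 1} \binom{2 m}{m}$ for $k = 2 m$.

```python
import numpy as np
from math import comb

def catalan(m):
    """m-th Catalan number: the 2m-th moment of the semicircle law of variance 1."""
    return comb(2 * m, m) // (m + 1)

def gaussian_matrix(N, rng):
    """Self-adjoint matrix with complex Gaussian entries, E|g_ij|^2 = 1/N."""
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2 * N)

def sign_matrix(N, rng):
    """Real symmetric matrix with independent entries +-1/sqrt(N) on and above the diagonal."""
    U = np.triu(rng.choice([-1.0, 1.0], size=(N, N)))
    return (U + np.triu(U, 1).T) / np.sqrt(N)

def trace_moment(X, k):
    """(1/N) Tr(X^k), computed from the eigenvalues of the self-adjoint matrix X."""
    return float(np.mean(np.linalg.eigvalsh(X) ** k))

rng = np.random.default_rng(1)
for name, make in [("gaussian", gaussian_matrix), ("random signs", sign_matrix)]:
    X = make(400, rng)
    print(name, [round(trace_moment(X, 2 * m), 3) for m in (1, 2, 3)], [catalan(m) for m in (1, 2, 3)])
```

For $N = 400$ both rows of moments should be close to $[1, 2, 5]$; the agreement for random signs is an instance of the universality mentioned above.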

2.2.2 Voiculescu’s asymptotic freeness results.

The semicircle law also arises in free probability theory as the central limit law. Voiculescu showed that this is not just a coincidence: families of certain $N \times N$ random matrices behave like free random variables in the limit $N \to \infty$.

For each $N$, let $D_{N}$ be a (deterministic) diagonal $N \times N$ matrix; assume that the operator norms $∥ D_{N} ∥$ are uniformly bounded in $N$, and that the distribution of $D_{N}$ (as an element of $(M_{N} (C), \frac{1}{N} Tr)$) converges in moments to a limit measure $ν$.

Let $X_{N}^{(1)}, \dots, X_{N}^{(k)}$ be random matrices described as follows. Let $Σ = M_{N} (C)^{k}$ with the measure $σ$ given by

d σ (A_{1}, \dots, A_{k}) = C_{N, k} e^{- \frac{N}{2} Tr (A_{1}^{*} A_{1} + \dots + A_{k}^{*} A_{k})} d A_{1} \dots d A_{k},

for a suitable normalizing constant $C_{N, k}$. Then $X_{N}^{(p)}$ is the map

X_{N}^{(p)} : (A_{1}, \dots, A_{k}) \mapsto \frac{A_{p} + A_{p}^{*}}{2} .

More explicitly, if we denote by $g_{i j}^{(p)}$ the $(i, j)$-th entry of $X_{N}^{(p)}$, then $\{g_{i j}^{(p)} : 1 \leq i \leq j \leq N, 1 \leq p \leq k\}$ form a family of independent centered Gaussian random variables such that: $g_{i j}^{(p)}$ is a complex Gaussian of variance $\frac{1}{N}$ if $i < j$; $g_{i i}^{(p)}$ is a real Gaussian of variance $\frac{1}{N}$; and $g_{i j}^{(p)} = \bar{g_{j i}^{(p)}}$ if $i > j$.

The family $(X_{N}^{(1)}, \dots, X_{N}^{(k)})$ is sometimes called the Gaussian Unitary Ensemble (or GUE) because of the obvious invariance of its joint distribution under conjugation of the $X_{N}^{(p)}$ by $k$ (possibly different) unitaries.

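Let $μ_{N}$ be the expected distribution of the family $(D_{N}, X_{N}^{(1)}, \dots, X_{N}^{(k)})$, viewed as a linear functional on the space of non-commutative polynomials in $k + 1$ indeterminates: $μ_{N} (P) = E (\frac{1}{N} Tr (P (D_{N}, X_{N}^{(1)}, \dots, X_{N}^{(k)})))$.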

Then Voiculescu proved:

Theorem 2.1. [Voi91] Let $(d, x_{1}, \dots, x_{k})$ be a family of free random variables in a non-commutative probability space $(A, φ)$, such that $d$ has distribution $ν$ and $x_{1}, \dots, x_{k}$ have semicircular distribution. Let $μ$ be the distribution of this family, and let $μ_{N}$ be the distribution of $(D_{N}, X_{N}^{(1)}, \dots, X_{N}^{(k)})$ as described above. Then $μ_{N} \to μ$ in moments as $N \to \infty$.

In other words, for any $t$, any $j_{1}, \dots, j_{t} \in \{1, \dots, k\}$ and any $n_{0}, \dots, n_{t} \in \{0, 1, 2, \dots\}$ one has

\lim_{N \to \infty} E (\frac{1}{N} Tr (D_{N}^{n_{0}} X_{N}^{(j_{1})} D_{N}^{n_{1}} \cdots X_{N}^{(j_{t})} D_{N}^{n_{t}})) = φ (d^{n_{0}} x_{j_{1}} d^{n_{1}} \cdots x_{j_{t}} d^{n_{t}}) .

Note that, in particular, the matrices $D_{N}, X_{N}^{(1)}, \dots, X_{N}^{(k)}$ are asymptotically free. One also recovers Wigner’s result, since in particular $μ_{X_{N}^{(1)}} \to μ_{x_{1}}$, and $μ_{x_{1}}$ is the semicircle law.
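As a sanity check of Theorem 2.1 (a sketch with illustrative names, not part of the original argument), take $D_{N}$ diagonal with half of its entries equal to $0$ and half equal to $1$, so that $φ (d) = φ (d^{2}) = \frac{1}{2}$. Freeness of $d$ and $x_{1}$, with $φ (x_{1}) = 0$ and $φ (x_{1}^{2}) = 1$, gives $φ (d x_{1} d x_{1}) = φ (d)^{2} φ (x_{1}^{2}) = \frac{1}{4}$, whereas $φ (d^{2} x_{1}^{2}) = φ (d^{2}) φ (x_{1}^{2}) = \frac{1}{2}$; the normalized traces of the random matrices should approach these two different values.

```python
import numpy as np

def gue(N, rng):
    """Self-adjoint Gaussian matrix with E|g_ij|^2 = 1/N."""
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2 * N)

def ntr(A):
    """Normalized trace (1/N) Tr(A)."""
    return np.trace(A).real / A.shape[0]

def mixed_moments(N, samples=20, seed=0):
    """Average (1/N)Tr(D X D X) and (1/N)Tr(D^2 X^2) over GUE samples X."""
    rng = np.random.default_rng(seed)
    D = np.diag(np.r_[np.zeros(N // 2), np.ones(N - N // 2)])   # eigenvalues 0 and 1, half each
    dxdx = ddxx = 0.0
    for _ in range(samples):
        X = gue(N, rng)
        dxdx += ntr(D @ X @ D @ X)
        ddxx += ntr(D @ D @ X @ X)
    return dxdx / samples, ddxx / samples

print(mixed_moments(200))   # expected to be close to (0.25, 0.5)
```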

2.2.3 Some remarks on the proof.

We will not prove this theorem here; see e.g. [VDN92] for a proof.

We shall only sketch the essential combinatorial trick used in the proof and explain its connection to non-crossing partitions.

We concentrate on the case of a single random matrix $X_{N}$ with Gaussian entries $g_{i j}$ (depending on $N$). Consider the value of the moment

\frac{1}{N} E (Tr (X_{N}^{k})) = \frac{1}{N} \sum_{i_{1}, \dots, i_{k}} E (g_{i_{1} i_{2}} g_{i_{2} i_{3}} \cdots g_{i_{k - 1} i_{k}} g_{i_{k} i_{1}}) . \qquad (2.1)

If $k$ is odd, the value of the moment is zero (each term in (2.1) is then an odd moment of a centered Gaussian family, and such moments vanish), so we’ll assume that $k$ is even for the remainder of the argument.

Since the $g_{i j}$ are centered Gaussian variables of variance $\frac{1}{N}$, with $E (g_{a b} g_{c d}) = 0$ unless $(c, d) = (b, a)$, Wick’s formula shows that

E (g_{i_{1} i_{2}} g_{i_{2} i_{3}} \cdots g_{i_{k - 1} i_{k}} g_{i_{k} i_{1}})

is zero unless each variable $g_{i_{p} i_{q}}$ entering the product can be “paired up” with another variable $g_{i_{p^{'}} i_{q^{'}}}$ entering the product so that $i_{p} = i_{q^{'}}$ and $i_{q} = i_{p^{'}}$ (so that $g_{i_{p} i_{q}} = \bar{g_{i_{p^{'}} i_{q^{'}}}}$). That is to say, a term in the sum (2.1) is zero unless, for some pairing $π$ of the set $\{1, \dots, k\}$ (a partition into two-element blocks), the indices $i_{1}, \dots, i_{k}$ satisfy the equations

i_{s} = i_{r + 1}, \quad i_{s + 1} = i_{r} \quad \text{if } s \sim_{π} r, \ s \neq r \qquad (2.2)

(where $s + 1$ and $r + 1$ are understood modulo $k$, and $s \sim_{π} r$ iff $s$ and $r$ are in the same equivalence class of $π$).

Suppose now that we fix $π$ and ask how large a contribution we can get from all of the terms that satisfy (2.2) for this given $π$. The equations (2.2) can be visualized as follows. Let $C_{k}$ be the cyclic graph with $k$ vertices and $k$ edges, numbered $1$ through $k$. Place $i_{1}, \dots, i_{k}$ on the vertices of this graph, so that the $j$-th edge, oriented clockwise, has vertices $i_{j}$ and $i_{j + 1}$, in that order ($j + 1$ is again understood modulo $k$). In other words, we can think of the map $j \mapsto i_{j}$ as a function on the vertices of $C_{k}$.

The pairing $π$ defines an equivalence relation on the set of edges of $C_{k}$: edges $r$ and $s$ are equivalent if $r \sim_{π} s$. Form the quotient graph $C_{k} / \sim_{π}$ by gluing equivalent edges with orientation reversed. Then (2.2) is equivalent to saying that the function $j \mapsto i_{j}$ descends to a function on the vertices of the quotient graph $C_{k} / \sim_{π}$. The total number of such functions is $N^{v}$, where $v$ is the number of vertices of $C_{k} / \sim_{π}$.

Because the variance of $g_{i j}$ is $E (g_{i j} \bar{g_{i j}}) = \frac{1}{N}$, we can deduce that the contribution to the sum (2.1) of those terms that satisfy equations (2.2) for a given $π$ is at most

\frac{1}{N} \cdot {(\frac{1}{N})}^{k / 2} \cdot N^{v} = N^{v - 1 - k / 2} .

The first factor $1 / N$ comes from the normalization of the trace; the term $(1 / N)^{k / 2}$ comes from the bound on the variance (one factor for each of the $k / 2$ pairs); and the factor $N^{v}$ comes from our count of the indices $i_{1}, \dots, i_{k}$ satisfying (2.2). It follows that the contribution of all of the terms that satisfy (2.2) for a given $π$ is negligible (of order at most $1 / N$) if $v < 1 + \frac{k}{2}$.

Recall that $C_{k}$ has exactly $k$ edges and that $k$ is even. Thus $C_{k} / \sim_{π}$ has exactly $k / 2$ edges. Since $C_{k} / \sim_{π}$ is connected (it is a quotient of the connected graph $C_{k}$), it has at most $1 + \frac{k}{2}$ vertices, with equality exactly if it is a tree. With a little bit of care, one can show that (2.1) is then equal to

\frac{1}{N} E (Tr (X_{N}^{k})) = \sum_{π : C_{k} / \sim_{π} \text{ is a tree}} 1 + O (\frac{1}{N}) .

On the other hand, we mentioned in § 1.5.4 that the $k$-th moment of a semicircular element is given by

τ (s^{k}) = \sum_{σ \in N C (k)} 1,

where $N C (k)$ stands for the set of non-crossing pairings of $\{1, \dots, k\}$. It is not hard to see that if we interpret a pairing $σ$ of $\{1, \dots, k\}$ as a pairing of edges of $C_{k}$, it is non-crossing if and only if $C_{k} / \sim_{σ}$ is a tree. This concludes the sketch of the proof.
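The combinatorial fact used at the end can be checked by brute force for small $k$. The sketch below (Python; all names are ours) enumerates the pairings $π$ of the $k$ edges of $C_{k}$ (edges numbered from $0$, edge $j$ joining vertex $j$ to vertex $j + 1$ modulo $k$), glues vertices by the rule $i_{s} = i_{r + 1}$, $i_{s + 1} = i_{r}$ for paired edges $s, r$ (the pairing-up condition $i_{p} = i_{q^{'}}$, $i_{q} = i_{p^{'}}$ above) using a union-find structure, and declares $C_{k} / \sim_{π}$ a tree when it has $1 + \frac{k}{2}$ vertices. It also tests the non-crossing condition directly; the two sets of pairings should coincide and have the Catalan number $\frac{1}{k / 2 + 1} \binom{k}{k / 2}$ of elements.

```python
from itertools import combinations
from math import comb

def pairings(elems):
    """Yield all pairings of the list elems as lists of pairs (a, b) with a < b."""
    if not elems:
        yield []
        return
    first = elems[0]
    for idx in range(1, len(elems)):
        rest = elems[1:idx] + elems[idx + 1:]
        for p in pairings(rest):
            yield [(first, elems[idx])] + p

def is_noncrossing(pairing):
    for (a, b), (c, d) in combinations(pairing, 2):
        if a < c < b < d or c < a < d < b:
            return False
    return True

def quotient_is_tree(k, pairing):
    """Glue the vertices of the k-cycle along paired edges and test the tree condition."""
    parent = list(range(k))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for s, r in pairing:
        parent[find(s)] = find((r + 1) % k)       # i_s = i_{r+1}
        parent[find((s + 1) % k)] = find(r)       # i_{s+1} = i_r
    vertices = len({find(v) for v in range(k)})
    return vertices == k // 2 + 1                 # connected graph with k/2 edges

for k in (2, 4, 6, 8):
    all_p = list(pairings(list(range(k))))
    trees = [p for p in all_p if quotient_is_tree(k, p)]
    nc = [p for p in all_p if is_noncrossing(p)]
    print(k, len(all_p), len(trees), len(nc), comb(k, k // 2) // (k // 2 + 1))
```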

2.3 An application to random matrix theory.

Keeping the notations of Theorem 2.1, let $Y_{N} = D_{N} + X_{N}^{(1)}$. It is not hard to work out the limit distribution of $Y_{N}$ using free probability tools. Indeed, by Theorem 2.1,

μ_{Y_{N}} \to μ_{d + x_{1}} .

On the other hand, $d$ and $x_{1}$ are freely independent. Thus

μ_{d + x_{1}} = μ_{d} ⊞ μ_{x_{1}} = ν ⊞ μ_{semicirc} .

The computation of the limit distribution of $Y_{N}$ can then be carried out using the machinery of the $R$-transform.
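The $R$-transform computation can be sketched at the level of coefficients: the coefficients of the $R$-transform are the free cumulants $κ_{n}$, and free cumulants add under $⊞$. The code below (Python, exact rational arithmetic; the choice $ν = \frac{1}{2} (δ_{- 1} + δ_{1})$ and all function names are our own) converts the moments of $ν$ into free cumulants, adds the free cumulants of the semicircle law of variance $1$ ($κ_{2} = 1$, all others $0$), and converts back, using the recursion $m_{n} = \sum_{s = 1}^{n} κ_{s} [z^{n - s}] M (z)^{s}$ with $M (z) = \sum_{i \geq 0} m_{i} z^{i}$. A Monte Carlo run of $Y_{N} = D_{N} + X_{N}^{(1)}$ serves as a rough cross-check.

```python
from fractions import Fraction
import numpy as np

def power_coeff(m, s, r):
    """Coefficient of z^r in (sum_i m[i] z^i)^s, using only m[0], ..., m[r]."""
    poly = [Fraction(1)] + [Fraction(0)] * r
    for _ in range(s):
        poly = [sum(poly[i] * m[j - i] for i in range(j + 1)) for j in range(r + 1)]
    return poly[r]

def moments_to_cumulants(m):
    """Free cumulants kappa[1..n] from moments m[0..n] with m[0] = 1 (kappa[0] unused)."""
    n = len(m) - 1
    kappa = [Fraction(0)] * (n + 1)
    for k in range(1, n + 1):
        kappa[k] = m[k] - sum(kappa[s] * power_coeff(m, s, k - s) for s in range(1, k))
    return kappa

def cumulants_to_moments(kappa):
    """Inverse map, via m_n = sum_s kappa_s [z^(n-s)] M(z)^s."""
    n = len(kappa) - 1
    m = [Fraction(1)] + [Fraction(0)] * n
    for k in range(1, n + 1):
        m[k] = sum(kappa[s] * power_coeff(m, s, k - s) for s in range(1, k + 1))
    return m

n = 8
nu = [Fraction(1 - j % 2) for j in range(n + 1)]       # moments of (delta_{-1} + delta_1)/2
semicirc = [Fraction(0)] * (n + 1)
semicirc[2] = Fraction(1)                              # free cumulants of the semicircle law
kappa_sum = [a + b for a, b in zip(moments_to_cumulants(nu), semicirc)]
print([str(x) for x in cumulants_to_moments(kappa_sum)])

# Monte Carlo comparison with Y_N = D_N + X_N, D_N having eigenvalues -1 and +1 in equal proportion.
rng = np.random.default_rng(0)
N = 400
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
X = (G + G.conj().T) / np.sqrt(2 * N)
D = np.diag(np.where(np.arange(N) % 2 == 0, -1.0, 1.0))
eigs = np.linalg.eigvalsh(D + X)
print([round(float(np.mean(eigs ** j)), 2) for j in (2, 4)])
```

With this $ν$ the exact computation gives the moments $1, 0, 2, 0, 7, \dots$, and the simulated second and fourth moments should be close to $2$ and $7$.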

2.4 Applications to von Neumann algebras.

Let us say that a non-commutative non-self-adjoint random variable $y$ is circular if $ℜ y = \frac{y + y^{*}}{2}$ and $ℑ y = \frac{y - y^{*}}{2 \sqrt{- 1}}$ are freely independent and are semicircular with the same variance.

If $X_{N}^{(1)}$ and $X_{N}^{(2)}$ are two independent GUE random matrices, then $X_{N}^{(1)} + \sqrt{- 1} X_{N}^{(2)}$ converges in $*$-distribution to a circular variable.
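A numerical sketch of the circular limit (Python with NumPy; names are ours). With the normalization $E (| g_{i j} |^{2}) = \frac{1}{N}$, the limit $c$ of $X_{N}^{(1)} + \sqrt{- 1} X_{N}^{(2)}$ has $τ (c^{*} c) = 2$, $τ ((c^{*} c)^{2}) = 8$ and $τ (c^{2}) = 0$ (here $c / \sqrt{2}$ is a standard circular element, for which $(c / \sqrt{2})^{*} (c / \sqrt{2})$ has moments $1, 2, 5, \dots$). The code also assembles the block matrix $\frac{1}{\sqrt{n}} Y_{N}$ of the next paragraph from $2 n^{2}$ independent GUE blocks; its $*$-moments should match.

```python
import numpy as np

def gue(N, rng):
    """Self-adjoint Gaussian matrix with E|g_ij|^2 = 1/N."""
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2 * N)

def ntr(A):
    """Normalized trace (1/size) Tr(A)."""
    return np.trace(A) / A.shape[0]

def star_moments(Z):
    """Return tau(Z*Z), tau((Z*Z)^2) and tau(Z^2) for the normalized trace."""
    W = Z.conj().T @ Z
    return ntr(W).real, ntr(W @ W).real, ntr(Z @ Z)

rng = np.random.default_rng(0)
N, n = 150, 2
Z = gue(n * N, rng) + 1j * gue(n * N, rng)                               # X^(1) + i X^(2), size nN
blocks = [[gue(N, rng) + 1j * gue(N, rng) for _ in range(n)] for _ in range(n)]
Y = np.block(blocks) / np.sqrt(n)                                         # (1/sqrt(n)) Y_N, size nN
print(star_moments(Z))   # expected near (2, 8, 0)
print(star_moments(Y))
```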

If we start with $2 n^{2}$ independent GUE random matrices $X_{N}^{(i, j, 1)}$, $X_{N}^{(i, j, 2)}$, $1 \leq i, j \leq n$, then we can form a new matrix $Y_{N} = (Y_{N}^{(i j)})_{i, j = 1}^{n}$ of size $n N \times n N$, where $Y_{N}^{(i j)} = X_{N}^{(i, j, 1)} + \sqrt{- 1} X_{N}^{(i, j, 2)}$. It is not hard to see that $ℜ \frac{1}{\sqrt{n}} Y_{N}$, $ℑ \frac{1}{\sqrt{n}} Y_{N}$ is a pair of independent GUE random matrices of size $n N$. We thus obtain that $\frac{1}{\sqrt{n}} Y_{N}$ is circular in the limit $N \to \infty$. From this it is not hard to prove that if $x_{i j}$, $1 \leq i, j \leq n$, are a free circular family, then the matrix

y = \frac{1}{\sqrt{n}} (x_{i j})_{i, j = 1}^{n}

is again circular. In fact, one can use the asymptotic freeness result to show that if we let $D$ be the algebra of scalar diagonal $n \times n$ matrices, then $D$ is free from $\{y, y^{*}\}$. This fact underlies the earliest applications of free probability theory to von Neumann algebras and free group factors. For example, one has the following result of Voiculescu [Voi90]:

Theorem 2.2. Let $n \geq 2$ be an integer, and let $t \in Q$ be a rational number such that

m = \frac{1}{t^{2}} (n - 1) + 1

is an integer. Let $p \in L (F (n))$ be a projection in the free group factor $L (F (n))$ associated to the free group on $n$ generators. Assume that $p$ has trace $t$. Then

p L (F (n)) p \cong L (F (m)) . \qquad (2.3)
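The arithmetic in (2.3) is easy to tabulate. A minimal sketch (Python with exact fractions; the helper `compressed_generators` is hypothetical): for $n = 2$ and $t = \frac{1}{2}$ one gets $m = 4 \cdot 1 + 1 = 5$, so a corner of trace $\frac{1}{2}$ in $L (F (2))$ is $L (F (5))$, while $t = \frac{2}{3}$ gives the non-integer value $\frac{13}{4}$, to which Theorem 2.2 as stated does not apply.

```python
from fractions import Fraction

def compressed_generators(n, t):
    """m = (n - 1) / t^2 + 1 from (2.3), where t is the trace of the projection p."""
    t = Fraction(t)
    if not 0 < t <= 1:
        raise ValueError("the trace of a nonzero projection lies in (0, 1]")
    return (n - 1) / t ** 2 + 1

for n, t in [(2, Fraction(1, 2)), (3, Fraction(1, 2)), (2, Fraction(1, 3)), (2, Fraction(2, 3))]:
    m = compressed_generators(n, t)
    print(n, t, m, "integer" if m.denominator == 1 else "not an integer")
```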

This theorem has many far-reaching extensions due to Dykema and Rădulescu; see e.g. [Voi90, Dyk95, Dyk93b, Dyk93a, Dyk94, Răd92, Răd94]. For example, it turns out that it is possible to define for each $t \in (1, + \infty]$ a von Neumann algebra $L (F (t))$, called an interpolated free group factor, in such a way that $L (F (t))$ is the von Neumann algebra of the free group with $t$ generators if $t$ is an integer. Moreover, the compression formula (2.3) remains valid for non-rational traces of $p$: the result is an interpolated free group factor with $\frac{1}{t^{2}} (n - 1) + 1$ generators; the same formula is valid also for non-integer $n$.

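For a II$_{1}$ factor $M$, its fundamental group was defined by Murray and von Neumann to be the set

F (M) = \{λ \in (0, + \infty) : M \cong p M p \text{ for } p \text{ a projection of trace } λ\}

(for $λ > 1$, the projection $p$ is taken in a matrix amplification $M_{j} (C) \otimes M$, and $p M p$ stands for $p (M_{j} (C) \otimes M) p$).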

Rădulescu proved that $F (L (F (\infty))) = (0, + \infty)$ (Voiculescu’s result quoted above implied that the positive rational numbers satisfy $Q_{+} \subset F (L (F (\infty)))$). In fact, it turns out that there is a dichotomy: either all interpolated free group factors are isomorphic to one another (and also to $L (F (\infty))$), and all have $(0, + \infty)$ as their fundamental groups; or $L (F (\infty)) \not\cong L (F (t))$ for finite $t$, and $F (L (F (t))) = \{1\}$ for finite $t$. It is not known which of the two alternatives holds.

Further developments of these techniques gave information on fundamental groups of more general free products of von Neumann algebras and on subfactors of $L (F (\infty))$ (see e.g. [Răd94, Dyk95, Shl98, Shl99, PS03, SU02, DR00]).

3 Free Entropy via Microstates.

Free entropy was introduced and developed by Voiculescu in a series of papers [Voi93, Voi94, Voi96, Voi97, Voi98b, Voi98a, Voi99a, Voi99b] as a free probability analogue of the classical information-theoretic entropy; see also Voiculescu’s survey [Voi02] .

3.1 Definition of free entropy.

Voiculescu’s original “microstates” approach to free entropy followed Boltzmann’s definition of the entropy of a macroscopic state.

3.1.1 Microstates and Macrostates.

Assume that the macroscopic behavior of a physical system (e.g., a gas) is described by several macroscopic parameters (e.g., pressure, volume and temperature). Then a macrostate $S$ is a state of the system corresponding to certain prescribed values of these parameters.

Microscopically, the system is made out of a large number of smaller systems (e.g., the molecules that make up the gas). On this microscopic level, the system can be described by a microstate $s$ that specifies exactly the states of all of the sub-systems (e.g., the exact positions and momenta of all of the molecules of the gas). If we fix a macrostate $S$, there are many microstates $s$ that lead to the same macroscopic state.

Boltzmann’s formula is then that the entropy of $S$ is given by

K \log \# \{s : \text{microscopic state } s \text{ leads to macroscopic state } S\}

for some constant $K$.

3.1.2 Matricial microstates.

Voiculescu’s idea is to interpret $x_{1}, \dots, x_{n} \in (A, φ)$ as a description of a macroscopic state of a system, and to take as microstates the set of all matrices $X_{1}, \dots, X_{n}$ of a specific dimension that approximate $x_{1}, \dots, x_{n}$. More precisely, for $x_{1}, \dots, x_{n}$ in a non-commutative probability space $(A, φ)$ with $x_{j} = x_{j}^{*}$, let $M_{k \times k}^{s a}$ be the space of $k \times k$ self-adjoint matrices, and consider the set

Γ (x_{1}, \dots, x_{n}; k, l, ɛ) = \{(X_{1}, \dots, X_{n}) \in (M_{k \times k}^{s a})^{n} : | \frac{1}{k} Tr (w (X_{1}, \dots, X_{n})) - φ (w (x_{1}, \dots, x_{n})) | < ɛ \text{ for any word } w \text{ in } n \text{ letters of length at most } l\} .

In other words, we are considering a weak neighborhood $U$ of the joint law $μ_{x_{1}, \dots, x_{n}}$ defined by the property that $μ^{'} \in U$ iff the value of the law $μ^{'}$ on each word of length at most $l$ deviates by less than $ɛ$ from that of $μ_{x_{1}, \dots, x_{n}}$. Next, we consider all $n$-tuples of self-adjoint $k \times k$ matrices $(X_{1}, \dots, X_{n})$ such that

μ_{X_{1}, \dots, X_{n}} \in U .

The set $Γ (x_{1}, \dots, x_{n}; k, l, ɛ)$ is called the set of (matricial) microstates for $x_{1}, \dots, x_{n}$.
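To make the definition concrete, the following sketch (Python with NumPy; all names are illustrative) tests whether a tuple of $k \times k$ self-adjoint matrices lies in $Γ (x_{1}, \dots, x_{n}; k, l, ɛ)$ when $x_{1}, \dots, x_{n}$ is a free semicircular family of variance $1$. For such a family $φ (x_{j_{1}} \cdots x_{j_{m}})$ is the number of non-crossing pairings of $\{1, \dots, m\}$ that only pair equal letters (a standard fact, consistent with the moment formula recalled in § 1.5.4), and by Theorem 2.1 independent GUE matrices should be microstates for large $k$.

```python
import itertools
from functools import lru_cache, reduce
import numpy as np

@lru_cache(maxsize=None)
def semicircular_moment(word):
    """phi(x_{w1} ... x_{wm}) for a free semicircular family of variance 1:
    the number of non-crossing pairings that only pair equal letters."""
    if not word:
        return 1
    total = 0
    for p in range(1, len(word), 2):     # partner of the first letter; the inside has even length
        if word[p] == word[0]:
            total += semicircular_moment(word[1:p]) * semicircular_moment(word[p + 1:])
    return total

def gue(k, rng):
    """Self-adjoint Gaussian k x k matrix with E|g_ij|^2 = 1/k."""
    G = (rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2 * k)

def in_microstates(Xs, l, eps, phi=semicircular_moment):
    """Check |(1/k) Tr w(X) - phi(w)| < eps for every word w of length 1, ..., l."""
    k = Xs[0].shape[0]
    for length in range(1, l + 1):
        for word in itertools.product(range(len(Xs)), repeat=length):
            value = np.trace(reduce(np.matmul, [Xs[j] for j in word])) / k
            if abs(value - phi(word)) >= eps:
                return False
    return True

rng = np.random.default_rng(0)
Xs = [gue(200, rng) for _ in range(2)]
print(in_microstates(Xs, l=4, eps=0.1))
```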

3.1.3 Definition of free entropy.

Voiculescu then defined the free entropy by

χ (x_{1}, \dots, x_{n}) = \inf_{ɛ, l} \limsup_{k \to \infty} (\frac{1}{k^{2}} \log Vol Γ (x_{1}, \dots, x_{n}; k, l, ɛ) + \frac{n}{2} \log k),

where $Vol$ refers to the Euclidean volume associated to the standard identification of $M_{k \times k}^{s a}$ with $R^{k^{2}}$. We use the convention that $\log 0 = - \infty$.

We should note that $χ$ depends only on the law of $x_{1}, \dots, x_{n}$ and not on the particular realization of this law. It would therefore also be appropriate to write $χ (μ_{x_{1}, \dots, x_{n}})$.

3.1.4 Relation to Connes’ problem.

Note that there is no a priori reason for $Γ (x_{1}, \dots, x_{n}; k, l, ɛ)$ to be non-empty. Connes asked in [Con76] whether every II$_{1}$ factor can be embedded into an ultrapower of the hyperfinite II$_{1}$ factor. It is not hard to see that his question is equivalent to the question of whether, given self-adjoint $x_{1}, \dots, x_{n}$ in a von Neumann algebra $(A, φ)$ with $φ$ a trace, one has that for any $ɛ > 0$ and $l > 0$ there is a $k$ such that $Γ (x_{1}, \dots, x_{n}; k, l, ɛ) \neq \emptyset$. This question is open even when $x_{1}, \dots, x_{n}$ are elements of the group algebra of an arbitrary discrete group $Γ$.

3.2 Properties of free entropy.

Voiculescu gave an explicit formula for the free entropy of a single variable $x$ with law $μ$:

χ (x) = \int \int \log | s - t | d μ (s) d μ (t) + C

for a certain universal constant $C$ (namely $C = \frac{3}{4} + \frac{1}{2} \log 2 π$).
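As a numerical sketch (Python with NumPy; our own example, not from the text), take $μ$ to be the semicircle law on $[- 2, 2]$. We use Voiculescu’s value $C = \frac{3}{4} + \frac{1}{2} \log 2 π$; the known logarithmic energy $\int \int \log | s - t | d μ (s) d μ (t) = - \frac{1}{4}$ of this law then gives $χ = \frac{1}{2} \log (2 π e)$, the same value as the classical entropy of a standard Gaussian. Points of the semicircle law are sampled as the first coordinate of uniform points in the disk of radius $2$.

```python
import numpy as np

def sample_semicircle(size, rng):
    """x-coordinate of a uniform point in the disk of radius 2: density sqrt(4 - x^2) / (2 pi)."""
    r = 2.0 * np.sqrt(rng.random(size))
    theta = 2.0 * np.pi * rng.random(size)
    return r * np.cos(theta)

def log_energy(s, t):
    """Monte Carlo estimate of the double integral of log|s - t| for independent samples s, t."""
    return float(np.mean(np.log(np.abs(s - t))))

rng = np.random.default_rng(0)
s, t = sample_semicircle(10**6, rng), sample_semicircle(10**6, rng)
C = 0.75 + 0.5 * np.log(2 * np.pi)
energy = log_energy(s, t)
print(energy, energy + C, 0.5 * np.log(2 * np.pi * np.e))
```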

Free entropy has a number of nice properties, related to freeness and analogous to the properties of classical entropy; we list a few, due to Voiculescu [Voi94] :

$\circ$ If $x_{1}, \dots, x_{n}$ are free, then $χ (x_{1}, \dots, x_{n}) = χ (x_{1}) + \dots + χ (x_{n})$ . Furthermore, if $χ (x_{1}, \dots, x_{n}) = χ (x_{1}) + \dots + χ (x_{n}) \neq - \infty$ , then $x_{1}, \dots, x_{n}$ are freely independent.
$\circ$ $χ (x_{1}, \dots, x_{n}, y_{1}, \dots, y_{m}) \leq χ (x_{1}, \dots, x_{n}) + χ (y_{1}, \dots, y_{m})$ .
$\circ$ $χ (x_{1}, \dots, x_{n})$ is maximal subject to $\sum_{i} φ (x_{i}^{2}) = n$ iff $x_{1}, \dots, x_{n}$ is a free semicircular family and each $x_{i}$ satisfies $φ (x_{i}^{2}) = 1$ .
$\circ$ If $s_{1}, \dots, s_{n}$ are free semicircular variables, freely independent from the family $x_{1}, \dots, x_{n}$ , then $W^{*} (x_{1}, \dots, x_{n})$ embeds into the ultrapower of the hyperfinite II$_{1}$ factor if and only if $χ (x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n}) > - \infty$ for every $δ > 0$ . Thus semicircular perturbations (i.e., “free Brownian motion”) have a regularizing effect on free entropy.

To give but one example of the technical difficulties that working with $χ$ presents: one would be able to prove that

χ (x_{1}, \dots, x_{n}, y_{1}, \dots, y_{m}) = χ (x_{1}, \dots, x_{n}) + χ (y_{1}, \dots, y_{m})

whenever $(x_{1}, \dots, x_{n})$ and $(y_{1}, \dots, y_{m})$ are free families, provided that one could show that the $\limsup$ in the definition of free entropy is a limit.

3.2.1 Infinitesimal change of variables formula.

We end the review of free entropy by mentioning the change of variables formula [Voi94] .

Assume that $y_{1}, \dots, y_{n}$ are given as non-commutative power series in $x_{1}, \dots, x_{n}$:

y_{j} = F_{j} (x_{1}, \dots, x_{n}) .

Assume moreover that the multi-radius of convergence of $F_{j}$ is large enough to exceed the norms of $x_{1}, \dots, x_{n}$. Assume further that $x_{j} = G_{j} (y_{1}, \dots, y_{n})$ for some non-commutative power series $G_{j}$, and that similarly the multi-radius of convergence of $G_{j}$ is large enough to exceed the norms of $y_{1}, \dots, y_{n}$.

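Let $M = W^{*} (x_{1}, \dots, x_{n}) = W^{*} (y_{1}, \dots, y_{n})$, and let $φ$ be the given trace on $M$. Consider the derivation

\partial_{j} : C [x_{1}, \dots, x_{n}] \to M \bar{\otimes} M

determined by

\partial_{j} (x_{i}) = δ_{j i} 1 \otimes 1

and the Leibniz rule $\partial_{j} (P Q) = \partial_{j} (P) \cdot Q + P \cdot \partial_{j} (Q)$, where $M$ acts on $M \bar{\otimes} M$ by $a \cdot (b \otimes c) \cdot d = a b \otimes c d$.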

For example,

\partial_{2} (x_{1} x_{2}^{2} x_{3} x_{2}) = x_{1} \otimes x_{2} x_{3} x_{2} + x_{1} x_{2} \otimes x_{3} x_{2} + x_{1} x_{2}^{2} x_{3} \otimes 1 .

Let

J (x_{1}, \dots, x_{n}) = {(J_{i j} (x_{1}, \dots, x_{n}))}_{i, j = 1}^{n} \in M_{n} (M \bar{\otimes} M)

be the “Jacobian” of $F$:

J_{i j} (x_{1}, \dots, x_{n}) = \partial_{i} F_{j} (x_{1}, \dots, x_{n}) .

Then

χ (y_{1}, \dots, y_{n}) = χ (x_{1}, \dots, x_{n}) + n \log (| \det | (J (x_{1}, \dots, x_{n}))),

where $| \det |$ refers to the Fuglede-Kadison determinant

| \det | (J) = \exp (τ_{M_{n \times n} (M \otimes M)} (\log | J |)) .

Here $τ_{M_{n \times n} (M \otimes M)}$ is the tensor product $\frac{1}{n} Tr \otimes φ \otimes φ$ of the traces on $M_{n \times n}$ and $M \otimes M$.

The explanation of this formula and of the appearance of $J$ is that the Jacobian of the transformation

(X_{1}, \dots, X_{n}) \mapsto (F_{1} (X_{1}, \dots, X_{n}), \dots, F_{n} (X_{1}, \dots, X_{n})),

viewed as a map $M_{k \times k}^{n} \to M_{k \times k}^{n}$, is naturally a matrix in

M_{n \times n} (End (M_{k \times k})) \cong M_{n \times n} (M_{k \times k} \otimes M_{k \times k})

(where $a \otimes b$ acts on $M_{k \times k}$ by $T \mapsto a T b$), and is given by $J (X_{1}, \dots, X_{n})$.
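The free difference quotient and its role as a Jacobian can be checked in a small sketch (Python with NumPy; monomials are encoded as tuples of letter indices, which is our own convention). For a monomial, $\partial_{j}$ sums over the occurrences of $x_{j}$, splitting the word there into a left part and a right part. Identifying $a \otimes b$ with the map $T \mapsto a T b$, the derivative of $X \mapsto P (X_{1}, \dots, X_{n})$ in the $j$-th variable along a direction $H$ should equal $\sum a H b$. The code reproduces the example $\partial_{2} (x_{1} x_{2}^{2} x_{3} x_{2})$ above and compares with a finite difference for random matrices.

```python
from functools import reduce
import numpy as np

def partial(j, word):
    """Free difference quotient of the monomial `word` (a tuple of letter indices):
    one (left, right) pair of words for each occurrence of the letter j."""
    return [(word[:p], word[p + 1:]) for p, letter in enumerate(word) if letter == j]

def evaluate(word, Xs):
    """Evaluate a monomial at matrices Xs (a dict letter -> k x k array); empty word -> identity."""
    k = next(iter(Xs.values())).shape[0]
    return reduce(np.matmul, [Xs[i] for i in word], np.eye(k))

def derivative_via_partial(j, word, Xs, H):
    """Apply sum a (x) b to the direction H, with a (x) b acting as H -> a H b."""
    return sum(evaluate(a, Xs) @ H @ evaluate(b, Xs) for a, b in partial(j, word))

word = (1, 2, 2, 3, 2)    # x_1 x_2^2 x_3 x_2
print(partial(2, word))   # [((1,), (2, 3, 2)), ((1, 2), (3, 2)), ((1, 2, 2, 3), ())]

# Central finite difference of X_2 -> word(X) along H (the identity holds for arbitrary matrices).
rng = np.random.default_rng(0)
k, h = 5, 1e-5
Xs = {i: rng.standard_normal((k, k)) / np.sqrt(k) for i in (1, 2, 3)}
H = rng.standard_normal((k, k)) / np.sqrt(k)
plus = evaluate(word, {**Xs, 2: Xs[2] + h * H})
minus = evaluate(word, {**Xs, 2: Xs[2] - h * H})
finite_diff = (plus - minus) / (2 * h)
print(np.max(np.abs(finite_diff - derivative_via_partial(2, word, Xs, H))))
```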

3.3 Free entropy dimension.

Voiculescu’s original idea for defining free entropy dimension was to consider a kind of asymptotic Minkowski dimension of the set of microstates. We present below an equivalent definition of K. Jung, which is based on packing dimension instead.

3.3.1 Packing and covering numbers and Minkowski dimension.

For a metric space $X$, let $P_{ɛ} (X)$ be the packing number of $X$; that is, the maximal number of disjoint $ɛ$-balls that can be placed inside $X$. Similarly, let $K_{ɛ} (X)$ be the covering number of $X$; that is, the minimal number of $ɛ$-balls needed to cover $X$.

For a metric space $X$, the upper uniform packing dimension and the upper uniform covering dimension are the same and are defined as

\limsup_{ɛ \to 0} \frac{\log P_{ɛ} (X)}{| \log ɛ |} = \limsup_{ɛ \to 0} \frac{\log K_{ɛ} (X)}{| \log ɛ |} .

It is a theorem that if $X \subset R^{d}$ is bounded, then both of these numbers are the same as the (upper) Minkowski dimension of $X$, which is given by

d - \liminf_{ɛ \to 0} \frac{\log Vol N_{ɛ} (X)}{\log ɛ},

where $N_{ɛ} (X)$ denotes the tubular neighborhood of $X$ of radius $ɛ$.
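A minimal numerical sketch of these notions (Python with NumPy; our own example). Instead of packing or covering numbers we count the grid boxes of side $ɛ$ that meet a finite sample of $X$; this box-counting number is comparable to $K_{ɛ} (X)$ up to constant factors and a rescaling of $ɛ$, so it has the same limiting ratio with $| \log ɛ |$. The slope of $\log$(count) against $| \log ɛ |$ recovers dimension $1$ for a segment and $2$ for a square.

```python
import numpy as np

def box_count(points, eps):
    """Number of grid boxes of side eps that contain at least one sample point."""
    boxes = np.floor(points / eps).astype(np.int64)
    return len(np.unique(boxes, axis=0))

def dimension_estimate(points, scales):
    """Least-squares slope of log(box count) against |log eps|."""
    counts = [box_count(points, eps) for eps in scales]
    slope, _ = np.polyfit(-np.log(scales), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
u = rng.random(200_000)
segment = np.column_stack([u, u])          # the diagonal of the unit square
square = rng.random((200_000, 2))          # the unit square itself
scales = 2.0 ** -np.arange(2, 7)           # eps = 1/4, ..., 1/64
print(dimension_estimate(segment, scales), dimension_estimate(square, scales))
```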

3.3.2 Free entropy dimension.

Let $x_{1}, \dots, x_{n} \in (A, φ)$ be self-adjoint, and let

P_{δ} (x_{1}, \dots, x_{n}) = \inf_{ɛ, l} \limsup_{k \to \infty} \frac{1}{k^{2}} \log P_{δ} (Γ (x_{1}, \dots, x_{n}; k, l, ɛ)),

K_{δ} (x_{1}, \dots, x_{n}) = \inf_{ɛ, l} \limsup_{k \to \infty} \frac{1}{k^{2}} \log K_{δ} (Γ (x_{1}, \dots, x_{n}; k, l, ɛ)) .

K. Jung proved the following theorem [Jun02]:

Theorem 3.1. One has

\limsup_{δ \to 0} \frac{P_{δ} (x_{1}, \dots, x_{n})}{| \log δ |} = \limsup_{δ \to 0} \frac{K_{δ} (x_{1}, \dots, x_{n})}{| \log δ |} .

Moreover, if $s_{1}, \dots, s_{n}$ are free semicircular variables, free from $x_{1}, \dots, x_{n}$, then

\limsup_{δ \to 0} \frac{P_{δ} (x_{1}, \dots, x_{n})}{| \log δ |} = n - \liminf_{δ \to 0} \frac{χ (x_{1}^{δ}, \dots, x_{n}^{δ} : s_{1}, \dots, s_{n})}{\log δ^{1 / 2}},

where $x_{j}^{δ} = x_{j} + \sqrt{δ} s_{j}$.

The value of any of these limits is then by definition called the free entropy dimension $δ_{0} (x_{1}, \dots, x_{n})$.

Here $χ (x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n} : s_{1}, \dots, s_{n})$ is the free entropy of $x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n}$ in the presence of $s_{1}, \dots, s_{n}$; it is a technical modification of the free entropy $χ (x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n})$. Very roughly, the value of $χ (x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n})$ is the asymptotic logarithmic volume of a $δ^{1 / 2}$-tubular neighborhood of the set of microstates for $x_{1}, \dots, x_{n}$. Thus the number

n - \liminf_{δ \to 0} \frac{χ (x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n} : s_{1}, \dots, s_{n})}{\log δ^{1 / 2}}

is a kind of asymptotic Minkowski dimension of the set of microstates (compare with the formula $d - \liminf_{ɛ \to 0} \frac{\log Vol N_{ɛ} (X)}{\log ɛ}$ of § 3.3.1, with $ɛ = δ^{1 / 2}$ and dimensions normalized by $k^{2}$). This was the original definition of free entropy dimension given by Voiculescu. We finish this section with an example.

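Let $x_{1}, \dots, x_{n}$ be free semicircular variables. Then $x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n}$ are also free semicircular variables (of variance $1 + δ$ if the $x_{j}$ and $s_{j}$ have variance $1$). In fact,

χ (x_{1} + \sqrt{δ} s_{1}, \dots, x_{n} + \sqrt{δ} s_{n} : s_{1}, \dots, s_{n}) \geq χ (x_{1}, \dots, x_{n}) > - \infty .

Since this quantity is also bounded above (free entropy is bounded above in terms of the second moments of the variables, which stay bounded as $δ \to 0$), its ratio with $\log δ^{1 / 2}$ tends to $0$, and it follows that $δ_{0} (x_{1}, \dots, x_{n}) = n$. In particular, the free group factor $L (F (n))$ can be generated by a family with free entropy dimension $n$.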

3.4 Properties of free entropy dimension.

The theory of free entropy dimension has found a number of spectacular applications to von Neumann algebra theory.

For example, Voiculescu used free entropy dimension to prove that free group factors do not have Cartan subalgebras; soon thereafter, L. Ge gave a proof that free group factors are prime, i.e., cannot be written as tensor products of infinite-dimensional von Neumann algebras.

One of the main remaining questions about free entropy dimension is the extent to which

δ_{0} (x_{1}, \dots, x_{n})

depends on the elements

x_{1}, \dots, x_{n}

. Voiculescu asked if

δ_{0} (x_{1}, \dots, x_{n})

is an invariant of the von Neumann algebra generated by

x_{1}, \dots, x_{n}

, taken with a fixed trace. Since

L (F (n))

has a generating family with free entropy dimension equal to

n

, a positive answer to this question would imply non-isomorphism of free group factors.

3.4.1 Invariance of $δ_{0}$ .

Voiculescu proved that $δ_{0} (x_{1}, \dots, x_{n})$ depends only on the restriction of the trace to the algebra generated by $x_{1}, \dots, x_{n}$ (and not on the choice of generators of this algebra). In particular, if $Γ$ is a discrete group and $x_{1}, \dots, x_{n} \in C Γ$ are self-adjoint generators of the group algebra, then $δ_{0} (x_{1}, \dots, x_{n})$ depends only on the group. This invariant seems to be related to the $L^{2}$-cohomology of $Γ$; see below.

3.4.2 Free entropy dimension for a single variable.

Voiculescu proved that if $X$ has law $μ$, then

δ_{0} (X) = 1 - \sum_{t \text{ an atom of } μ} μ (\{t\})^{2} .

In particular, notice that $δ_{0}$ is an invariant of the von Neumann algebra (with a fixed trace) generated by $X$.
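A tiny worked sketch of this formula (Python with exact fractions; the helper name is ours). A projection of trace $α$ has a law with atoms of mass $1 - α$ at $0$ and $α$ at $1$, so $δ_{0} = 1 - α^{2} - (1 - α)^{2} = 2 α (1 - α)$; a variable with four atoms of mass $\frac{1}{4}$ gives $\frac{3}{4}$; a law without atoms gives $1$.

```python
from fractions import Fraction

def delta0_single(atom_weights):
    """delta_0(X) = 1 - sum of the squared atom masses of the law of X."""
    return 1 - sum(Fraction(w) ** 2 for w in atom_weights)

alpha = Fraction(1, 3)
print(delta0_single([1 - alpha, alpha]))       # projection of trace 1/3: prints 4/9
print(delta0_single([Fraction(1, 4)] * 4))     # four atoms of mass 1/4: prints 3/4
print(delta0_single([]))                       # diffuse law: prints 1
```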

3.4.3 Upper bounds on $δ_{0}$ .

If $M$ satisfies any of the following conditions, then $δ_{0} (x_{1}, \dots, x_{n}) \leq 1$ for any $x_{1}, \dots, x_{n} \in M$ generating $M$:

(1) [Voi96] $M$ has a Cartan subalgebra, i.e., a maximal abelian subalgebra $A$ such that $M = W^{*} (\{u \in M \text{ unitary} : u A u^{*} = A\})$. Thus free group factors have no Cartan subalgebras.
(2) [Voi96] $M$ has a diffuse regular hyperfinite subalgebra: a diffuse hyperfinite subalgebra $R$ such that $M = W^{*} (\{u \in M \text{ unitary} : u R u^{*} = R\})$. This is the case, in particular, if $M = L (Γ)$ and $Γ$ has an infinite normal amenable subgroup. Thus free group factors do not have diffuse regular hyperfinite subalgebras.
(3) [Voi96] $M$ has property $Γ$: there is a sequence of unitaries $u_{n} \in M$ such that $τ (u_{n}) \to 0$ but $∥ u_{n} x - x u_{n} ∥_{2} \to 0$ for all $x \in M$. Free group factors are non-$Γ$ by a classical result of Murray and von Neumann.
(4) [Ge98] $M \cong M_{1} \bar{\otimes} M_{2}$ with $M_{1}$ and $M_{2}$ infinite-dimensional. Thus free group factors are prime.

In particular, note that $M \not\cong L (F_{n}) * N$ for $n \geq 2$ and any $N$ which can be embedded into the ultrapower of the hyperfinite II$_{1}$ factor (e.g., $N = C$ is already interesting).

There are other conditions assuring upper bounds on

δ_{0}

; we mention the work of K. Dykema [Dyk97] , M. Stefan [Ste99] and of Ge and Shen [GS00] . Upper estimates on

δ_{0}

turned out to be of relevance also to the theory of type III factors [Shl00, Shl03b] .

3.4.4 Lower bounds on $δ_{0}$ .

K. Jung has proved the following “hyperfinite monotonicity result” [Jun03]: let $M$ be a diffuse von Neumann algebra, and assume that $M$ is embeddable in the ultrapower of the hyperfinite II$_{1}$ factor. Then $δ_{0} (x_{1}, \dots, x_{n}) \geq 1$ for any generators $x_{1}, \dots, x_{n}$ of $M$. Combined with the upper estimates, this shows that if $M$ satisfies any of the properties (1)–(4) above and is embeddable into the ultrapower of the hyperfinite II$_{1}$ factor, then the value of $δ_{0}$ is $1$ on any set of generators. In particular, $δ_{0}$ is an invariant of the entire von Neumann algebra!

Jung has also computed

δ_{0}

for arbitrary generators of a hyperfinite algebra [Jun03] (which is in general a direct sum of matrix algebras and a diffuse hyperfinite von Neumann algebra) and once again found that

δ_{0}

is an invariant of the von Neumann algebra in that case.

3.5 Relation with $L^{2}$ -Betti numbers.

By [CS], for any generators $(x_{1}, \dots, x_{n})$ of a tracial algebra $(A, τ)$ one has the following inequality relating $δ_{0}$ to the $L^{2}$-Betti numbers of $A$:

δ_{0} (A) = δ_{0} (x_{1}, \dots, x_{n}) \leq β_{1}^{(2)} (A, τ) - β_{0}^{(2)} (A, τ) + 1 .

In particular, specializing to the case of the group algebra of a discrete group

Γ

, we have that

δ_{0} (Γ) \leq b_{1}^{(2)} (Γ) - b_{0}^{(2)} (Γ) + 1,

where

b_{j}^{(2)}

are the

L^{2}

-Betti numbers of the group.
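For orientation, here are the right-hand sides of this inequality in three standard cases. The values of the $L^{2}$-Betti numbers are taken from the general theory, not from this text: $b_{1}^{(2)} (F_{n}) = n - 1$; $b_{0}^{(2)} = 0$ for infinite groups; $b_{0}^{(2)} (G) = 1 / | G |$ and $b_{1}^{(2)} (G) = 0$ for a finite group $G$; and all $L^{2}$-Betti numbers of $Z$ vanish. For $Γ = Z / m Z$ the bound agrees with the single-variable formula of § 3.4.2, since a self-adjoint generator of the group algebra $C^{m}$ (with the uniform trace) has $m$ atoms of mass $\frac{1}{m}$.

```latex
\begin{aligned}
Γ = F_{n}: &\quad b_{1}^{(2)} - b_{0}^{(2)} + 1 = (n - 1) - 0 + 1 = n, \\
Γ = Z: &\quad b_{1}^{(2)} - b_{0}^{(2)} + 1 = 0 - 0 + 1 = 1, \\
Γ = Z / m Z: &\quad b_{1}^{(2)} - b_{0}^{(2)} + 1 = 0 - \frac{1}{m} + 1 = 1 - \sum_{j = 1}^{m} \Big(\frac{1}{m}\Big)^{2} .
\end{aligned}
```

The first line is consistent with the free semicircular generating family of $L (F (n))$ having free entropy dimension $n$ (§ 3.3.2).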

The same combination of Betti numbers also occurs in Gaboriau’s work on cost of equivalence relations [Gab00, Gab02] ; indeed he proves that

b_{1}^{(2)} (Γ) - b_{0}^{(2)} (Γ) + 1 \leq C (Γ),

where

C (Γ)

is the cost of

Γ

. There are no known examples in which equality does not hold.

It is curious that

C (Γ)

measures the “optimal number of generators” for an equivalence relation induced by

Γ

; on the other hand,

δ_{0} (x_{1}, \dots, x_{n})

is known to be

\leq 1

in many cases in which the von Neumann algebra is “singly generated” [GP98] .

One obstruction to the equality between $δ_{0} (Γ)$ and $b_{1}^{(2)} (Γ) - b_{0}^{(2)} (Γ) + 1$ is the fact that the latter quantity is insensitive to the outcome of Connes’ embedding question (if there is a non-embeddable group, one can manufacture a non-embeddable group with large Betti numbers by taking free products).

It is also possible to define a “relative” version of Voiculescu’s free entropy dimension for equivalence relations; one can obtain an invariant of an equivalence relation in this way (see [Shl01, Shl03a]).

4 Non-microstates Approach to Free Entropy.

We have reviewed the microstates definition of free entropy in the previous lecture. There are several difficulties connected with that definition. The first is that the involvement of sets of microstates makes the definition hard to work with technically; as we saw there are several properties of free entropy (such as additivity for free families) that one expects to hold, but which one is unable to prove because of such technical difficulties. Another example of such acute difficulties arises when one deals with free Fisher information. By analogy with the classical case, one wants to define the free Fisher information

Φ (x_{1}, \dots, x_{n})

by the formula

Φ (x_{1}, \dots, x_{n}) = 2 \frac{d}{d ɛ} χ (x_{1} + \sqrt{ɛ} s_{1}, \dots, x_{n} + \sqrt{ɛ} s_{n}) |_{ɛ = 0},

where $s_{1}, \dots, s_{n}$ are free semicircular variables, free from $(x_{1}, \dots, x_{n})$. The definition works fine in the case that $n = 1$ (the explicit formula for $χ$ is essential), but it is not clear how to prove that the derivative exists and that the definition makes sense in the case $n > 1$.

The other point is that the definition of the microstates free entropy presupposes the existence of microstates, i.e., embeddability into the ultrapower of the hyperfinite II$_{1}$ factor. A priori, it is not clear why one should assume this for elements of an arbitrary non-commutative tracial probability space (although of course if Connes’ embeddability question always has an affirmative answer, this second point disappears).

Voiculescu [Voi98a] gave a new definition of free entropy, based on an “infinitesimal” approach involving free Fisher information.

This new approach does not involve microstates, and for this reason the resulting entropy bears the name “non-microstates” or “microstates-free” entropy. It is not at present known whether the two definitions (microstates and non-microstates) agree, except in the one-variable case; indeed, showing this would give a positive answer to Connes’ embeddability question.

Nonetheless, a recent work by Biane, Capitaine and Guionnet [BCG03] shows that the microstates free entropy is always smaller than the non-microstates entropy.

To distinguish the two definitions, quantities related to the non-microstates entropy are denoted by the same letter as their microstates analogs, but with an asterisk; for example, the non-microstates free entropy is $χ^{*}$, and the corresponding free entropy dimension is $δ^{*}$.

4.1 A non-rigorous derivation of the non-microstates definition.

We begin with a (rigorous) consequence of the change of variables formula for microstates entropy. We shall assume that $x_{1}, \dots, x_{n}$ are in a non-commutative probability space $A$ with a tracial positive linear functional $τ$.

4.1.1 Infinitesimal change of variables.

Let $P_{1}, \dots, P_{n}$ be polynomials in $n$ indeterminates. Consider the change of variables

y_{j}^{ɛ} = x_{j} + ɛ P_{j} (x_{1}, \dots, x_{n}) .

Then for $ɛ$ sufficiently small, this change of variables can be inverted: each $x_{j}$ can be expressed as a non-commutative power series in $y_{1}^{ɛ}, \dots, y_{n}^{ɛ}$ whose multi-radius of convergence exceeds the operator norms of $y_{1}^{ɛ}, \dots, y_{n}^{ɛ}$. Thus one can apply the change of variables formula and express $χ (y_{1}^{ɛ}, \dots, y_{n}^{ɛ})$ in terms of the free entropy $χ (x_{1}, \dots, x_{n})$ and the logarithm of the Jacobian of the transformation.

Expanding the logarithm of the Jacobian as a power series in $ɛ$ gives the infinitesimal change of variables formula [Voi97]:

χ (y_{1}^{ɛ}, \dots, y_{n}^{ɛ}) = χ (x_{1}, \dots, x_{n}) + ɛ \sum_{j = 1}^{n} τ \otimes τ (\partial_{j} P_{j}) + O (ɛ^{2}) .
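As a quick consistency check (a worked example, not part of the original argument), take $P_{j} = X_{j}$ for every $j$, so that $y_{j}^{ɛ} = (1 + ɛ) x_{j}$. Since $\partial_{j} X_{j} = 1 \otimes 1$ and $τ (1) = 1$, the first-order term can be compared with the scaling rule $χ (λ x_{1}, \dots, λ x_{n}) = χ (x_{1}, \dots, x_{n}) + n log |λ|$ for microstates free entropy, which is used here without proof:

```latex
\sum_{j=1}^{n} \tau\otimes\tau(\partial_j X_j) = \sum_{j=1}^{n} \tau(1)\,\tau(1) = n,
\qquad
\chi\big((1+\varepsilon)x_1,\dots,(1+\varepsilon)x_n\big)
  = \chi(x_1,\dots,x_n) + n\log(1+\varepsilon)
  = \chi(x_1,\dots,x_n) + n\varepsilon + O(\varepsilon^2).
```

Both sides agree to first order in $ɛ$.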

4.1.2 Conjugate variables.

Let us now assume that the free difference quotient $\partial_{j}$, viewed as a densely defined operator $\partial_{j} : L^{2} (M) \to L^{2} (M) \bar{\otimes} L^{2} (M)$ defined on polynomials in $x_{1}, \dots, x_{n}$, where $M = W^{*} (x_{1}, \dots, x_{n})$, has the property that $1 \otimes 1$ is in the domain of its adjoint $\partial_{j}^{*}$. Let $ξ_{j} = \partial_{j}^{*} (1 \otimes 1) \in L^{2} (M)$. The elements $ξ_{1}, \dots, ξ_{n}$ are called the conjugate variables to $(x_{1}, \dots, x_{n})$ and satisfy

〈 ξ_{j}, Q 〉 = 〈 \partial_{j} (Q), 1 \otimes 1 〉 = τ \otimes τ (\partial_{j} (Q)),

for any polynomial $Q \in C [X_{1}, \dots, X_{n}]$.

Then our infinitesimal change of variables formula becomes:

χ (y_{1}^{ɛ}, \dots, y_{n}^{ɛ}) = χ (x_{1}, \dots, x_{n}) + ɛ \sum_{j = 1}^{n} 〈 P_{j}, ξ_{j} 〉 + O (ɛ^{2}) .

It turns out that conjugate variables are intimately connected with free Brownian motion. If we let

x_{j}^{ɛ} = x_{j} + \sqrt{ɛ} s_{j},

where the $s_{j}$ form a free semicircular family, free from $x_{1}, \dots, x_{n}$, then for any polynomial $Q$ in $n$ indeterminates one can prove that

τ (Q (x_{1}^{ɛ}, \dots, x_{n}^{ɛ})) = τ (Q (x_{1} + \frac{ɛ}{2} ξ_{1}, \dots, x_{n} + \frac{ɛ}{2} ξ_{n})) + O (ɛ^{2}) .

Thus perturbations by conjugate variables give an “approximation” in law to free Brownian motion; note, however, that while $x_{j}^{ɛ}$ no longer lies in $W^{*} (x_{1}, \dots, x_{n})$, the element $x_{j} + \frac{ɛ}{2} ξ_{j}$ does lie in $L^{2} (W^{*} (x_{1}, \dots, x_{n}))$.
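For a concrete instance (a worked check, assuming $x_{1}, \dots, x_{n}$ form a free semicircular family with $τ (x_{j}^{2}) = 1$, whose conjugate variables are $ξ_{j} = x_{j}$): both perturbed families are again free semicircular families, and the trace of a degree-$k$ monomial in a free semicircular family of common variance $v$ equals $v^{k/2}$ times its value at $v = 1$. Comparing variances shows that the two sides of the approximation above differ only at order $ɛ^{2}$:

```latex
\tau\big((x_j+\sqrt{\varepsilon}\,s_j)^2\big) = 1+\varepsilon,
\qquad
\tau\big((x_j+\tfrac{\varepsilon}{2}\xi_j)^2\big) = \big(1+\tfrac{\varepsilon}{2}\big)^2 = 1+\varepsilon+\tfrac{\varepsilon^2}{4},
\qquad
(1+\varepsilon)^{k/2} - \big(1+\tfrac{\varepsilon}{2}\big)^{k} = O(\varepsilon^2).
```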

Conjugate variables frequently exist. For example, if $x_{1}, \dots, x_{n}$ are a free semicircular family with $τ (x_{j}^{2}) = 1$, then the $ξ_{j}$ exist and in fact $ξ_{j} = x_{j}$ for $j = 1, \dots, n$. One can show that for any $x_{1}, \dots, x_{n}$ and any $ɛ > 0$, conjugate variables to the family $(x_{1} + \sqrt{ɛ} s_{1}, \dots, x_{n} + \sqrt{ɛ} s_{n})$ always exist. In fact, in this case

ξ_{j} = E_{W^{*} (x_{1} + \sqrt{ɛ} s_{1}, \dots, x_{n} + \sqrt{ɛ} s_{n})} (\frac{1}{\sqrt{ɛ}} s_{j}) ,

where $E_{N}$ denotes the trace-preserving conditional expectation onto $N$.
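The defining relation $〈 ξ_{j}, Q 〉 = τ \otimes τ (\partial_{j} Q)$, together with the claim that $ξ_{j} = x_{j}$ for a free semicircular family with $τ (x_{j}^{2}) = 1$, can be checked mechanically on monomials. The sketch below is illustrative only and rests on stated assumptions: it uses real-coefficient monomials (so that, for self-adjoint $x_{j}$, the inner product reduces to $τ (x_{j} Q)$), computes moments of the free semicircular family by counting non-crossing pairings of equal letters, and implements $\partial_{j}$ on a word by splitting it at each occurrence of the letter $j$. All function names are hypothetical helpers, not part of any library.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def tau(word):
    """Trace of x_{word[0]} ... x_{word[-1]} in a free semicircular family
    with tau(x_j^2) = 1.

    Counts non-crossing pairings that only pair equal letters: the first
    letter is paired with a later position p carrying the same letter, and
    non-crossing forces word[1:p] and word[p+1:] to be paired separately.
    """
    if not word:
        return 1
    total = 0
    for p in range(1, len(word)):
        if word[p] == word[0]:
            total += tau(word[1:p]) * tau(word[p + 1:])
    return total


def tau_tensor_tau_of_dq(j, word):
    """(tau ⊗ tau)(d_j word), where d_j splits the word at each letter j."""
    return sum(
        tau(word[:l]) * tau(word[l + 1:])
        for l, letter in enumerate(word)
        if letter == j
    )


def conjugate_relation_holds(n_letters, max_len):
    """Check tau(x_j Q) == (tau ⊗ tau)(d_j Q) for every monomial Q of length
    at most max_len, i.e. that xi_j = x_j satisfies the defining relation."""
    for length in range(max_len + 1):
        for word in product(range(n_letters), repeat=length):
            for j in range(n_letters):
                if tau((j,) + word) != tau_tensor_tau_of_dq(j, word):
                    return False
    return True


print(tau((0, 0, 0, 0)), tau((0, 1, 0, 1)), tau((0, 0, 1, 1)))  # 2 0 1
print(conjugate_relation_holds(n_letters=2, max_len=6))  # True
```

The check succeeds for a structural reason: the recursion used in `tau` (pair the first letter with a later equal letter) is exactly the identity $τ (x_{j} Q) = τ \otimes τ (\partial_{j} Q)$, the free Schwinger-Dyson equation for semicircular variables.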

4.1.3 Non-rigorous derivation of the formula for $Φ (x_{1}, \dots, x_{n})$ .

Assume now that $(ξ_{1}, \dots, ξ_{n})$ are conjugate variables to $(x_{1}, \dots, x_{n})$. Let $(s_{1}, \dots, s_{n})$ be, as before, a free semicircular system, free from $(x_{1}, \dots, x_{n})$. Recall that we want to define the free Fisher information by

Φ (x_{1}, \dots, x_{n}) = 2 \frac{d}{d ɛ} χ (x_{1} + \sqrt{ɛ} s_{1}, \dots, x_{n} + \sqrt{ɛ} s_{n}) |_{ɛ = 0} .

Since $χ (x_{1}, \dots, x_{n})$ depends only on the law of $x_{1}, \dots, x_{n}$, and since the laws of $(x_{1} + \sqrt{ɛ} s_{1}, \dots, x_{n} + \sqrt{ɛ} s_{n})$ and $(x_{1} + \frac{ɛ}{2} ξ_{1}, \dots, x_{n} + \frac{ɛ}{2} ξ_{n})$ agree up to terms of order $ɛ^{2}$, one would expect that

Φ (x_{1}, \dots, x_{n}) = 2 \frac{d}{d ɛ} χ (x_{1} + \frac{ɛ}{2} ξ_{1}, \dots, x_{n} + \frac{ɛ}{2} ξ_{n}) |_{ɛ = 0} .

We now assume that the $ξ_{j}$ are sufficiently nice functions of $x_{1}, \dots, x_{n}$ (for instance, polynomials in them) so that the infinitesimal change of variables formula applies with $P_{j} = \frac{1}{2} ξ_{j}$. Thus

χ (x_{1} + \frac{ɛ}{2} ξ_{1}, \dots, x_{n} + \frac{ɛ}{2} ξ_{n}) = χ (x_{1}, \dots, x_{n}) + ɛ \sum_{j = 1}^{n} 〈 \frac{1}{2} ξ_{j}, ξ_{j} 〉 + O (ɛ^{2})

= χ (x_{1}, \dots, x_{n}) + \frac{ɛ}{2} \sum_{j = 1}^{n} ∥ ξ_{j} ∥_{L^{2} (M)}^{2} + O (ɛ^{2}) .

Summarizing, we then expect that

Φ (x_{1}, \dots, x_{n}) = \sum_{j = 1}^{n} ∥ ξ_{j} ∥_{L^{2} (M)}^{2} .
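As a consistency check, assume the standard fact (not derived here) that a free semicircular family of $n$ variables with common variance $v$ has $χ = \frac{n}{2} log (2 π e v)$. If $τ (x_{j}^{2}) = 1$, then $ξ_{j} = x_{j}$ and $x_{j} + \sqrt{ɛ} s_{j}$ is free semicircular of variance $1 + ɛ$, so the original definition and the expected formula both give $n$:

```latex
2\,\frac{d}{d\varepsilon}\Big[\tfrac{n}{2}\log\big(2\pi e\,(1+\varepsilon)\big)\Big]_{\varepsilon=0} = n,
\qquad
\sum_{j=1}^{n}\|\xi_j\|_{L^2(M)}^2 = \sum_{j=1}^{n}\tau(x_j^2) = n.
```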

4.1.4 Definition of $Φ^{*} (x_{1}, \dots, x_{n})$ .

This leads us to take the non-rigorous formula for $Φ$ as the definition of the non-microstates free Fisher information:

Definition 4.1. [Voi98a] Let $(x_{1}, \dots, x_{n})$ be a family of non-commutative random variables in $(A, τ)$. If conjugate variables $(ξ_{1}, \dots, ξ_{n})$ to this family exist, then we set

Φ^{*} (x_{1}, \dots, x_{n}) = \sum_{j = 1}^{n} ∥ ξ_{j} ∥_{L^{2} (M)}^{2}, \qquad M = W^{*} (x_{1}, \dots, x_{n}) .

If the conjugate variables do not exist, we set $Φ^{*} (x_{1}, \dots, x_{n}) = + \infty$.

Note that this definition does not involve microstates.

In the case of a single variable $x = x_{1}$ whose law $μ_{x}$ is absolutely continuous with respect to Lebesgue measure, $d μ_{x} (t) = p (t) d t$, the conjugate variable $ξ_{1}$ turns out to be $2 π$ times the Hilbert transform $(H p) (t) = \frac{1}{π} \mathrm{p.v.} \int \frac{p (s)}{t - s} d s$ of the distribution of $x_{1}$, restricted to the support of this distribution. One can then compute that

Φ^{*} (x) = Φ (x) = \frac{4 π^{2}}{3} \int p (t)^{3} d t .
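The one-variable formula can be sanity-checked numerically, with the normalization $\frac{4 π^{2}}{3} \int p (t)^{3} d t$ (the one for which a standard semicircular variable gets $Φ = 1$). The sketch below uses hypothetical helper names and a composite Simpson rule chosen only for simplicity. It evaluates the formula for the semicircle law of variance $v$, with density $p (t) = \frac{1}{2 π v} \sqrt{4 v - t^{2}}$ on $[- 2 \sqrt{v}, 2 \sqrt{v}]$; since the conjugate variable of such an $x$ is $ξ = x / v$, one expects $Φ^{*} (x) = ∥ ξ ∥_{2}^{2} = 1 / v$.

```python
import math


def semicircle_density(t, variance=1.0):
    """Density of the centered semicircle law with the given variance."""
    radius_sq = 4.0 * variance  # support is [-2 sqrt(v), 2 sqrt(v)]
    return math.sqrt(max(radius_sq - t * t, 0.0)) / (2.0 * math.pi * variance)


def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b] (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def fisher_information_1d(density, a, b):
    """(4 pi^2 / 3) * int p(t)^3 dt for a density p supported in [a, b]."""
    return 4.0 * math.pi ** 2 / 3.0 * simpson(lambda t: density(t) ** 3, a, b)


def semicircle_fisher(variance):
    """Numerical free Fisher information of a semicircular variable."""
    r = 2.0 * math.sqrt(variance)
    return fisher_information_1d(lambda t: semicircle_density(t, variance), -r, r)
```

For example, `semicircle_fisher(2.0)` should come out very close to `0.5`. The integrand vanishes like $(2 \sqrt{v} - |t|)^{3/2}$ at the edges of the support, which slows Simpson's rule slightly but still leaves the error far below $10^{-6}$ with the default number of subintervals.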

4.1.5 Definition of $χ^{*}$ .

Since $Φ^{*}$ was designed to be proportional to the derivative of free entropy, one can recover free entropy from the free Fisher information by integration. The formula is

χ^{*} (x_{1}, \dots, x_{n}) = \frac{1}{2} \int_{0}^{\infty} (\frac{n}{1 + t} - Φ^{*} (x_{1}^{t}, \dots, x_{n}^{t})) d t + \frac{n}{2} log 2 π e;

here, as before, $x_{j}^{t} = x_{j} + \sqrt{t} s_{j}$, and $(s_{1}, \dots, s_{n})$ is a free semicircular family, free from $(x_{1}, \dots, x_{n})$.
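The integral formula can be checked numerically in the simplest case, using the additive constant $\frac{n}{2} log 2 π e$. Assume (as a standard fact, not derived here) that $x_{1}, \dots, x_{n}$ is a free semicircular family of common variance $a$, so that $x_{j}^{t}$ is free semicircular of variance $a + t$ and $Φ^{*} (x_{1}^{t}, \dots, x_{n}^{t}) = n / (a + t)$; the formula should then return $\frac{n}{2} log (2 π e a)$, the value of $χ$ for such a family. The sketch below uses hypothetical helper names; it maps the half-line to $[0, 1)$ by $t = u / (1 - u)$ and applies the midpoint rule, an arbitrary but adequate choice of quadrature.

```python
import math


def chi_star(phi_of_t, n, panels=20000):
    """chi* = 1/2 * int_0^inf (n/(1+t) - Phi*(x^t)) dt + (n/2) log(2 pi e).

    phi_of_t(t) must return Phi*(x_1^t, ..., x_n^t). The substitution
    t = u / (1 - u) maps [0, inf) onto [0, 1); the midpoint rule in u never
    evaluates the endpoint u = 1, where the Jacobian blows up.
    """
    h = 1.0 / panels
    integral = 0.0
    for k in range(panels):
        u = (k + 0.5) * h
        t = u / (1.0 - u)
        dt_du = 1.0 / (1.0 - u) ** 2
        integral += (n / (1.0 + t) - phi_of_t(t)) * dt_du * h
    return 0.5 * integral + 0.5 * n * math.log(2.0 * math.pi * math.e)


def semicircular_family_phi(n, variance):
    """Phi*(x^t) for a free semicircular family of n variables of common variance."""
    return lambda t: n / (variance + t)
```

For $a = 1$ the integrand vanishes identically (the lower bound $\frac{n}{1 + t}$ below is attained) and the result is $\frac{n}{2} log 2 π e$; for other values of $a$ the integrand decays like $t^{-2}$, which is what makes the substitution work well.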

Voiculescu proved that the function $t \mapsto Φ^{*} (x_{1}^{t}, \dots, x_{n}^{t})$ is monotone decreasing and right-continuous, in the sense that

\lim_{s \to t^{+}} Φ^{*} (x_{1}^{s}, \dots, x_{n}^{s}) = Φ^{*} (x_{1}^{t}, \dots, x_{n}^{t}) .

It is an important open question whether this function is always continuous.

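Furthermore, if $n = \sum τ (x_{j}^{2})$, then

\frac{n}{1 + t} \leq Φ^{*} (x_{1}^{t}, \dots, x_{n}^{t}) \leq \frac{n}{t},

so the integrand in the definition of $χ^{*}$ is non-positive and bounded below by $\frac{n}{1 + t} - \frac{n}{t}$ on $(0, \infty)$. This implies that the integral defining $χ^{*}$ makes sense and takes a value in $[- \infty, + \infty)$.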

4.2 Properties of $χ^{*}$ .

As we mentioned in the foreword to this section, the principal outstanding question in the theory of free entropy is whether $χ = χ^{*}$. In this direction there are two results:

$\circ$ [Voi98a] In the single-variable case, the two quantities are equal: $χ (x_{1}) = χ^{*} (x_{1})$ ;
$\circ$ [BCG03] In general, the following inequality is satisfied: $χ (x_{1}, \dots, x_{n}) \leq χ^{*} (x_{1}, \dots, x_{n}) .$

The non-microstates definition turns out to be easier to work with in some respects, but harder in others. One of the big difficulties in the non-microstates framework is the inability to prove the change of variables formula. This difficulty is related to our inability to control the continuity properties of the “non-commutative Hilbert transform” $(x_{1}, \dots, x_{n}) \mapsto (ξ_{1}, \dots, ξ_{n})$, where $(ξ_{1}, \dots, ξ_{n})$ are the conjugate variables to $(x_{1}, \dots, x_{n})$.

Nonetheless, $χ^{*}$ has many nice properties, for example (all of these are from [Voi98a]):

$\circ$ $χ^{*} (x_{1}, \dots, x_{n}, y_{1}, \dots, y_{m}) = χ^{*} (x_{1}, \dots, x_{n}) + χ^{*} (y_{1}, \dots, y_{m})$ if the families $(x_{1}, \dots, x_{n})$ and $(y_{1}, \dots, y_{m})$ are free;
$\circ$ $χ^{*} (x_{1}, \dots, x_{n}, y_{1}, \dots, y_{m}) \leq χ^{*} (x_{1}, \dots, x_{n}) + χ^{*} (y_{1}, \dots, y_{m})$ ;
$\circ$ $χ^{*} (x_{1}, \dots, x_{n})$ is maximal subject to $\sum τ (x_{i}^{2}) = n$ if and only if $x_{1}, \dots, x_{n}$ are free semicircular variables, and $τ (x_{1}^{2}) = \dots = τ (x_{n}^{2}) = 1$ .
$\circ$ If $s_{1}, \dots, s_{n}$ are free semicircular variables, free from the family $x_{1}, \dots, x_{n}$ , then for any $ɛ > 0$ , $χ^{*} (x_{1} + \sqrt{ɛ} s_{1}, \dots, x_{n} + \sqrt{ɛ} s_{n}) > - \infty$ .

Comparing the last property of $χ^{*}$ with the corresponding behavior of $χ$ (which equals $- \infty$ whenever microstates do not exist) explains why $χ = χ^{*}$ would imply a positive answer to Connes’ embeddability question.

4.3 Non-microstates free entropy dimension.

Although we don’t know how to formulate the packing number definition of free entropy dimension in the non-microstates approach, the Minkowski dimension definition does have a straightforward analog. We set

δ^{*} (x_{1}, \dots, x_{n}) = n - \liminf_{ɛ \to 0} \frac{χ^{*} (x_{1}^{ɛ}, \dots, x_{n}^{ɛ})}{log ɛ^{1 / 2}},

where, as before, $x_{j}^{ɛ} = x_{j} + \sqrt{ɛ} s_{j}$, and $s_{1}, \dots, s_{n}$ is a free semicircular family, free from the family $x_{1}, \dots, x_{n}$.

It is tempting to formally apply L’Hôpital’s rule in the definition of $δ^{*}$ and use the fact that

\frac{d}{d ɛ} χ^{*} (x_{1}^{ɛ}, \dots, x_{n}^{ɛ}) = \frac{1}{2} Φ^{*} (x_{1}^{ɛ}, \dots, x_{n}^{ɛ}) .

Since $\frac{d}{d ɛ} log ɛ^{1 / 2} = \frac{1}{2 ɛ}$, this leads us to set

δ^{◆} (x_{1}, \dots, x_{n}) = n - \liminf_{ɛ \to 0} ɛ Φ^{*} (x_{1}^{ɛ}, \dots, x_{n}^{ɛ}) .

One can easily show that

δ^{◆} (x_{1}, \dots, x_{n}) \geq δ^{*} (x_{1}, \dots, x_{n}),

and no examples are known in which equality fails.
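Two extreme cases illustrate these definitions. This is a worked example assuming the standard facts that a free semicircular family of $n$ variables with common variance $v$ has $Φ^{*} = n / v$ and $χ^{*} = \frac{n}{2} log (2 π e v)$. If $x_{1}, \dots, x_{n}$ is a free semicircular family with $τ (x_{j}^{2}) = 1$, then $x_{j}^{ɛ}$ has variance $1 + ɛ$; if instead $x_{1} = \dots = x_{n} = 0$, then $x_{j}^{ɛ} = \sqrt{ɛ} s_{j}$ has variance $ɛ$. In both cases $δ^{*}$ and $δ^{◆}$ coincide:

```latex
\text{semicircular: }\;
\varepsilon\,\Phi^*(x^\varepsilon) = \frac{n\varepsilon}{1+\varepsilon} \to 0,
\quad
\frac{\chi^*(x^\varepsilon)}{\log\varepsilon^{1/2}}
  = \frac{\tfrac{n}{2}\log\big(2\pi e(1+\varepsilon)\big)}{\tfrac12\log\varepsilon} \to 0
\;\Longrightarrow\; \delta^{◆} = \delta^* = n;
\\[4pt]
\text{zero: }\;
\varepsilon\,\Phi^*(x^\varepsilon) = \varepsilon\cdot\frac{n}{\varepsilon} = n,
\quad
\frac{\chi^*(x^\varepsilon)}{\log\varepsilon^{1/2}}
  = \frac{\tfrac{n}{2}\log(2\pi e\,\varepsilon)}{\tfrac12\log\varepsilon} \to n
\;\Longrightarrow\; \delta^{◆} = \delta^* = 0.
```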

There are unfortunately precious few computations of $δ^{*}$ and $δ^{◆}$, and much less is known about their properties than about the properties of $δ$. In particular, it is not known in general whether $δ^{◆}$ and $δ^{*}$ depend only on the algebra generated by $x_{1}, \dots, x_{n}$, taken with its trace.

We summarize what is known below:

$\circ$ $δ^{*} (x_{1}) = δ^{◆} (x_{1}) = δ_{0} (x_{1}) = 1 - \sum_{t} μ_{x_{1}} (\{t\})^{2}$ , where $μ_{x_{1}}$ is the law of $x_{1}$ ;
$\circ$ $δ^{*} (x_{1}, \dots, x_{n}, y_{1}, \dots, y_{m}) = δ^{*} (x_{1}, \dots, x_{n}) + δ^{*} (y_{1}, \dots, y_{m})$ if $(x_{1}, \dots, x_{n})$ are free from $(y_{1}, \dots, y_{m})$ ; the same is true for $δ^{◆}$ ;
$\circ$ [CS] If $x_{1}, \dots, x_{n}$ are generators of a tracial algebra $(A, τ)$ , then $δ^{*} (x_{1}, \dots, x_{n}) \leq δ^{◆} (x_{1}, \dots, x_{n}) \leq β_{1}^{(2)} (A, τ) - β_{0}^{(2)} (A, τ) + 1,$ where $β_{j}^{(2)} (A, τ)$ are the $L^{2}$ -Betti numbers of $(A, τ)$ .
$\circ$ [MS] If $x_{1}, \dots, x_{n} \in C Γ$ are self-adjoint and generate the group algebra of a discrete group $Γ$ , then equality holds:
$δ^{◆} (x_{1}, \dots, x_{n}) = δ^{*} (x_{1}, \dots, x_{n}) = b_{1}^{(2)} (Γ) - b_{0}^{(2)} (Γ) + 1$ , where $b_{j}^{(2)} (Γ) = β_{j}^{(2)} (C Γ)$ are the $L^{2}$ -Betti numbers of $Γ$ . In particular, in this case $δ^{*} = δ^{◆}$ are algebraic invariants;
$\circ$ [CS] If $W^{*} (x_{1}, \dots, x_{n})$ has diffuse center, then $δ^{◆} (x_{1}, \dots, x_{n}) \leq 1$ .
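The first and fourth items above can be cross-checked on finite cyclic groups. For $Γ = \mathbb{Z} / m$ the group algebra $C Γ \cong C^{m}$ is generated by a single self-adjoint element with $m$ distinct eigenvalues, whose law (with respect to the canonical trace) puts mass $1 / m$ on each eigenvalue; and a finite group has $b_{0}^{(2)} (Γ) = 1 / |Γ|$ and $b_{1}^{(2)} (Γ) = 0$. The sketch below (hypothetical helper names, exact rational arithmetic via `fractions.Fraction`) confirms that $1 - \sum_{t} μ_{x_{1}} (\{t\})^{2}$ and $b_{1}^{(2)} - b_{0}^{(2)} + 1$ both equal $1 - 1 / m$.

```python
from fractions import Fraction


def delta_one_variable(atom_masses):
    """1 - sum_t mu({t})^2, given the masses of the atoms of mu.

    Any diffuse part of the law contributes nothing and is simply omitted.
    """
    return 1 - sum(Fraction(m) ** 2 for m in atom_masses)


def betti_formula_finite_group(order):
    """b_1^(2) - b_0^(2) + 1 for a finite group (b_0 = 1/|G|, b_1 = 0)."""
    b0 = Fraction(1, order)
    b1 = Fraction(0)
    return b1 - b0 + 1


def check_cyclic_groups(max_order=12):
    """Compare both formulas for Z/m, m = 1, ..., max_order."""
    for m in range(1, max_order + 1):
        law = [Fraction(1, m)] * m  # m distinct eigenvalues, each of trace 1/m
        if delta_one_variable(law) != betti_formula_finite_group(m):
            return False
    return True
```

The trivial case $m = 1$ (where $x_{1}$ is a scalar) gives $0$ on both sides, and a law with no atoms gives $1$, as expected for a diffuse single generator.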

References

P. Biane, M. Capitaine, and A. Guionnet, Large deviation bounds for matrix Brownian motion, Invent. Math. 152 (2003), no. 2, 433–459.
A. Connes, Classification of injective factors. Cases $I I_{1},$ $I I_{\infty},$ $I I I_{λ},$ $λ \neq 1$ , Ann. of Math. (2) 104 (1976), no. 1, 73–115.
A. Connes and D. Shlyakhtenko, $L^{2}$ -homology for von Neumann algebras, Preprint math.OA/0309343, to appear in J. Reine Angew. Math.
K. Dykema and F. Radulescu, Compressions of free products of von Neumann algebras, Math. Ann. 316 (2000), no. 1, 61–82.
K. Dykema, Free products of hyperfinite von Neumann algebras and free dimension, Duke Math. J. 69 (1993), 97–119.
K. Dykema, On certain free product factors via an extended matrix model, J. Funct. Anal. 112 (1993), 31–60.
K. Dykema, Interpolated free group factors, Pacific J. Math. 163 (1994), 123–135.
K. Dykema, Amalgamated free products of multi-matrix algebras and a construction of subfactors of a free group factor, Amer. J. Math. 117 (1995), no. 6, 1555–1602.
Kenneth J. Dykema, Two applications of free entropy, Math. Ann. 308 (1997), no. 3, 547–558.
D. Gaboriau, Coût des relations d’équivalence et des groupes, Invent. Math. 139 (2000), no. 1, 41–98.
D. Gaboriau, Invariants $ℓ^{2}$ de relations d’équivalence et de groupes, Publ. Math. Inst. Hautes Études Sci. 95 (2002), 93–150.
L. Ge, Applications of free entropy to finite von Neumann algebras. II, Ann. of Math. (2) 147 (1998), no. 1, 143–157.
Liming Ge and Sorin Popa, On some decomposition properties for factors of type $I I_{1}$ , Duke Math. J. 94 (1998), no. 1, 79–101.
L. Ge and J. Shen, Free entropy and property $T$ factors, PNAS 97 (2000), 9881–9885.
Uffe Haagerup, On Voiculescu’s $R$ and $S$ -transforms for free non-commuting random variables, Free probability theory (Waterloo, ON, 1995), Fields Inst. Commun., vol. 12, Amer. Math. Soc., Providence, RI, 1997, pp. 127–148.
K. Jung, A free entropy dimension lemma, Preprint math.OA/0207149, 2002.
Kenley Jung, The free entropy dimension of hyperfinite von Neumann algebras, Trans. Amer. Math. Soc. 355 (2003), no. 12, 5053–5089 (electronic).
I. Mineyev and D. Shlyakhtenko, Non-microstates free entropy dimension for groups, Preprint, math.OA/0312242, to appear in GAFA.
S. Popa and D. Shlyakhtenko, Universal properties of $L (F_{\infty})$ in subfactor theory, MSRI preprint 2000-032, to appear in Acta Math., 2003.
F. Radulescu, A one parameter group of automorphisms of $L (F_{\infty}) \otimes B (ℋ)$ scaling the trace, C.R. Acad. Sci. Paris 314 (1992), no. 1, 1027–1032.
F. Radulescu, Random matrices, amalgamated free products and subfactors of the von Neumann algebra of a free group, of noninteger index, Invent. Math. 115 (1994), 347–389.
D. Shlyakhtenko, Some applications of freeness with amalgamation, J. Reine Angew. Math. 500 (1998), 191–212.
D. Shlyakhtenko, $A$ -valued semicircular systems, J. Funct. Anal. 166 (1999), 1–47.
D. Shlyakhtenko, Prime type III factors, Proc. Natl. Acad. Sci. USA 97 (2000), 12439–12441.
D. Shlyakhtenko, Free Fisher information with respect to a completely positive map and cost of equivalence relations, Comm. Math. Phys. 218 (2001), no. 1, 133–152.
D. Shlyakhtenko, Microstates free entropy and cost of equivalence relations, Duke Math. J. 118 (2003), 375–425.
D. Shlyakhtenko, On the classification of full factors of type III, Preprint math.OA/0201007, to appear in Trans. AMS, 2003.
R. Speicher, Combinatorial theory of the free product with amalgamation and operator-valued free probability theory, Mem. Amer. Math. Soc. 132 (1998), x+88.
M. Stefan, Indecomposability of free group factors over nonprime subfactors and abelian subalgebras, Preprint, 1999.
D. Shlyakhtenko and Y. Ueda, Irreducible subfactors of $L (F_{\infty})$ of index $λ > 4$ , J. Reine Angew. Math. 548 (2002), 149–166.
D.-V. Voiculescu, K. Dykema, and A. Nica, Free random variables, CRM monograph series, vol. 1, American Mathematical Society, 1992.
D.-V. Voiculescu, Symmetries of some reduced free product $C^{*}$ -algebras, Operator Algebras and Their Connections with Topology and Ergodic Theory, Lecture Notes in Mathematics, vol. 1132, Springer Verlag, 1985, pp. 556–588.
D.-V. Voiculescu, Circular and semicircular systems and free product factors, Operator Algebras, Unitary Representations, Enveloping Algebras, and Invariant Theory, Progress in Mathematics, vol. 92, Birkhäuser, Boston, 1990, pp. 45–60.
D.-V. Voiculescu, Limit laws for random matrices and free products, Invent. Math. 104 (1991), 201–220.
D.-V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability theory I, Commun. Math. Phys. 155 (1993), 71–92.
D.-V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability theory II, Invent. Math. 118 (1994), 411–440.
D.-V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability theory, III, Geometric and Functional Analysis 6 (1996), 172–199.
D.-V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability theory, IV: Maximum entropy and freeness, Free Probability (D.-V. Voiculescu, ed.), American Mathematical Society, 1997, pp. 293–302.
D.-V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability, V, Invent. Math. 132 (1998), 189–227.
D.-V. Voiculescu, A strengthened asymptotic freeness result for random matrices with applications to free entropy, IMRN 1 (1998), 41–64.
D.-V. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability, VI, Adv. Math. 146 (1999), no. 2, 101–166.
D.-V. Voiculescu, Free entropy dimension $\leq 1$ for some generators of property $T$ factors of type $I I_{1}$ , J. Reine Angew. Math. 514 (1999), 113–118.
Dan Voiculescu, Lectures on free probability theory, Lectures on probability theory and statistics (Saint-Flour, 1998), Lecture Notes in Math., vol. 1738, Springer, Berlin, 2000, pp. 279–349.
D.-V. Voiculescu, Free entropy, Bull. London Math. Soc. 34 (2002), no. 3, 257–278.
E.P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Annals of Math. 62 (1955), 548–564.

Department of Mathematics, UCLA, Los Angeles, CA 90095, USA E-mail address: shlyakht@math.ucla.edu

Dimitri Shlyakhtenko.