Such processes arise in several different areas. (a) The Markov chain tree theorem associates to any finite-state Markov chain an accompanying spanning-tree-valued chain (cf. Russ Lyons' talk). (b) Percolation-type processes on an infinite regular tree can be viewed as forest-valued processes (cf. Olle Haggstrom's talk). (c) There is scientific literature on coagulation-fragmentation models, in which N atoms are arranged in clusters, with specified rates for two clusters to merge into one cluster, or one cluster to split into two clusters. This typically leads to an irreversible Markov process. An alternative approach (cf. Whittle's book "Systems in Stochastic Equilibrium") is to first specify a stationary distribution via a Hamiltonian and then invent a reversible chain with that stationary distribution.
The talk will briefly survey (c), and then describe a new process T(t) which relates to all these areas. T(t) is a stationary tree-valued Markov process whose stationary distribution is PGW(1), the Galton-Watson tree with Poisson(1) offspring. It evolves by attaching to each vertex, at rate 1, a copy of a PGW(1) tree; whenever a branch becomes infinite, it is cut down. The time-reversal also has a simple description: each edge is cut at rate 1, and at rate 1 new branches arrive with the distribution PGW^*(1), which is PGW(1) conditioned to be infinite.
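As a concrete handle on the stationary distribution, a PGW(1) tree can be sampled by a breadth-first construction. The sketch below (a minimal illustration, not from the talk) caps the tree size: a critical Galton-Watson tree is almost surely finite, but its size has a heavy tail.

```python
import math
import random

def poisson1(rng):
    """Poisson(1) variate via the product-of-uniforms method."""
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_pgw1(rng, max_vertices=10_000):
    """Sample a PGW(1) tree: Galton-Watson with Poisson(1) offspring.

    Returns the offspring counts in breadth-first order, or None if the
    size cap is exceeded (the tree is a.s. finite, but critical, so
    sizes have a heavy tail).
    """
    offspring = []
    pending = 1  # vertices generated but not yet given children
    while pending > 0:
        k = poisson1(rng)
        offspring.append(k)
        pending += k - 1
        if len(offspring) > max_vertices:
            return None
    return offspring
```

Two quick sanity checks on the sampler: in any finite tree the offspring counts sum to the number of vertices minus one, and the one-vertex tree has probability $e^{-1}$.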
One way T(t) arises is as an $N \to \infty$ limit in (a) where the underlying chain is i.i.d. sampling from $\{1,\dots,N\}$. It relates to one aspect of (c), the "post-gelation solution of the Smoluchowski coagulation equation", which is roughly the classical random graph process (cf. Boris Pittel's talk) in which small components are forbidden to merge with large components.
A broad conceptual theme is that the idea of a mean-field model can often be implemented as either a "random graph" model or an "infinite tree" model. Parallel to T(t) is a forest-valued process F(t) inside a regular infinite tree. Curiously, it is easy to do calculations with F(t) but hard to prove rigorously that it exists!
Some relevant papers, in particular an extensive survey of coalescence-type processes (c), can be found via the author's homepage http://www.stat.berkeley.edu/users/aldous
\author{Richard Arratia \thanks{ Richard Arratia is professor of mathematics at the University of Southern California. His e-mail address is {\tt rarratia@math.usc.edu} \vskip .5pc \newline Work supported in part by NSF grants DMS 90-05833 and DMS 96-26412.} }
\date{January 31, 1997}
Abstract: The scale invariant Poisson processes on $(0,\infty)$ play a central role in number theory, combinatorics, and genetics. They give the continuous limits which underlie and unify diverse discrete structures, including the prime factorization of a uniformly chosen integer, the factorization of polynomials over finite fields, the decomposition into cycles of random permutations, the decomposition into components of random mappings, and the Ewens sampling formula. They deserve attention as one of the fundamental objects of probability theory.
Scale invariance, say of a random set ${\cal X} \subset \BR$, is the property that for all $c>0$, $c {\cal X} \indist {\cal X}$. For random subsets of $(0,\infty)$, we can take logarithms to convert from scale invariance to translation invariance. Since the only translation invariant positive measures on $\BR$ are multiples $\theta \ dx$ of Lebesgue measure, with $\theta >0$, this gives a complete classification of the scale invariant Poisson processes on $(0,\infty)$. The net result is that scale invariant Poisson processes on $(0,\infty)$ have an intensity of the form $\theta \ dx \, /x$ for some $\theta >0$.
The first subtle thing to realize about these scale invariant process{\em es} is that the plural is essential! That is, for translation invariant Poisson processes on $\BR$ with intensity $\theta \ dx$, there is no need to study the effect of $\theta$, since all the processes are related simply by scaling. But no such simple relation holds for the scale invariant processes: the map which constructs, say, the general $\theta$ process from the special case $\theta=1$ raises the location of each point to the power $1/\theta$, which is not a simple mapping. In particular, there are qualitative phase transitions at $\theta =1$ and at $\theta = 1 / \log 2$.
For any $\theta >0$, start with the ordinary, translation invariant Poisson process on $\BR$ having intensity $\theta \ dx$ and label the points as $L_i, \ i \in \BZ$, with the natural order $L_i < L_{i+1}$. The scale-invariant process with intensity $\theta \ dx \, /x$ can be realized via $X_i := \exp(- L_i)$, which labels the points so that $X_i > X_{i+1}$. The $i^{th}$ spacing of the scale invariant Poisson process is thus $Y_i := X_i-X_{i+1}$. Theorem: for each $\theta > 0$, $\{ Y_i \} \indist \{ X_i \}$, i.e. the set of spacings, as a point process, has the same distribution as the original process.
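The construction $X_i = \exp(-L_i)$ is easy to simulate, and the spacings theorem can be checked empirically. The sketch below (illustrative; the window $(\textit{lo}, \textit{hi})$ and tolerances are assumptions, and truncating the process to a finite window introduces small edge effects) counts both points and spacings in an interval $(a,b)$, where both counts should have mean $\theta \log(b/a)$.

```python
import math
import random

def poisson_count(rng, lam):
    """Poisson(lam) variate via the product-of-uniforms method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def scale_invariant_points(rng, lo, hi, theta=1.0):
    """Points of the Poisson process with intensity theta dx/x on (lo, hi),
    listed in decreasing order, realized via X = exp(-L) for a rate-theta
    homogeneous Poisson process L on (-log hi, -log lo)."""
    n = poisson_count(rng, theta * math.log(hi / lo))
    return sorted((lo * (hi / lo) ** rng.random() for _ in range(n)),
                  reverse=True)
```

Averaging over many independent realizations, the number of points in $(0.1, 1)$ and the number of spacings with value in $(0.1, 1)$ both come out near $\log 10$, as the theorem predicts.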
Let $T$ be the sum of the locations of all points of the scale invariant Poisson process restricted to $(0,1)$. Theorem: the Poisson-Dirichlet process is this restricted process, conditional on $T=1$. [Barbour, Tavar\'e]
The random variable $T$ has a density $g$ which for $\theta=1$ is $g(u) = e^{-\gamma} \rho(u)$, where $\rho$ is Dickman's function, satisfying $\rho(u)=1 $ for $0\leq u \leq 1$, and $\rho'(u)=-\rho(u-1)/u$ for $u>1$. For $0 \leq a < b <\infty$ let $T_{a,b}$ be the sum of the locations of all points of the scale invariant Poisson process restricted to $(a,b)$. Buchstab's function $\omega$, which satisfies $(u \omega(u))'=\omega(u-1)$ for $u>2$ and $u \omega(u)=1$ for $1 \leq u \leq 2$, can be described as the density of the continuous part of $T_{1/u, 1}$ evaluated at 1 in the case $\theta=1$.
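Dickman's function is easy to tabulate numerically. The recursion $\rho'(u)=-\rho(u-1)/u$ together with $\rho=1$ on $[0,1]$ is equivalent to $u\,\rho(u) = \int_{u-1}^{u} \rho(t)\,dt$, which the following sketch (an illustration, with an assumed grid spacing) steps forward with the trapezoid rule.

```python
import math

def dickman_rho_table(umax, h=1e-3):
    """Tabulate Dickman's function rho on a grid of spacing h, using the
    integral form u*rho(u) = integral_{u-1}^{u} rho(t) dt (equivalent to
    rho'(u) = -rho(u-1)/u with rho = 1 on [0,1]), via the trapezoid rule."""
    n_unit = round(1.0 / h)
    n = round(umax / h)
    rho = [1.0] * (n_unit + 1)                # rho = 1 on [0, 1]
    cum = [i * h for i in range(n_unit + 1)]  # running integral of rho
    for i in range(n_unit + 1, n + 1):
        u = i * h
        # solve the implicit trapezoid step for rho(u)
        r = (cum[i - 1] - cum[i - n_unit] + 0.5 * h * rho[i - 1]) / (u - 0.5 * h)
        rho.append(r)
        cum.append(cum[i - 1] + 0.5 * h * (rho[i - 1] + r))
    return rho
```

As checks, $\rho(2)=1-\log 2$ exactly, and $\rho$ is non-increasing; the density of $T$ at $u$ (for $\theta=1$) is then $e^{-\gamma}\rho(u)$.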
Consider, for any $\theta >0$ and $0\leq \beta \leq 1$, the total variation distance $H(\beta)$ between the restrictions to $[0,\beta]$ of the Poisson-Dirichlet process and the scale invariant Poisson process. We have $H(0)=0$, $H(1)=1$, and for $0<\beta<1$, $H(\beta) = d_{TV}( T_{0, \beta}, ( T_{0,\beta}|T=1) ).$ This can be expressed as an integral involving the densities of $T$ and $T_{\beta,1}$ [Tavar\'e], which in the special case $\theta=1$ reduces to \begin{eqnarray*} 2H(\beta) & = & e^\gamma \, \e \left| \omega(u-T)-e^{-\gamma}\right| + \rho(u) \\ & = & \int_{t>0} \left|\omega(u-t)-e^{-\gamma}\right| \rho(t) \ dt \ + \rho(u). \end{eqnarray*} This $H(\beta)$ gives the limit, as $n \ra \infty$, of the total variation distance between the dependent system and its independent process limit, when observing components of relative size at most $\beta$, for discrete systems including the cycle decomposition of a random permutation of $n$ objects [Tavar\'e, Stark], the component decomposition of a random mapping [Stark], and the prime factorization of a random integer chosen uniformly from 1 to $n$ [Stark].
Consider the random set $A \equiv A(\theta)$, defined as the closure of the set of sums of locations of (some of the) points of the scale invariant Poisson process. Trivially, $A$ is scale invariant, i.e. for any $c>0, \ cA \indist A$. Quite easily, with respect to Minkowski sums, the process $(A(\theta))_{\theta>0}$ has stationary, independent increments. The set $A$ for $\theta=1$ is the limit in distribution, using the Hausdorff metric, of spatial rescalings of the set of values $\log d$ for $d$ dividing an integer chosen uniformly from 1 to $n$. For general $\theta \neq 1$, the analogous statement holds, after conditioning on the large deviation that the random integer have $\theta \log\log n$ distinct prime divisors. It is easy to prove that for all $\theta$, $\p(1 \in A)<1$, and for $\theta \log 2 \leq 1$, $\p(1 \in A)=0$. Conversely, for $\theta \log 2 >1$, $\p(1 \in A) >0$, although we still do not have a simple probabilistic proof of this. [Tenenbaum]
Just when we think things have settled down, practitioners evolve interesting ideas and spectacular claims. I will review four of these.
1) Hybrid and Nonreversible Algorithms. The algorithm of Duane et al. (Phys. Lett. B 195, 216-222) uses Hamiltonian mechanics to drive a sampler. In joint work with Susan Holmes and Radford Neal we have abstracted the idea and proved some things.
2) Demon Algorithms. These use an army of Maxwell's demons to control a random walk. In joint work with Thomas Yan some rigorous results have emerged. See Bhanot et al., Nucl. Phys. B235, 417-434.
3) Crossover Algorithms. People in computational group theory (Celler et al., Comm. Alg. 23, 4931-4948) report fantastic speedups by random mating of vectors of a population. With Laurent Saloff-Coste we have managed to prove a bit and suggest extensions far from group theory.
4) Beyond Groebner Bases. In joint work with Sturmfels (See e.g. chapters 5, 6 of B. Sturmfels, Groebner Bases and Convex Polytopes (1996)) computational algebra was used to build Markov chains. In recent implementations (with David Eisenbud, Susan Holmes and Asya Rabinowitch) new ideas have been introduced that speed up (or make feasible) large scale problems.
A self-organizing data structure dynamically maintains a file of records in easily retrievable order while using little additional memory space. Recent analyses of simple Markov chain models have produced new information about such systems, including exact and asymptotic expressions, with rates of convergence to stationarity, for the distribution of search cost and for transition probabilities for the structure as a whole. My microsurvey will focus on some of the following: connections with card shuffling, list and tree structures, various reorganization rules, Markov dependent request sequences, and multi-record request schemes.
There has recently been a resurgence of interest in the {\it shortest common superstring} problem due to its important applications in molecular biology (e.g., recombination of DNA) and data compression. The problem is NP-hard, but it has been known for some time that greedy algorithms work well for this problem. More precisely, it was proved in a recent sequence of papers that in the worst case a greedy algorithm produces a superstring that is at most $\beta$ times ($2 \leq \beta \leq 4$) worse than optimal. We analyze the problem in a probabilistic framework, and consider the optimal total overlap $O_n^{{\rm opt}}$ and the overlap $O_n^{{\rm gr}}$ produced by various greedy algorithms. These turn out to be asymptotically equivalent. We show that with high probability $\lim_{n \to \infty} {O_n^{{\rm opt}} \over n\log n} = \lim_{n \to \infty} {O_n^{{\rm gr}} \over n\log n} = {1 \over H}$ where $n$ is the number of original strings, and $H$ is the entropy of the underlying alphabet. Our results hold under the condition that the lengths of all strings are not too short. This is joint work with Wojciech Szpankowski.
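A basic greedy algorithm of the kind analyzed here repeatedly merges the pair of strings with the largest overlap. The following sketch (an illustration only: quadratic time, whereas practical implementations use suffix structures) shows the idea.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Repeatedly merge the (ordered) pair with the largest overlap.
    Every input string remains a substring of the result, since a merge
    a + b[k:] contains both a and b."""
    ss = list(strings)
    while len(ss) > 1:
        best_k, bi, bj = -1, 0, 1
        for i in range(len(ss)):
            for j in range(len(ss)):
                if i == j:
                    continue
                k = overlap(ss[i], ss[j])
                if k > best_k:
                    best_k, bi, bj = k, i, j
        merged = ss[bi] + ss[bj][best_k:]
        ss = [s for t, s in enumerate(ss) if t not in (bi, bj)]
        ss.append(merged)
    return ss[0]
```

For example, merging "abcde" and "cdefg" (overlap 3) gives "abcdefg"; the total overlap saved is the quantity $O_n^{{\rm gr}}$ studied in the abstract.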
We use decoupling methods due to Kwapie\'n and Woyczy\'nski (1992) to derive an analog of the Hoeffding-Azuma exponential bound when the martingale differences are not uniformly bounded. Related inequalities due to Pinelis and de la Peña will also be presented. Applications of the method to concentration of measure questions for
Dynamical percolation provides a way of introducing time dynamics in static percolation models such as the standard bond percolation setup. Any event which has probability 1 in the static model will almost surely happen at Lebesgue-almost every time $t$ in the dynamical counterpart. The main questions studied are therefore of the type "can there be exceptional times?". Interestingly, the answer is often yes if the static model is taken to be critical. I intend to survey what has been done in dynamical percolation since the subject entered the probability arena two years ago, and to state the main open problems.
Diaconis, Holmes, and Neal have proposed a variation of the Metropolis algorithm. Like the usual Metropolis algorithm, this variation is a Markov chain which converges to a stationary probability distribution, and ratios of this stationary probability are used in the transition probabilities. The variation duplicates the set of states, and the transition probabilities depend on which copy the Markov chain occupies just before the transition. For some probabilities, and for some choices of a parameter which controls the probabilities of moving between the two copies of the state set, this variation converges to stationarity faster than the corresponding ordinary Metropolis algorithm. This talk will outline proofs of how fast the variation converges to the uniform distribution in some of these circumstances.
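For reference, here is the ordinary Metropolis algorithm that the variation modifies; the duplicated-state-space variant changes the proposal mechanism, not the use of stationary-probability ratios. This is a minimal sketch: the cycle proposal and target distribution are illustrative assumptions, not the setting of the talk.

```python
import random

def metropolis_step(x, pi, rng):
    """One step of the usual Metropolis chain on the n-cycle: propose a
    uniformly random neighbor y, accept with probability
    min(1, pi(y)/pi(x)); only ratios of stationary probabilities are used."""
    n = len(pi)
    y = (x + rng.choice((-1, 1))) % n
    if rng.random() < min(1.0, pi[y] / pi[x]):
        return y
    return x
```

Running the chain long enough, the empirical occupation frequencies approach the target distribution `pi`.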
A typical application of probability to algorithms is to sample a randomly chosen subset of the data and make an inference about all the data with high probability.
We discuss our recent algorithm of this flavour to compute approximations to a given matrix: given an m x n matrix A with entries between say -1 and 1, and an error parameter a between 0 and 1, we find a matrix D which is the sum of at most O(1/a^2) simple rank 1 matrices so that the sum of entries of any submatrix (among the 2^{m+n}) of (A-D) is at most (amn) in absolute value. Our algorithm takes time depending only on a and the allowed probability of failure (not on m,n).
We draw on two lines of research to develop the algorithm: one is from the papers of Arora, Karger and Karpinski; of Fernandez de la Vega; and, most directly, of Goldwasser, Goldreich and Ron, who develop algorithms of this flavour for a set of graph problems, typical of which is the maximum cut problem. The second is built around the fundamental Regularity Lemma of Szemeredi in graph theory.
From the matrix approximation, the graph algorithms, the Regularity Lemma, and several other results follow in a uniform way.
We also generalize our approximations to multi-dimensional arrays and from that derive certain other algorithms.
Our talk concerns the Smoluchowski coagulation-fragmentation equation, which is an infinite set of non-linear ordinary differential equations describing the evolution of a mono-disperse system of particles in a well stirred solution.
Using a stochastic approximation (more precisely, approximating the solutions of the Smoluchowski equations by a sequence of finite Markov chains defined on a compact subset of $l_2$ space), we investigate the qualitative behavior of the solutions.
Our talk includes the following:
Consider the Smoluchowski coagulation equation
$$ \dot{C}_{t}(j)={1\over 2}\sum_{k=1}^{j-1}K(j-k,k)C_{t}(j-k) C_t(k)-\sum_{k=1}^{\infty}K(j,k)C_t(j)C_t(k), $$
$j=1,2,3,\cdots$, where $C_t(j)\geq 0$ is the expected number of $j$-clusters (a cluster consisting of $j$ particles) per unit volume, and $K$ is a nonnegative symmetric function which represents the coagulation rate of $i$- and $j$-clusters. Suppose $\epsilon (ij)^{\alpha} \leq K(i,j)\leq M(ij)^{\beta}$, where ${1\over 2}< \alpha\leq \beta<1,$ and $\epsilon,M >0$. Then there exists a solution $C_t$ and $0\leq t_0<\infty$ such that $C_0(i)=\rho\delta_{1i}$ and $\sum_{i=1} ^{\infty}iC_t(i)<\sum_{i=1}^{\infty}iC_0(i)=\rho$ for all $t> t_0$, i.e., the gelation phenomenon (or density dropping phenomenon, or emergence of a giant cluster) occurs in finite time.
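Gelation can be seen concretely in the Marcus-Lushnikov finite stochastic coalescent, whose expected cluster densities follow the Smoluchowski equation. The sketch below is illustrative only: it uses the multiplicative kernel $K(i,j)=ij$, which lies outside the exponent range assumed above but is the classical gelling example, with gel time 1 from monodisperse initial data.

```python
import random

def size_biased_index(sizes, total, rng):
    """Pick a cluster with probability proportional to its size."""
    r = rng.random() * total
    acc = 0.0
    for idx, s in enumerate(sizes):
        acc += s
        if r < acc:
            return idx
    return len(sizes) - 1

def marcus_lushnikov(n, t_end, rng):
    """Marcus-Lushnikov coalescent with multiplicative kernel K(i,j) = i*j:
    each pair of clusters of sizes i, j merges at rate i*j/n. Starts from
    n unit clusters; returns the list of cluster sizes at time t_end."""
    sizes = [1] * n       # total mass sum(sizes) == n is conserved
    t = 0.0
    while len(sizes) > 1:
        sq = sum(s * s for s in sizes)
        total_rate = (n * n - sq) / (2.0 * n)  # sum of i*j/n over pairs
        if total_rate <= 0.0:
            break
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        while True:  # pair chosen with probability proportional to i*j
            a = size_biased_index(sizes, n, rng)
            b = size_biased_index(sizes, n, rng)
            if a != b:
                break
        a, b = min(a, b), max(a, b)
        sizes[a] += sizes.pop(b)
    return sizes
```

Before the gel time the largest cluster is logarithmically small; after it, a giant cluster of order $n$ has emerged, which is exactly the density-dropping phenomenon in the finite system.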
We discuss some results and some open problems of the following type: Let $G$ be an infinite, connected, locally finite graph, and let $\xi$ and $\eta$ be maps from $V$ to $\{0,1,\dots,k-1\}$, where $V$ is the vertex set of $G$ (such maps are called $k$-colorings or sceneries). Let $\{S_n\}$ be a random walk on $G$. Can we reconstruct $\xi$ if we observe the sequence $\{\xi(S_n)\}$? If $\xi$ and $\eta$ are known and we observe either $\{\xi(S_n)\}$ or $\{\eta(S_n)\}$, but we are not told which of these alternatives prevails, can we decide (with zero probability of error) which of the two sequences was observed?
We will survey some recent results concerning two simple stochastic growth processes on the homogeneous tree of degree >2: (1) the contact process; and (2) branching random walk. We will focus on the ``weak survival'' phase, in which the population grows exponentially but almost surely vacates every finite subset of the tree. We will discuss several questions of natural interest, including (i) the size of the ``limit set'' of the process; (ii) the asymptotic behavior of hitting probabilities; and (iii) the existence of interesting invariant measures.
In randomized algorithms solving a variety of computational tasks (approximate enumeration, volume computation, integration, simulated annealing, generation of contingency tables etc.) the key element is to sample from a given distribution over a known but large and complicated set. The basic method is to construct an ergodic Markov chain with the given stationary distribution, and then run the chain for an appropriately large number of steps to get a state that has (approximately) the desired distribution. Mathematically, the main difficulty is to quantify ``appropriately large'', i.e., to estimate the ``mixing time'' of the chain. We may want to get close to the stationary distribution in different senses, and may have different starting conditions, leading to different notions of mixing time.
It turns out that this variation in the notion of mixing time should not worry us too much. All these variations fall into a small number of classes, where mixing measures in the same class are within absolute constant factors of each other. This line of research was initiated by Aldous.
Most of this is joint work with Peter Winkler, some with David Aldous and Andrew Beveridge.
The subject of random spanning forests began in 1991, growing out of random spanning trees, which have been studied (in some sense) since Kirchhoff in 1847. They have intimate connections to random walks and electric networks, as well as to harmonic Dirichlet functions. We will micro-survey the area and describe some highlights.
See http://www.ma.huji.ac.il/~lyons/prbtree.html for a book in progress containing more details and http://ezinfo.ucs.indiana.edu/~rdlyons/ for my homepage.
A new approach to Markov chain Monte Carlo simulation was recently proposed by Propp and Wilson. This approach, unlike traditional ones, yields samples which have exactly the desired distribution. The Propp-Wilson algorithm requires this distribution to have a certain structure called monotonicity. In this talk, it is shown how the algorithm can be extended to the case where monotonicity is replaced by anti-monotonicity. As an illustrative example, simulations of the hard-core model are presented. This is joint work with Olle Haggstrom.
The random graph process on $n$ vertices is the probability space of all the nested sequences of graphs $$ G(n,0)\subset G(n,1)\subset\dots\subset G(n,N), $$ $N=\binom n2$, with vertex set $V=\{1,\dots,n\}$, such that $G(n,m)$ has $m$ edges, and each sample sequence has the same probability, $1/N!$. In particular, the random \lq\lq snapshot\rq\rq $G(n,m)$ is distributed uniformly on the set of all $\binom Nm$ graphs with $m$ edges. Alternatively, it is a Markov chain such that, given $G(n,m-1)$, the next graph $G(n,m)$ is obtained by adding a new edge whose location is chosen at random, uniformly among all $N-(m-1)$ still vacant sites. Erd\H{o}s and R\'enyi undertook the first systematic study of this process back in 1960. Perhaps the most important result of their study was the discovery that, for $n$ large, the likely structure of $G(n,m)$ undergoes an abrupt change (phase transition) when $m$ passes through $n/2$. Namely, with high probability (whp), this is the birth time of a giant component, that is, a component of size of order $n$. Prior to this moment, the largest component is only of order $\log n$. Since then a lot of research effort has been spent on strengthening the original results and learning more about the likely evolution of the random graph, both before and after the birth of the giant component. The purpose of this talk is to survey some of the classic and more recent results and to stress the combinatorial-probabilistic techniques which were developed to obtain them.
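The phase transition is easy to observe numerically. The following sketch (an illustration, with assumed parameters) runs the random graph process, tracking the largest component with a union-find structure as edges arrive in random order.

```python
import random

class DisjointSets:
    """Union-find with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return self.size[ra]
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return self.size[ra]

def largest_component_trace(n, rng):
    """Run G(n,0) subset G(n,1) subset ... by adding the N = n(n-1)/2
    edges in uniformly random order; return the size of the largest
    component after each added edge."""
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(edges)
    dsu = DisjointSets(n)
    biggest, trace = 1, []
    for i, j in edges:
        biggest = max(biggest, dsu.union(i, j))
        trace.append(biggest)
    return trace
```

Well before $m = n/2$ the largest component is small; by $m = n$ a component of order $n$ has appeared, matching the Erdős-Rényi picture.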
Coupling from the past is an approach to Monte Carlo that, when applicable, can completely eliminate initialization-bias, and sometimes requires only minor modifications to existing algorithms. I will give an overview of the method, describe the conditions under which it can be applied, and present three illustrative cases. For more on schemes that eliminate initialization-bias, see David Wilson's web-page on perfectly random sampling with Markov chains.
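The mechanics of coupling from the past fit in a few lines for a monotone chain. The sketch below (a toy example of my choosing, not from the talk) samples exactly from the uniform distribution on $\{0,\dots,m\}$ using the lazy reflecting walk, running sandwiching chains from successively earlier starting times with shared randomness.

```python
import random

def monotone_step(x, u, m):
    """Coupled update for the lazy reflecting walk on {0, ..., m}:
    monotone in x for each fixed random seed u."""
    y = x - 1 if u < 0.5 else x + 1
    return min(m, max(0, y))

def cftp_sample(m, rng):
    """Propp-Wilson coupling from the past for the chain above, whose
    stationary distribution is uniform on {0, ..., m}. The top and bottom
    chains are run from time -T to 0, reusing the same randomness as T
    doubles, until they coalesce; the common value is an exact sample."""
    updates = []  # updates[k] drives the step from time -(k+1) to -k
    T = 1
    while True:
        while len(updates) < T:
            updates.append(rng.random())
        lo, hi = 0, m
        for k in range(T - 1, -1, -1):  # from time -T up to time 0
            lo = monotone_step(lo, updates[k], m)
            hi = monotone_step(hi, updates[k], m)
        if lo == hi:
            return lo  # no initialization bias
        T *= 2
```

Note the two essentials: the same update variables are reused when restarting further in the past, and the update rule is monotone, so tracking only the top and bottom states suffices.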
Simple Markov chains are often used to study large combinatorial sets. For instance, if we want to sample 3-colorings on some finite three-dimensional lattice region L, we let the state space of the Markov chain be the set of proper 3-colorings. Then, starting with some 3-coloring, we can iteratively update the configuration by recoloring at most one vertex at a time so as to move to another proper coloring. Closely related algorithms based on local updates are used to generate random configurations for various combinatorial problems on lattices, such as tilings, Eulerian orientations and alternating sign matrices.
In each case, we will argue that the natural Markov chains are ``rapidly mixing'' (which guarantees that we can generate samples in polynomial time). We accomplish this by a two-step process. First, we use a very simple coupling scheme to argue that related Markov chains based on non-local moves are rapidly mixing. Then we use ``decomposition'' and ``clustering'' techniques to bound the mixing rate of the original Markov chains in terms of the mixing rates of the non-local chains. The decomposition technique, which is work with Neal Madras, relates the mixing rate of a Markov chain to smaller chains derived from a decomposition of the state space. The clustering technique, which is work with Prasad Tetali, allows us to relate the mixing rates of the local and non-local chains. This is based on the comparison method of Diaconis and Saloff-Coste.
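A minimal local-update chain of the kind described above: pick a vertex and a color at random, and recolor only when the result is still proper. The graph and parameters below are illustrative assumptions, not the lattice regions of the talk.

```python
import random

def local_recolor_step(coloring, neighbors, k, rng):
    """Pick a uniform vertex and a uniform color; recolor the vertex if
    the coloring stays proper, otherwise stay put. Proposals are
    symmetric, so the stationary distribution is uniform over the
    proper k-colorings reachable from the start."""
    v = rng.randrange(len(coloring))
    c = rng.randrange(k)
    if all(coloring[w] != c for w in neighbors[v]):
        coloring[v] = c
```

On a tiny instance the behavior can be checked exhaustively: the path on 3 vertices has 3 * 2 * 2 = 12 proper 3-colorings, and the chain visits all of them with near-equal frequency.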
Coupling constructions for Poisson approximation using the Chen-Stein method are now a standard technique; a systematic study of related couplings for normal approximation using Stein's method began only a few years ago. This small survey of coupling methods for normal approximation includes size-bias couplings, which are natural for nonnegative random variables such as counts, and zero-bias couplings, which may be applied to mean zero variables and are especially useful for variates with vanishing third moments.
``Quadratic dynamical systems'' are a natural class of non-linear stochastic processes that are used to model various phenomena in the natural sciences, such as population genetics and the kinetic theory of ideal gases. Less classically, they also provide an appropriate framework for the study of genetic algorithms for combinatorial optimization. In contrast to linear systems, which are well understood, there is little general theory available for the quantitative analysis of quadratic systems.
In this talk, I will present several fundamental properties of the large class of symmetric quadratic systems acting on populations over a fixed set of types. I will go on to describe an analysis of some particular examples, including crossover systems in population genetics and a natural system defined on populations over the set of matchings in a tree. In particular, it will turn out that convergence to the limit in these systems is very rapid. This demonstrates that such systems, though non-linear, are sometimes amenable to analysis. I will also outline some of the main challenges for future work in this area.
Most of the results mentioned in the talk are joint work with Yuval Rabani, Yuri Rabinovich and Avi Wigderson.
We introduce new isoperimetric constants for Markov chains (and graphs) using the notion of a "discrete gradient." We derive bounds on these constants for the Cartesian product of Markov chains. Specializing our result to the hypercube, we derive a certain connectivity theorem of Margulis-Talagrand. Several intriguing questions on Cheeger-type inequalities remain open. (This is joint work with Christian Houdr\'e.)
This talk will survey known algorithms for generating random spanning trees of graphs (directed and undirected). These include algorithms based on linear algebra and algorithms based on random walks. The loop-erased random walk algorithm (the ``cycle-popping algorithm''), which in addition to being very efficient has also been a useful conceptual tool in the study of random spanning trees on certain infinite graphs, will be presented. Details can be found in the article How to Get a Perfectly Random Sample From a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph by the speaker and James Propp. Also see the speaker's web-page on perfectly random sampling with Markov chains.
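The cycle-popping algorithm admits a compact implementation. The sketch below (illustrative; the adjacency-list representation is an assumption) returns a uniformly random spanning tree, rooted at a given vertex, via loop-erased random walks.

```python
import random

def wilson_uniform_spanning_tree(neighbors, root, rng):
    """Cycle-popping / loop-erased random walk algorithm of Propp and
    Wilson: returns parent[v], the parent of v in a uniformly random
    spanning tree rooted at `root` (parent[root] stays None).

    From each unvisited vertex, walk at random until hitting the current
    tree, remembering only the LAST exit from each vertex; the remembered
    edges form the loop-erasure of the walk, which joins the tree."""
    n = len(neighbors)
    parent = [None] * n
    in_tree = [False] * n
    in_tree[root] = True
    for start in range(n):
        u = start
        while not in_tree[u]:  # random walk; revisits overwrite the exit
            parent[u] = rng.choice(neighbors[u])
            u = parent[u]
        u = start
        while not in_tree[u]:  # retrace the loop-erased path into the tree
            in_tree[u] = True
            u = parent[u]
    return parent
```

On the complete graph $K_4$, for instance, Cayley's formula gives $4^{2}=16$ spanning trees, and repeated sampling produces each of the 16 with roughly equal frequency.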
We begin with the following odd lemma: for any sequence of events A_1,...,A_n there are i < j such that Pr(A_i and not A_j) < 1/4 + o(1). Generalization in three directions leads to:
(1) A combinatorial problem involving shift graphs;
(2) The answer to a question of Brightwell and Scheinerman concerning fractional dimension of interval orders; and
(3) A finite form of de Finetti's Theorem which jettisons the exchangeability hypothesis.
In this talk, I will survey extractors and their applications. An extractor is a procedure to extract randomness from a defective random source, using some additional truly random bits. Extractors have seemingly unrelated applications, such as using few random bits to do random sampling and randomized space-bounded computation, and constructing expander graphs that beat the eigenvalue bound.