Lean Bourgain Extractor

9 Finite Probability Distributions

Definition 9.1

A finite probability distribution is a function \(f : A \to \mathbb {R}\) from a finite type \(A\), such that \(f\) is nonnegative and the sum of \(f\) is 1.
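
A minimal Lean 4 sketch of this definition, assuming Mathlib is available; the names `FinPMF`, `nonneg`, and `sum_eq_one` are illustrative and need not match the project's actual ones:

```lean
import Mathlib

/-- A finite probability distribution: a nonnegative real-valued
function on a finite type whose values sum to `1`. -/
structure FinPMF (α : Type*) [Fintype α] where
  toFun : α → ℝ
  nonneg : ∀ a, 0 ≤ toFun a
  sum_eq_one : ∑ a, toFun a = 1

/-- Treat a `FinPMF` as a function, so that `f x` makes sense. -/
instance {α : Type*} [Fintype α] : CoeFun (FinPMF α) (fun _ => α → ℝ) :=
  ⟨FinPMF.toFun⟩
```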

Definition 9.2

The uniform distribution on a nonempty set \(A\), \(\operatorname{Uniform}(A)\), assigns \(\frac{1}{|A|}\) to all values in \(A\) and \(0\) to other values.
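
Continuing the sketch from Definition 9.1, the uniform distribution might look as follows (specialized, for brevity, to \(A\) being the whole type; the summation proof is only indicated):

```lean
/-- The uniform distribution on a nonempty finite type (Definition 9.2,
with `A` taken to be the whole type). -/
noncomputable def FinPMF.uniform (α : Type*) [Fintype α] [Nonempty α] :
    FinPMF α where
  toFun := fun _ => 1 / (Fintype.card α : ℝ)
  nonneg := fun _ => by positivity
  sum_eq_one := by
    -- `card α` copies of `1 / card α` sum to `1`; uses `card α ≠ 0`
    sorry
```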

Definition 9.3

Given two finite probability distributions \(f: A \to \mathbb {R}, g : B \to \mathbb {R}\), we have a probability distribution on \(A \times B\) defined as \((f \times g)(x, y) = f(x) g(y)\).
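
In the same style, a sketch of the product distribution (the computation of the total mass is only indicated):

```lean
/-- The product distribution on `α × β` (Definition 9.3). -/
noncomputable def FinPMF.prod {α β : Type*} [Fintype α] [Fintype β]
    (f : FinPMF α) (g : FinPMF β) : FinPMF (α × β) where
  toFun := fun p => f p.1 * g p.2
  nonneg := fun p => mul_nonneg (f.nonneg p.1) (g.nonneg p.2)
  sum_eq_one := by
    -- `∑ (x, y), f x * g y = (∑ x, f x) * (∑ y, g y) = 1 * 1 = 1`
    sorry
```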

Definition 9.4

Given a finite probability distribution \(f: A \to \mathbb {R}\) and a function \(g : A \to B\), we can apply \(g\) to the random variable represented by \(f\). This gives the distribution \(g \# f\).

We can directly transfer all theorems on \(g \# f\) to finite PMFs.
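
A sketch of the pushforward, continuing the same hypothetical `FinPMF` API; the argument order is flipped so that `f.map g` (standing for \(g \# f\)) works with Lean's dot notation:

```lean
/-- Pushforward along `g` (Definition 9.4): `f.map g` plays the role
of `g # f`; the mass of `b` is the mass of its fiber under `g`. -/
noncomputable def FinPMF.map {α β : Type*} [Fintype α] [Fintype β]
    [DecidableEq β] (f : FinPMF α) (g : α → β) : FinPMF β where
  toFun := fun b => ∑ a ∈ Finset.univ.filter (fun a => g a = b), f a
  nonneg := fun b => Finset.sum_nonneg fun a _ => f.nonneg a
  sum_eq_one := by
    -- the fibers of `g` partition `α`, so the total mass is unchanged
    sorry
```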

Definition 9.5

Given two finite probability distributions \(f: A \to \mathbb {R}, g : A \to \mathbb {R}\), we have a probability distribution defined as \(f-g = s \# (f \times g)\) with \(s(x, y) = x-y\).

Definition 9.6

Given two finite probability distributions \(f: A \to \mathbb {R}, g : A \to \mathbb {R}\), we have a probability distribution defined as \(f+g = a \# (f \times g)\) with \(a(x, y) = x+y\).

Definition 9.7

Given a finite probability distribution \(f: A \to \mathbb {R}\), we have a probability distribution defined as \(-f = n \# f\) with \(n(x) = -x\).
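
On top of `prod` and `map`, Definitions 9.5 to 9.7 become one-liners; a sketch, assuming \(A\) carries an additive group structure so that \(x - y\), \(x + y\), and \(-x\) make sense (names are again illustrative):

```lean
variable {α : Type*} [Fintype α] [AddCommGroup α] [DecidableEq α]

/-- `f - g` as `s # (f × g)` with `s (x, y) = x - y` (Definition 9.5). -/
noncomputable def FinPMF.subDist (f g : FinPMF α) : FinPMF α :=
  (f.prod g).map fun p => p.1 - p.2

/-- `f + g` as `a # (f × g)` with `a (x, y) = x + y` (Definition 9.6). -/
noncomputable def FinPMF.addDist (f g : FinPMF α) : FinPMF α :=
  (f.prod g).map fun p => p.1 + p.2

/-- `-f` as `n # f` with `n x = -x` (Definition 9.7). -/
noncomputable def FinPMF.negDist (f : FinPMF α) : FinPMF α :=
  f.map fun x => -x
```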

Proposition 9.8

These operations make the finite probability distributions on \(A\) a commutative monoid under addition.

Lemma 9.9

We have \((f \# a) \times (g \# b) = h \# (a \times b)\), with \(h(x, y) = (f(x), g(y))\).

Proof

By calculation.
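
For reference, one plausible Lean statement of this lemma in the sketch's notation (statement only):

```lean
/-- Lemma 9.9: pushing forward coordinatewise commutes with products. -/
theorem FinPMF.map_prod_map {α β γ δ : Type*}
    [Fintype α] [Fintype β] [Fintype γ] [Fintype δ]
    [DecidableEq γ] [DecidableEq δ]
    (a : FinPMF α) (b : FinPMF β) (f : α → γ) (g : β → δ) :
    (a.map f).prod (b.map g) = (a.prod b).map fun p => (f p.1, g p.2) := by
  sorry
```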

Lemma 9.10

We have \(f \# (a \times b) = b \times a\) for \(f(x, y) = (y, x)\).

Proof

Simple application of 8.5.

Lemma 9.11

We have \((f \# a) + (g \# b) = h \# (a \times b)\), with \(h(x, y) = f(x) + g(y)\).

Proof

By simplification after 9.9.

Definition 9.12

Given a finite probability distribution \(f: A \to \mathbb {R}\) and a family \(g\) of finite probability distributions on \(B\), indexed by elements of \(A\), we can define \(g(f)\) as the probability distribution obtained by sampling an element from \(f\), and then sampling an element from the corresponding distribution in \(g\).
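
A sketch of this "sample, then sample" operation, which is exactly monadic bind; `f.bind g` plays the role of \(g(f)\):

```lean
/-- Definition 9.12: sample `x` from `f`, then sample from `g x`.
`f.bind g` plays the role of `g(f)`. -/
noncomputable def FinPMF.bind {α β : Type*} [Fintype α] [Fintype β]
    (f : FinPMF α) (g : α → FinPMF β) : FinPMF β where
  toFun := fun b => ∑ a, f a * g a b
  nonneg := fun b => Finset.sum_nonneg fun a _ =>
    mul_nonneg (f.nonneg a) ((g a).nonneg b)
  sum_eq_one := by
    -- swap the sums: `∑ a, f a * (∑ b, g a b) = ∑ a, f a = 1`
    sorry
```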

Lemma 9.13

We have \(f(g(a)) = h(a)\) with \(h(x) = f(g(x))\).

Proof

By calculation.

Lemma 9.14

We have \(g \# f(a) = h(a)\) with \(h(x) = g \# f(x)\).

Proof

By calculation.

Definition 9.15

We say that a distribution \(a\) is \(\varepsilon \)-close to \(N\) entropy if for all sets \(A\) with \(|A| \leq N\), \(\sum _{x \in A} a(x) \leq \varepsilon \). Note that this is a bit different from the usual definition.
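
In the sketch's terms this is a plain proposition; note that \(N\) may be any real number, which is what gives Proposition 9.16 below its content (the name `closeHighEntropy` is illustrative):

```lean
/-- Definition 9.15: `a` is `ε`-close to `N` entropy if every set of
at most `N` elements carries at most `ε` of the mass. -/
def closeHighEntropy {α : Type*} [Fintype α]
    (N ε : ℝ) (a : FinPMF α) : Prop :=
  ∀ H : Finset α, (H.card : ℝ) ≤ N → ∑ x ∈ H, a x ≤ ε
```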

Proposition 9.16

If \(a\) is \(\varepsilon \)-close to \(\lfloor n \rfloor \) entropy, it is also \(\varepsilon \)-close to \(n\) entropy.

Proposition 9.17

If \(a\) is \(\varepsilon _1\)-close to \(n\) entropy and \(\varepsilon _1 \leq \varepsilon _2\), it is also \(\varepsilon _2\)-close to \(n\) entropy.

Lemma 9.18

If \(e\) is an isomorphism and \(a\) is \(\varepsilon \)-close to \(n\) entropy, then \(e \# a\) is also \(\varepsilon \)-close to \(n\) entropy.

Proof

By definition, after using 8.5.

Lemma 9.19

If \(a\) is \(\varepsilon \)-close to \(n\) entropy, then for any PMF \(b\), \(a+b\) is also \(\varepsilon \)-close to \(n\) entropy.

Proof
\[ \sum _{x\in H}(a+b)(x) = \sum _{x \in H}\sum _v{b(v) a(x - v)} = \sum _v b(v) \sum _{x \in H}{a(x - v)} = \sum _v b(v) \sum _{x \in H - v}{a(x)} \leq \sum _v b(v) \varepsilon = \varepsilon \]
Here \(H\) is an arbitrary set with \(|H| \leq n\); the last step applies the hypothesis on \(a\) to the translate \(H - v\), which has the same cardinality as \(H\).

Proposition 9.20

If, for all \(x\) such that \(f(x) > 0\), we have that \(g(x)\) is \(\varepsilon \)-close to \(n\) entropy, then \(g(f)\) is \(\varepsilon \)-close to \(n\) entropy.

Proposition 9.21

For any probability distribution \(a\), there are at most \(n\) values \(x\) such that \(a(x) > 1/n\).
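
The counting argument behind this proposition, spelled out for convenience: if a set \(S\) with \(|S| > n\) had \(a(x) > 1/n\) for every \(x \in S\), then

\[ 1 = \sum _x a(x) \geq \sum _{x \in S} a(x) > \frac{|S|}{n} > 1, \]

a contradiction.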