User:Egm6936.f10/Probability concepts

=Probability concepts and notations=

djvu notes: [[media:vql.probability.cdf.djvu|Probability, distribution, density]]

==Events (samples, outcomes)==
An event (or sample, or outcome) is a subset of the possible results of a random experiment; an individual outcome is designated by $$\omega$$.

For example, the result of tossing a coin once is either head or tail, i.e., $$\omega =$$ head or $$\omega =$$ tail. If we toss the coin twice, then $$\omega =$$ {head, head}, {head, tail}, {tail, head}, or {tail, tail}.

==Algebra of events==
In some ways, the algebra of events shares similarities with the algebra of real numbers, with intersection ( $$\cap$$ ) corresponding to multiplication ( $$\times$$ ), complement ( $$^{c}$$ ) to subtraction ( $$-$$ or $$\setminus$$ ), and union ( $$\cup$$ ) to addition ( $$+$$ ).

The union of $$n$$ events $$ A_{1},\, A_{2},\, ..., A_{n} $$ is the set collecting all points belonging to at least one of those events.

Notation:

$$\displaystyle A_{1} \cup A_{2} \cup \cdots \cup A_{n}$$, or $$\bigcup_{i=1}^{n}A_{i}$$ (1)

The intersection of $$n$$ events $$ A_{1},\, A_{2},\, ..., A_{n} $$ is the set collecting all points belonging to all those events $$ A_{1},\, A_{2},\, ..., A_{n} $$.

The events are called disjoint if their intersection is empty.

Intersection has the associative property.

Notation:

$$\displaystyle A_{1} \cap A_{2} \cap \cdots \cap A_{n}$$, or $$\bigcap_{i=1}^{n}A_{i}$$ (2)


$$\displaystyle (A_{1} \cap A_{2}) \cap A_{3} = A_{1} \cap (A_{2} \cap A_{3})$$ (3)

The complement of an event $$A$$ in $$\Omega$$ is the set collecting all points in $$\Omega$$ but not in the event $$A$$. Generally, there are two kinds of complement: the relative complement and the absolute complement.

The relative complement of $$A_{1}$$ with respect to $$A_{2}$$ is the set of points in $$A_{2}$$ but not in $$A_{1}$$. If the union of all sets $$A_{1},\, A_{2},\, ..., A_{n}$$ is denoted $$U$$, the absolute complement of $$A_{1}$$ is the set of points in $$U$$ but not in $$A_{1}$$.

Notation:

$$\displaystyle A^{c}$$ (4)

For schematic representations of union, intersection, and complement, a Venn diagram can be used.

Fig 1: Venn Diagram

==Sample space (outcome space)==
The sample space (in statistical theory) or outcome space (in probability theory) is the collection of all possible outcomes (or events, or samples) of a random experiment, denoted by $$\Omega$$.

For example, in the coin-tossing experiment where a coin is tossed once, the outcome space is $$\Omega =$$ {heads, tails}. If we toss it twice, the outcome space is $$\Omega =$$ { {head, head}, {head, tail}, {tail, head}, {tail, tail} }.

(Xiu 2010, p. 9; Shao 2007, p. 1.)

==Sigma-field==
A sigma-field is a collection of subsets of a sample space $$\Omega$$ (not necessarily all of them), denoted by $$\mathcal F$$. For instance, $$\mathcal F = \{ \emptyset, \{{\rm heads}\}, \{{\rm tails}\}, \Omega\}$$ in the coin-tossing experiment.

Three conditions that a sigma-field must satisfy:

$$\bullet$$ Non-empty: $$\Omega \in \mathcal F$$ and $$\emptyset \in \mathcal F$$;

$$\bullet$$ Given $$A \in \mathcal F$$, then $$A^c \in \mathcal F$$;

$$\bullet$$ Given $$A_1$$, $$A_2$$,...$$\in \mathcal F$$, then

$$\bigcap_{i=1}^{\infty} A_{i} \in \mathcal F$$ and $$\bigcup_{i=1}^{\infty} A_{i} \in \mathcal F$$.

i.e., any countable union or intersection of sets in $$\mathcal F$$ is again in $$\mathcal F$$.

(Xiu 2010, p. 10; Shao 2007, p. 2.)

Note: $$\mathcal F$$ is called a "sigma-field" or "sigma-algebra", written as $$\sigma$$-field or $$\sigma$$-algebra; $$\sigma$$ is mnemonic for "S", as in "sum" (countable union), due to the third property above.
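The three axioms can be checked mechanically for the finite coin-tossing collection; the following is a minimal Python sketch (the event names and the set representation are illustrative choices, not from the source):

```python
# Coin-tossing experiment: sample space and a candidate sigma-field,
# each event represented as a frozenset of outcomes.
omega = frozenset({"heads", "tails"})
F = {frozenset(), frozenset({"heads"}), frozenset({"tails"}), omega}

# Axiom 1: Omega and the empty set belong to F.
non_empty = omega in F and frozenset() in F
# Axiom 2: F is closed under complement (relative to Omega).
closed_complement = all(omega - A in F for A in F)
# Axiom 3: for a finite collection, closure under pairwise union suffices.
closed_union = all(A | B in F for A in F for B in F)

print(non_empty, closed_complement, closed_union)  # True True True
```

Dropping any single event from `F` breaks at least one of the three checks, which is a quick way to see why all four subsets are needed.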

==Probability==
Probability measures the likelihood of the occurrence of a certain event (or outcome). The probability of an outcome $$\omega$$ belonging to an event $$A \in \mathcal F$$ is a non-negative number (or measure), mathematically denoted by


$$\displaystyle P(\omega \in A)=P(A)$$ (5)

For example, in the coin-tossing experiment with a fair coin, $$P({\rm heads})=P({\rm tails})=\frac{1}{2}$$, $$P(\emptyset)=0$$, and $$P(\{{\rm heads, tails}\})=P({\rm heads})+P({\rm tails})=\frac{1}{2}+\frac{1}{2}=1$$.

==Algebra of probability==
The complement of an event $$A$$ is the event "not $$A$$" (that is, the event of $$A$$ not occurring); its probability is given by $$P(\mbox{not } A) = 1 - P(A)$$. As an example, the chance of not rolling a six on a six-sided die is 1 − (chance of rolling a six) $$= 1 - \frac{1}{6} = \frac{5}{6}$$.

If both events $$A$$ and $$B$$ occur on a single performance of an experiment, this is called the intersection (or joint event) of $$A$$ and $$B$$, with probability denoted $$P(A \cap B)$$. If the two events $$A$$ and $$B$$ are independent, then the joint probability is


$$\displaystyle P(A \mbox{ and } B) = P(A \cap B) = P(A) P(B)$$ (6)

For example, if two coins are tossed, the chance of both being heads is $$\frac{1}{2}\times\frac{1}{2} = \frac{1}{4}$$.
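The product rule can be verified by brute-force enumeration of the four equally likely outcomes of two tosses; a small sketch (the outcome labels are illustrative):

```python
from itertools import product

# Two independent fair-coin tosses: enumerate the 4 equally likely outcomes.
outcomes = list(product(["H", "T"], repeat=2))

p_both_heads = sum(1 for o in outcomes if o == ("H", "H")) / len(outcomes)
p_first_head = sum(1 for o in outcomes if o[0] == "H") / len(outcomes)
p_second_head = sum(1 for o in outcomes if o[1] == "H") / len(outcomes)

# P(A and B) = P(A) * P(B) for independent events, equation (6).
print(p_both_heads, p_first_head * p_second_head)  # 0.25 0.25
```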

If either event $$A$$ or event $$B$$ or both occur on a single performance of an experiment, this is called the union of the events $$A$$ and $$B$$, with probability denoted $$P(A \cup B)$$.

If two events are mutually exclusive, then the probability of either occurring is


$$\displaystyle P(A \mbox{ or } B) = P(A \cup B) = P(A) + P(B)$$ (7)

For example, the chance of rolling a 2 or 3 or 5 on a six-sided die is $$P(\{2,\, 3,\, 5\}) = P(\{2\}) + P(\{3\}) + P(\{5\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6}= \frac{1}{2}$$.

If the events are not mutually exclusive then


$$\displaystyle P(A \mbox{ or } B) = P(A) + P(B) - P(A \mbox{ and } B)$$ (8)

==Random variable and vector==
Intuitively, a random variable designates a random outcome (event or sample) in a random experiment, and is usually denoted by a capital letter, e.g., $$\mathbf X$$ is a random variable. It is a numerical description of the outcome of an experiment.

Formally, it is a measurable mapping from a probability space to the real numbers.


$$\displaystyle \mathbf X:(\Omega ,\mathcal F ) \to (\mathbb R, \mathcal B), \quad \omega \mapsto \mathbf X(\omega)$$ (9)

where $$(\Omega ,\mathcal F )$$ is the event space $$\Omega$$ endowed with the $$\sigma$$-algebra $$\mathcal F$$, and $$(\mathbb R, \mathcal B)$$ is the set of real numbers $$\mathbb R$$ endowed with the "Borel $$\sigma$$-algebra" $$\mathcal B$$ (the $$\sigma$$-algebra generated by the open subsets of $$\mathbb R$$). (Shao 2007, p. 7.)

$$\mathbf X(\omega)$$ is an (arbitrary) number selected to represent each event $$\omega$$ in $$\Omega$$. For example, in the coin-tossing experiment, we typically use the number 1 to designate heads and 0 for tails, i.e., $$\mathbf X(\rm heads)= 1$$, $$\mathbf X(\rm tails)= 0$$. But it is also possible, even though not a good choice since it is not as mnemonic as $$\{0,1\}$$, to select $$\mathbf X(\rm heads)= 5$$, $$\mathbf X(\rm tails)= -3$$.

Example:

$$\mathcal B = \sigma \Big( \{(a,b]: a,b \in \mathbb R \} \Big)$$

where $$\{(a,b]: a,b \in \mathbb R \}$$ is the set of half-open intervals generating $$\mathcal B$$.

This choice of $$\mathcal B$$ allows for the probability of $$\mathbf X \in (a,b]$$, i.e., $$P \big(\mathbf X \in (a,b]\big)$$.

(Xiu 2010, p. 11.)

In the case of turbulent flows, the sample space $$\Omega$$ can be thought of as a set of repeated experiments (samples) used to verify, say, a hypothesis or observations on a given flow.


$$\displaystyle \Omega = \{ \omega_{1},\, \omega_{2},\, ...,\, \omega_{n_{exp}}\}$$

where $$n_{exp}$$ is the total number of repeated experiments, e.g., until the standard deviation is small enough compared to the mean.

The $$i$$th velocity component (a random variable) at $$(x,\, t)$$ in experiment $$\omega_{k}$$ is $$U_{i}(x,\, t,\, \omega_{k})$$.

A random vector $$\displaystyle \mathbf X$$ is composed of real-valued random variables $$\displaystyle X_i,\ i=1,2,3,...$$ A typical $$n$$-dimensional random vector can be represented as $$\displaystyle \mathbf X = (X_1, X_2, ..., X_n)$$.

'''Theorem 1''': Let $$\displaystyle \mathbf X = (X_1,X_2,...,X_n)$$ be a Gaussian random vector with distribution $$\displaystyle N(\mu, \mathbf C)$$ and let $$\displaystyle \mathbf A$$ be an $$\displaystyle m \times n$$ matrix. Then $$\displaystyle \mathbf {AX}^T$$ has an $$\displaystyle N(\mathbf A\mu^T, \mathbf {ACA}^T)$$ distribution.
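Theorem 1 reduces the transformed distribution to plain matrix algebra on the parameters; the sketch below computes $$\mathbf A\mu$$ and $$\mathbf{ACA}^T$$ for illustrative values of $$\mu$$, $$\mathbf C$$, and $$\mathbf A$$ (none of these numbers come from the source):

```python
import numpy as np

# Parameters of a 3-D Gaussian vector X ~ N(mu, C) (illustrative values).
mu = np.array([1.0, 0.0, -1.0])
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])   # symmetric positive definite

# A 2x3 matrix mapping the 3-D vector X to a 2-D vector A X.
A = np.array([[1.0, 1.0,  0.0],
              [0.0, 1.0, -1.0]])

# By Theorem 1, A X ~ N(A mu, A C A^T): compute the transformed parameters.
mu_new = A @ mu
C_new = A @ C @ A.T
print(mu_new)   # [1. 1.]
print(C_new)    # [[4.  1.2] [1.2 1.9]]
```

Note that `C_new` is again symmetric, as a covariance matrix must be.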

Fig 2: 2-D random vector

When the events are already representable by real numbers, the event space is already $$\Omega \equiv \mathbb R$$. It is then not necessary to mention $$\omega$$; one works directly with $$\mathbf X$$. An example of such a variable is a velocity component in a turbulent flow. (Pope 2000.)

==$$\mathbf P_{\mathbf X}$$ Probability distribution==
A probability distribution is a function that describes the probability of a random variable taking certain values.


$$\displaystyle P_X = P\circ X^{-1}:\mathcal B \to \mathbb R_{0}^{+}$$ (10)

Fig 3: Mapping

In practice, one only needs to refer to the half-open intervals in $$\mathcal B$$:


$$\displaystyle [(a,\, b] \in \mathcal B] \mapsto [P_X((a,\, b]) \in \mathbb R_{0}^{+}]$$ (11)


$$\displaystyle P_X((a,b]) \equiv P_X(X(\omega) \in (a,b])$$ (12)

i.e., the probability that $$a < X(\omega) \le b$$.

==$$\mathbf F_{\mathbf X}$$ Cumulative distribution function (CDF)==
The cumulative distribution function (CDF), or simply distribution function, describes the probability that a real-valued random variable $$\mathbf X$$ with a given probability distribution will be found at a value less than or equal to $$x$$. Intuitively, it is the "area so far" function of the probability distribution.


$$\displaystyle F_X(x) := P_X((-\infty, x]) = P_X(X \le x)$$ (13)

For random vectors,

$$\displaystyle F_X(\mathbf x) := P_X((-\infty, \mathbf x]) = P_X(X_1 \le x_1, X_2 \le x_2,...,X_n \le x_n), \quad \mathbf x = (x_1, x_2,...,x_n) \in \mathbb R^n$$ (14)

===Normal (Gaussian) distribution===

For the standard normal distribution,

$$\displaystyle F_X(x) = \frac12\left[\, 1 + \operatorname{erf} \left(\displaystyle \frac{x}{\sqrt{2}} \right) \right]$$ (15)

Fig 4: CDF of Normal Distribution
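Equation (15) can be evaluated directly with the error function from the standard library; a minimal sketch, generalized to a mean $$\mu$$ and standard deviation $$\sigma$$ (the generalization is an assumption, not part of equation (15) itself):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2); with mu=0, sigma=1 this is equation (15)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(0.0))    # 0.5, by symmetry of the standard normal
print(normal_cdf(1.96))   # about 0.975, the familiar two-sided 5% point
```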

==$$f_{\mathbf X}$$ Probability density function (PDF)==
The probability density function (PDF), or density, of a continuous random variable is a function that describes the relative likelihood for this random variable to take a value near a given point. The probability for the random variable to fall within a particular region is given by the integral of this variable's density over the region. The probability density function is non-negative everywhere, and its integral over the entire space is equal to one.


$$\displaystyle f_{\mathbf X}(x):=\frac{d}{dx}\mathbf F_{\mathbf X}(x)$$ (24)

$$\displaystyle P_{\mathbf X} (a < X \leq b) = \int_a^b f_{\mathbf X}(x) \, \mathrm{d}x$$ (25)

$$\displaystyle \mathbf F_{\mathbf X}(x)=\int_{-\infty}^{x} f_{\mathbf X}(t)dt$$ (26)

For random vectors,

$$\displaystyle \mathbf F_{\mathbf X}(\mathbf x)=\mathbf F_{\mathbf X}(x_1,x_2,...,x_n)=\int_{-\infty}^{x_1}...\int_{-\infty}^{x_n} f_{\mathbf X}(t_1,...,t_n)dt_1...dt_n$$ (27)

and

$$\displaystyle \int_{-\infty}^{+\infty}...\int_{-\infty}^{+\infty} f_{\mathbf X}(t_1,...,t_n)dt_1...dt_n = 1$$ (28)

If a vector $$\mathbf X$$ has density $$f_{\mathbf X}$$, then every sub-vector of it also has a density, called a marginal density. For a single component,

$$\displaystyle f_{X_i}(x_i) = \int_{-\infty}^{+\infty}...\int_{-\infty}^{+\infty} f_{\mathbf X}(t_1,...,t_n)dt_1...dt_{i-1}dt_{i+1}...dt_n$$ (29)

===Normal (Gaussian) distribution===

$$\displaystyle f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[{-\frac{(x-\mu)^2}{2\sigma^2}}\right]$$ (30)

Fig 7: Normal(Gaussian) distribution
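The normalization property of equation (30) can be verified with elementary numerical integration; a sketch for the standard normal (the integration interval and step count are arbitrary):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density, equation (30)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) \
        / math.sqrt(2.0 * math.pi * sigma ** 2)

# Trapezoidal integration over [-10, 10] approximates the total probability;
# the mass outside this interval is negligible (~1e-23).
n, a, b = 20000, -10.0, 10.0
h = (b - a) / n
total = h * (sum(normal_pdf(a + i * h) for i in range(1, n))
             + 0.5 * (normal_pdf(a) + normal_pdf(b)))
print(total)  # ~1.0
```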

===Binomial distribution===

$$\displaystyle P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}, \quad k=0,1,...,n$$ (31)

Fig 8: Binomial distribution

===Poisson distribution===

$$\displaystyle P(X=k)=e^{-\lambda}\frac{\lambda^{k}}{k!}, \quad k=0,1,...$$ (32)

Fig 9: Poisson distribution
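Both probability mass functions (31) and (32) are easy to evaluate with the standard library, and for large $$n$$ and small $$p$$ the binomial is well approximated by a Poisson with $$\lambda = np$$; a sketch with illustrative parameters:

```python
import math

def binom_pmf(k, n, p):
    """Binomial mass function, equation (31)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson mass function, equation (32)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 100, 0.02          # illustrative values; lam = n * p = 2
lam = n * p

# The binomial pmf sums to 1 over k = 0..n.
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))   # 1.0
# Poisson approximation to the binomial for large n, small p:
print(binom_pmf(3, n, p), poisson_pmf(3, lam))          # close to each other
```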

Note:

Notation $$\mathbf X$$ and $$x$$

In recent literature, an uppercase letter, e.g., $$\mathbf X$$, is used to designate a random variable, whereas the corresponding lowercase letter, e.g., $$x$$, is used to designate the real variable serving as the upper bound in $$P(\mathbf X \le x)$$.

Kolmogorov (1933) is the famous work that influenced subsequent mathematical probability and statistics; in its notation:


$$\displaystyle F^{(x)}(a) = \displaystyle \int_{-\infty}^{a}f^{(x)}(a)da$$ (33) (Kolmogorov 1933, p. 24)

==Expectations and moments==
The expectation (also called the mean value or first moment) of a random variable $$\displaystyle X$$ is defined as follows.

For a continuous distribution with probability density function $$\displaystyle f_X(x)$$,

$$\displaystyle \mu_X = \mathbb E[X] := \int_{-\infty}^{+\infty}xf_X(x)dx$$ (31)

The expectation of $$\displaystyle g(X)$$, where $$\displaystyle g$$ is a real-valued function, is:

$$\displaystyle \mathbb E[g(X)] := \int_{-\infty}^{+\infty}g(x)f_X(x)dx$$ (32)

The $$\displaystyle k$$th moment of the random variable $$\displaystyle X$$, where $$\displaystyle k \in \mathbb N$$, is:

$$\displaystyle \mathbb E[X^k] := \int_{-\infty}^{+\infty}x^kf_X(x)dx$$ (33)

The variance of the random variable $$\displaystyle X$$, $$\displaystyle \sigma_X^2$$, is:

$$\displaystyle \sigma_X^2 = var(X) := \int_{-\infty}^{+\infty}(x-\mu_X)^2f_X(x)dx =: \mathbb E[(X-\mu_X)^2]$$ (34)

For a discrete distribution with probabilities $$\displaystyle p_n=P(X = x_n)$$, the corresponding definitions are:

$$\displaystyle \mu_X = \mathbb E[X] := \sum_{n=1}^{+\infty}x_{n}p_{n}$$ (35)

$$\displaystyle \mathbb E[g(X)] := \sum_{n=1}^{+\infty}g(x_{n})p_{n}$$ (36)

$$\displaystyle \mathbb E[X^k] := \sum_{n=1}^{+\infty}x_{n}^{k}p_{n}$$ (37)

$$\displaystyle \sigma_X^2 = var(X) := \sum_{n=1}^{+\infty}(x_{n}-\mu_X)^2p_{n} =: \mathbb E[(X-\mu_X)^2]$$ (38)

From equations (34) and (38), we can deduce that

$$\displaystyle \sigma_X^2 = \mathbb E[(X-\mu_X)^2] = \mathbb E[X^2 - 2X\mu_X + \mu_X^2] = \mathbb E[X^2] - 2\mu_X\mathbb E[X] + \mu_X^2 = \mathbb E[X^2] - \mu_X^2$$ (39)

i.e., the variance equals the mean of the square minus the square of the mean.
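Equations (35), (38), and (39) can be checked exactly for a fair six-sided die using rational arithmetic; a minimal sketch:

```python
from fractions import Fraction

# Fair six-sided die: x_n = 1..6, each with probability p_n = 1/6.
xs = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in xs)                    # mean, equation (35)
var_def = sum((x - mu) ** 2 * p for x in xs)   # variance by definition, eq. (38)
var_id = sum(x**2 * p for x in xs) - mu**2     # identity of equation (39)
print(mu, var_def, var_id)  # 7/2 35/12 35/12
```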

The expectation of a random vector $$\displaystyle \mathbf X$$ is
$$\displaystyle \mu_{\mathbf X}=\mathbb E[\mathbf X] = (\mathbb E[X_1],\mathbb E[X_2],...,\mathbb E[X_n])$$ (40)

A very frequently used quantity for a random vector is the covariance matrix, whose entries are defined as

$$\displaystyle (\mathbf C_{\mathbf X})_{ij} = cov(X_i,X_j), \quad i,j = 1,2,...,n$$ (41)

where $$\displaystyle cov(X_i,X_j)=\mathbb E[(X_i-\mu_{X_i})(X_j-\mu_{X_j})]= \mathbb E[X_iX_j]-\mu_{X_i}\mu_{X_j}$$ is the covariance of $$\displaystyle X_i$$ and $$\displaystyle X_j$$. In particular, $$\displaystyle cov(X_i,X_i) = \sigma_{X_i}^2$$.

==Convergence modes==
Given a sequence of random variables $$\displaystyle X_1, X_2,...$$, we define the following convergence modes.

===Convergence in distribution, $$\displaystyle X_n \overset{d}{\rightarrow} X$$===

$$\displaystyle X_n$$ converges to $$\displaystyle X$$ in distribution if, at every continuity point $$\displaystyle x$$ of the distribution function $$\displaystyle F_X$$,

$$\displaystyle F_{X_n}(x) \rightarrow F_X(x), \;\;\; n \rightarrow \infty$$ (42)

Convergence in distribution is a weak convergence.

===Convergence in probability, $$\displaystyle X_n \overset{P}{\rightarrow} X$$===

If the probability that the difference between $$\displaystyle X_n$$ and $$\displaystyle X$$ is larger than any positive $$\displaystyle \epsilon$$ tends to zero as $$\displaystyle n \rightarrow \infty$$, then we say $$\displaystyle X_n$$ converges to $$\displaystyle X$$ in probability, written $$\displaystyle X_n \overset{P}{\rightarrow} X$$:

$$\displaystyle P(|X_n - X| > \epsilon) \rightarrow 0, \;\;\; n \rightarrow \infty$$ (43)

Convergence in probability implies convergence in distribution. The converse is true if and only if $$\displaystyle X = x$$ for some constant $$\displaystyle x$$.
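Equation (43) can be illustrated by Monte Carlo: the sample mean of uniform variables converges in probability to the true mean (law of large numbers), so the estimated deviation probability shrinks as $$n$$ grows. A sketch with an arbitrary seed, tolerance, and replication count:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, mu = 0.05, 0.5   # Uniform(0, 1) has mean 0.5

def prob_deviation(n, reps=500):
    """Monte Carlo estimate of P(|Xbar_n - mu| > eps) over reps replications."""
    means = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(means - mu) > eps)

# The estimated probability decreases toward 0 as n grows.
probs = [prob_deviation(n) for n in (10, 100, 1000)]
print(probs)
```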

===Almost sure convergence, $$\displaystyle X_n \overset{a.s.}{\rightarrow} X$$===

$$\displaystyle X_n$$ converges to $$\displaystyle X$$ almost surely if $$\displaystyle P\big(\lim_{n \rightarrow \infty} X_n = X\big) = 1$$.

===$$\displaystyle L^p$$ convergence, $$\displaystyle X_n \overset{L^p}{\rightarrow} X$$===

$$\displaystyle X_n$$ converges to $$\displaystyle X$$ in $$\displaystyle L^p$$ if $$\displaystyle \mathbb E\big[|X_n - X|^p\big] \rightarrow 0$$ as $$\displaystyle n \rightarrow \infty$$.