6.1 Probability Theory

Created Date: 2025-05-18

6.1.1 Normal Distribution

In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is:

\(f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{{(x - \mu)}^2}{2 {\sigma}^2}}\)

The parameter \(\mu\) is the mean or expectation of the distribution (and also its median and mode), while the parameter \({\sigma}^2\) is the variance. The standard deviation of the distribution is \(\sigma\) (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.

The file normal_distribute.py sets \(\mu = 2\) and \(\sigma^2 = 10\), and uses NumPy to plot the normal distribution:

[Figure: Simple Normal Distribution]
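The contents of normal_distribute.py are not shown here, so the following is only a hedged stand-in. It is a minimal sketch that uses Python's standard library instead of NumPy to tabulate the density with \(\mu = 2\) and \(\sigma^2 = 10\) and to draw random samples from it. The names `normal_pdf`, `xs`, and `ys` are illustrative, not taken from the original file. One detail worth noting: `random.gauss` expects the standard deviation \(\sigma = \sqrt{10}\), not the variance, and NumPy's `numpy.random.normal(loc, scale)` likewise treats `scale` as the standard deviation.

```python
import math
import random
import statistics

MU = 2.0                   # mean mu
SIGMA2 = 10.0              # variance sigma^2
SIGMA = math.sqrt(SIGMA2)  # standard deviation sigma

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Tabulate the density from mu - 10 to mu + 10 in steps of 0.5,
# i.e. the (x, y) points a plotting script would draw as the bell curve.
xs = [MU + 0.5 * k for k in range(-20, 21)]
ys = [normal_pdf(x, MU, SIGMA) for x in xs]

# Draw samples; random.gauss takes the standard deviation, not the variance.
rng = random.Random(0)
samples = [rng.gauss(MU, SIGMA) for _ in range(20_000)]

print("peak at x =", xs[ys.index(max(ys))])  # prints: peak at x = 2.0
print(f"sample mean     ~ {statistics.fmean(samples):.2f}")
print(f"sample variance ~ {statistics.variance(samples):.2f}")
```

Passing `xs` and `ys` to any plotting library should draw a bell curve centred at \(x = 2\); the sample mean and variance should land close to 2 and 10.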

The image below illustrates a normal distribution curve, often called a bell curve, which shows how data is distributed around the mean \(\mu\). The curve is symmetric, with its peak at the mean \(\mu\), where the probability density is highest. The horizontal axis is marked in units of the standard deviation \(\sigma\) away from the mean, and the percentages indicate the proportion of data falling within each range:

[Figure: Normal Distribution]
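The percentages in such a figure come from the normal cumulative distribution and do not depend on \(\mu\) or \(\sigma\): for any normal variable, \(P(|X - \mu| < k\sigma) = \operatorname{erf}(k/\sqrt{2})\). A short sketch using `math.erf` (the helper name `prob_within` is hypothetical) reproduces the familiar 68-95-99.7 rule:

```python
import math

def prob_within(k: float) -> float:
    """P(|X - mu| < k*sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.2%}")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%

# Probability of each one-sigma band on one side of the mean,
# e.g. between mu and mu + sigma, then mu + sigma and mu + 2*sigma, ...
bands = [(prob_within(k + 1) - prob_within(k)) / 2 for k in range(3)]
print([f"{b:.1%}" for b in bands])  # ['34.1%', '13.6%', '2.1%']
```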

The simplest case of a normal distribution is known as the standard normal distribution or unit normal distribution. This is a special case when \(\mu = 0\) and \({\sigma}^2 = 1\), and it is described by this probability density function (or density):

\(\varphi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\)
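The general density is just a shifted and rescaled standard one: substituting \(z = (x - \mu)/\sigma\) gives \(f(x) = \varphi\left(\frac{x - \mu}{\sigma}\right) / \sigma\), which expands back to the formula for \(f(x)\) above. A minimal Python sketch of that relationship (function names are illustrative):

```python
import math

def phi(z: float) -> float:
    """Standard normal density: exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """General normal density via standardization: f(x) = phi((x - mu) / sigma) / sigma."""
    return phi((x - mu) / sigma) / sigma

print(round(phi(0.0), 4))  # peak height of the standard normal curve: 0.3989
```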

The normal distribution is often denoted \(N(\mu, {\sigma}^2)\) or \(\mathcal{N}(\mu, \sigma^2)\). Thus, when a random variable \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\), one may write:

\(X \sim \mathcal{N}(\mu, \sigma^2)\)

6.1.1.1 Mean and Median

The mean is the sum of all values divided by the number of values. For numbers \(x_1, x_2, \cdots , x_n\):

\(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\)

The median is the middle number when the data is sorted.

  1. If the number of data points is odd, the median is the middle value.

  2. If even, it is the average of the two middle values.
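Both definitions translate directly into code. A minimal sketch in plain Python, assuming a non-empty list of numbers (the helper names `mean` and `median` are illustrative; `statistics.mean` and `statistics.median` in the standard library do the same job):

```python
def mean(xs):
    """Sum of the values divided by how many there are (assumes xs is non-empty)."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the sorted data (assumes xs is non-empty)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                     # odd count: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even count: average of the two middle values

odd = [3, 1, 4, 1, 5]
even = [3, 1, 4, 1, 5, 9]
print(mean(odd), median(odd))    # 2.8 3
print(mean(even), median(even))  # about 3.83 and 3.5
```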

6.1.1.2 Standard Deviation and Variance

Variance measures how spread out a set of numbers is: it is the average of the squared differences from the mean. For example, suppose you and your friends have just measured the heights of your dogs (in millimeters):

[Figure: Statistics Dogs Graph]

The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm. Find the Mean, the Variance, and the Standard Deviation.

Our first step is to find the Mean:

\(Mean = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394\)

So the mean (average) height is 394 mm. Now we calculate each dog's difference from the Mean and plot it on the chart:

[Figure: Statistics Dogs Deviation]

To calculate the Variance, take each difference, square it, and then average the result:

\(Variance = \frac{{206}^2 + {76}^2 + {(-224)}^2 + {36}^2 + {(-94)}^2}{5} = \frac{108520}{5} = 21704\)

So the Variance is 21,704 (in mm², since the differences were squared). The Standard Deviation is just the square root of the Variance, so:

\(Standard\ Deviation = \sqrt {21704} \approx 147\)
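The same arithmetic can be checked step by step in a few lines of Python. This sketch hard-codes the five heights from the example and treats them as the whole population (dividing by \(N\)):

```python
import math

heights = [600, 470, 170, 430, 300]  # dog heights in mm, from the example

mean = sum(heights) / len(heights)                        # 1970 / 5
deviations = [h - mean for h in heights]                  # each dog's difference from the mean
variance = sum(d * d for d in deviations) / len(heights)  # population variance: divide by N
std_dev = math.sqrt(variance)                             # standard deviation, back in mm

print(mean)               # 394.0
print(deviations)         # [206.0, 76.0, -224.0, 36.0, -94.0]
print(variance)           # 21704.0
print(round(std_dev, 2))  # 147.32
```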

The Standard Deviation is useful because, unlike the Variance, it is in the same units as the data (mm). Now we can show which heights are within one Standard Deviation (147 mm) of the Mean:

[Figure: One Standard Deviation]

So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small.

Rottweilers are tall dogs. And Dachshunds are a bit short, right?
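The "within one Standard Deviation" check amounts to comparing each height with \(\mu \pm \sigma\), roughly 247 mm to 541 mm here. A small sketch using the standard library's `statistics.pstdev` (the population standard deviation) and the labels from the text:

```python
import statistics

heights = [600, 470, 170, 430, 300]  # dog heights in mm
mu = statistics.fmean(heights)       # 394.0
sigma = statistics.pstdev(heights)   # population standard deviation, about 147.3

labels = {}
for h in heights:
    if h > mu + sigma:
        label = "extra large"
    elif h < mu - sigma:
        label = "extra small"
    else:
        label = "within one standard deviation"
    labels[h] = label
    print(h, label)
```

With these numbers, 600 mm comes out extra large, 170 mm extra small, and the other three dogs fall within one Standard Deviation.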

Our example has been for a Population (the 5 dogs are the only dogs we are interested in). But if the data is a Sample (a selection taken from a bigger Population), then the calculation changes!

  • The Population: divide by \(N\) when calculating Variance (like we did)

  • A Sample: divide by \(N - 1\) when calculating Variance

The Population Standard Deviation:

\(\sigma = \sqrt {\frac{1}{N} \sum_{i=1}^N {(x_i - \mu)}^2}\)

The Sample Standard Deviation:

\(s = \sqrt {\frac{1}{N-1} \sum_{i=1}^N {(x_i - \bar x)}^2}\)

If our 5 dogs are just a sample of a bigger population of dogs, we divide by 4 instead of 5 like this:

\(Variance = \frac{108520}{4} = 27130\)

\(Standard\ Deviation = \sqrt {27130} \approx 165\)
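The only difference between the two cases is the divisor. A hedged sketch with a hypothetical `sample` flag, checked against the dog heights; the standard library exposes the same distinction as `statistics.pvariance` (divide by \(N\)) and `statistics.variance` (divide by \(N - 1\)):

```python
import math
import statistics

def variance(data, sample=False):
    """Average squared deviation from the mean.

    sample=False divides by N (population); sample=True divides by N - 1 (sample).
    """
    n = len(data)
    m = sum(data) / n
    ss = sum((x - m) ** 2 for x in data)  # sum of squared deviations
    return ss / (n - 1 if sample else n)

heights = [600, 470, 170, 430, 300]

pop = variance(heights)
smp = variance(heights, sample=True)
print(pop, round(math.sqrt(pop)))  # 21704.0 147
print(smp, round(math.sqrt(smp)))  # 27130.0 165

# Standard-library equivalents of the two cases.
print(statistics.pvariance(heights), statistics.variance(heights))
```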

6.1.2 Covariance Matrix

Covariance is a statistical measure that tells us how two variables change together. In simpler terms, it indicates the direction of the linear relationship between two variables.

Here's a breakdown:

  • Positive Covariance: If the covariance is positive, it means that when one variable tends to be above its average, the other variable also tends to be above its average. Similarly, when one is below average, the other also tends to be below average. They move in the same direction.

  • Negative Covariance: If the covariance is negative, it means that when one variable tends to be above its average, the other variable tends to be below its average, and vice versa. They move in opposite directions.

  • Zero Covariance (or close to zero): If the covariance is zero or very close to zero, it suggests there is no linear relationship between the two variables. This does not mean the variables are independent: they can still be related in a nonlinear way.
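A standard counterexample shows why a covariance of zero rules out only a linear relationship: take \(x\) symmetric around 0 and \(y = x^2\). Here \(y\) depends entirely on \(x\), yet the positive and negative products of deviations cancel. The sketch uses the sample covariance formula (dividing by \(n - 1\)) from the study-hours example; `sample_cov` is an illustrative helper name:

```python
def sample_cov(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]   # y is completely determined by x ...
print(sample_cov(xs, ys))  # ... yet the covariance is 0.0
```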

Covariance tells you the direction of the relationship, but its magnitude is not a reliable measure of strength: it depends on the units of the variables, making it hard to compare covariances across different datasets. For strength, we use the correlation coefficient, a normalized version of covariance obtained by dividing the covariance by the product of the two standard deviations; it is unitless and always lies between -1 and 1.
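The unit dependence is easy to see by re-expressing one variable in different units. This sketch borrows the study-hours data from the example that follows and converts hours to minutes: the covariance grows by a factor of 60, while the correlation coefficient (covariance divided by the product of the standard deviations) stays the same. Helper names are illustrative:

```python
import math

def sample_cov(xs, ys):
    """Sample covariance, dividing by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

hours = [2, 4, 6, 8, 5]
score = [60, 75, 85, 90, 70]
minutes = [60 * h for h in hours]  # the same data, in different units

print(sample_cov(hours, score), sample_cov(minutes, score))  # 25.0 1500.0
print(round(correlation(hours, score), 4),
      round(correlation(minutes, score), 4))                 # 0.9366 0.9366
```

The correlation of about 0.94 indicates a strong positive linear relationship between hours studied and test score.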

Simple Example: Study Hours and Test Scores

Let's say we want to see how the number of hours a student studies (Variable X) relates to their test score (Variable Y).

Here's some hypothetical data for 5 students:

Student | Hours Studied (X) | Test Score (Y)
------- | ----------------- | --------------
1       | 2                 | 60
2       | 4                 | 75
3       | 6                 | 85
4       | 8                 | 90
5       | 5                 | 70

Formula for sample covariance:

\(Cov(X, Y) = \frac{\sum_{i=1}^n (X_i - \bar X) (Y_i - \bar Y)}{n-1}\)

Where:

  • \(X_i\) is an individual observation of X.

  • \(Y_i\) is an individual observation of Y.

  • \(\bar X\) is the mean of X.

  • \(\bar Y\) is the mean of Y.

  • \(n\) is the number of observations.

Let's calculate step-by-step.

Step 1 - Calculate the means (\(\bar X\) and \(\bar Y\)):

\(\bar X = (2 + 4 + 6 + 8 + 5) / 5 = 5\)

\(\bar Y = (60 + 75 + 85 + 90 + 70) / 5 = 76\)

Step 2 - Calculate the deviations from the mean for each observation, and their products:

Student | \(X_i\) | \(Y_i\) | \(X_i - \bar X\) | \(Y_i - \bar Y\) | \((X_i - \bar X)(Y_i - \bar Y)\)
------- | ------- | ------- | ---------------- | ---------------- | -------------------------------
1       | 2       | 60      | 2 - 5 = -3       | 60 - 76 = -16    | (-3) * (-16) = 48
2       | 4       | 75      | 4 - 5 = -1       | 75 - 76 = -1     | (-1) * (-1) = 1
3       | 6       | 85      | 6 - 5 = 1        | 85 - 76 = 9      | 1 * 9 = 9
4       | 8       | 90      | 8 - 5 = 3        | 90 - 76 = 14     | 3 * 14 = 42
5       | 5       | 70      | 5 - 5 = 0        | 70 - 76 = -6     | 0 * (-6) = 0

Step 3 - Sum the products of the deviations:

\(\sum (X_i - \bar X)(Y_i - \bar Y) = 48 + 1 + 9 + 42 + 0 = 100\)

Step 4 - Divide by \(n - 1 = 4\), since we have \(n = 5\) students:

\(Cov(X, Y) = \frac{100}{4} = 25\)
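The section title refers to the covariance matrix, which collects all pairwise covariances: for two variables it is the symmetric \(2 \times 2\) matrix with \(Var(X)\) and \(Var(Y)\) on the diagonal and \(Cov(X, Y)\) off the diagonal. Below is a pure-Python sketch for the study-hours data (helper names are illustrative); `numpy.cov(X, Y)` should give the same matrix, since it also divides by \(n - 1\) by default:

```python
def sample_cov(xs, ys):
    """Sample covariance, dividing by n - 1 as in the formula above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def cov_matrix(columns):
    """Entry (i, j) is Cov(column i, column j); the diagonal holds the variances."""
    return [[sample_cov(a, b) for b in columns] for a in columns]

X = [2, 4, 6, 8, 5]       # hours studied
Y = [60, 75, 85, 90, 70]  # test scores

print(sample_cov(X, Y))   # 25.0
for row in cov_matrix([X, Y]):
    print(row)
# [5.0, 25.0]
# [25.0, 142.5]
```

The diagonal entries are the sample variances of hours (5) and scores (142.5), and the positive off-diagonal entry 25 says that students who studied more tended to score higher.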

6.1.3 Euler–Maruyama Method

6.1.4 Heat Kernel