Principle and application of the Monte Carlo method: approximating the distribution of a statistic and computing quantiles, with a Python plotting demonstration

Contents


1 Preliminary introduction and basic principles of the Monte Carlo method

1.1 Definition of Monte Carlo method
1.2 The origin of the Monte Carlo method
1.3 Examples of Monte Carlo methods

2 Using the Monte Carlo method to calculate the distribution and quantiles of a statistic

2.1 Description of mathematical statistics problems
2.2 Normal distribution and exponential distribution
2.3 Empirical distribution and probability calculation
2.4 Solution ideas and general process

3 The actual process of solving the problem using the Monte Carlo method

3.1 Obtaining random numbers from the corresponding distributions
3.2 Plotting the approximate distribution function
3.3 Quantiles of the target statistic
3.4 Error as the sample size changes
3.5 The relationship between quantiles and randomness

4 Reference materials and statements related to this article

1 Preliminary introduction and basic principles of the Monte Carlo method

1.1 Definition of Monte Carlo method

The Monte Carlo method, also known as the statistical simulation method or random sampling technique, is a random simulation approach: a computational method grounded in probability and statistical theory that uses random numbers (in practice, pseudo-random numbers) to solve many kinds of computational problems. To apply the Monte Carlo method, the problem to be solved is first linked to a suitable probability model, and a computer is then used to carry out statistical simulation or sampling to obtain an approximate solution to the problem.

1.2 The origin of the Monte Carlo method

The Monte Carlo method was first proposed in the 1940s by S. M. Ulam and J. von Neumann, members of the United States' “Manhattan Project” to develop the atomic bomb during World War II. The mathematician von Neumann named the method after the world-famous casino in Monte Carlo, Monaco, which lends it an air of mystery. The Monte Carlo method is now widely used in financial engineering, macroeconomics, biomedicine, computational physics (for example particle transport, quantum thermodynamics, aerodynamics, and nuclear engineering) and other fields.

1.3 Examples of Monte Carlo methods

We can use the Monte Carlo method to compute an approximate value of $\pi$. The specific procedure is as follows: a circle is inscribed in a square, and the ratio of their areas is $\frac{\pi}{4}$, as shown in the figure below. If we now generate $n$ random points inside the square (with $n$ large enough), then the ratio of the number of points inside the circle to the number of points inside the square will, in a probabilistic sense, be approximately equal to the ratio of the areas of the circle and the square. Calling this ratio $k$, we obtain the formula $\pi = 4k$.
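As an illustration only (a minimal sketch assuming the unit circle inscribed in the square $[-1,1]\times[-1,1]$; the variable names are not from the original article), this estimate can be coded as follows:

import numpy as np

# Estimate pi from the fraction of random points that fall inside the inscribed unit circle.
n = 1_000_000                              # number of random points in the square
x = np.random.uniform(-1.0, 1.0, n)
y = np.random.uniform(-1.0, 1.0, n)
k = np.mean(x ** 2 + y ** 2 <= 1.0)        # fraction of points inside the circle
print(4 * k)                               # approximately 3.14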

2 Using the Monte Carlo method to calculate the distribution and quantiles of a statistic

2.1 Description of mathematical statistics problems

2.2 Normal distribution and exponential distribution



2.3 Empirical distribution and probability calculation



2.4 Solution ideas and general process

We illustrate with the case of the standard normal distribution; the idea for the exponential distribution is analogous. In this section we only describe the general steps; the concrete solution is given in Chapter 3.
The first step is to sample randomly from the standard normal distribution $N(0,1)$ to obtain a sufficiently large number $m$ of points: $x_{1},x_{2},\dots,x_{m}$.
The second step is to divide these $m$ points randomly into $k$ groups of $n$ points each, and to compute, according to the formula in the problem, the values $t_{1},t_{2},\dots,t_{k}$.
The third step is to take the $k$ values $t_{1},t_{2},\dots,t_{k}$ and, following the definition of the empirical distribution, find the corresponding approximate distribution expression.
The fourth step is to use the empirical distribution expression obtained above to complete the quantile calculation for our target statistic.
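For completeness (a standard definition, not spelled out in the original), the empirical distribution function used in the third step can be written as

$F_{k}(x) = \frac{1}{k}\sum_{i=1}^{k}\mathbf{1}\{t_{i} \le x\}$

i.e. the fraction of the $t_{i}$ that do not exceed $x$; the Python implementation in Section 3.2 counts $t_{i} < x$ instead, which makes no practical difference for continuous data.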

3 The actual process of solving the problem using the Monte Carlo method

3.1 Obtaining random numbers from the corresponding distributions

We can use Python code to obtain random numbers from the normal distribution and the exponential distribution. The relevant code is as follows:

import numpy as np
import math
import matplotlib.pyplot as plt

# Generate n*100,000 standard normal distribution random numbers and standard exponential distribution random numbers
n=20
dataX_nor = np.random.normal(0, 1, n * 10 * 10000)
dataX_exp = np.random.exponential(1, n * 10 * 10000)

Next, we need to divide the $n \times 100{,}000$ random numbers obtained above into groups of $n$; the relevant code is as follows:

# Group the generated normal distribution and exponential distribution random numbers into groups of n
dataZ_nor = np.reshape(dataX_nor, (n * 10 * 10000 // n, n))
dataZ_exp = np.reshape(dataX_exp, (n * 10 * 10000 // n, n))
# Use the obtained group random numbers to calculate several observation values of the target statistic T
dataT_nor = np.zeros(n * 10 * 10000 // n, np.float32) # Create an empty array
dataT_exp = np.zeros(n * 10 * 10000 // n, np.float32) # Create an empty array
for i in range(n * 10 * 10000 // n):
    sum_nor, sum_exp = 0, 0
    for j in range(n):
        sum_nor += (dataZ_nor[i][j] - dataZ_nor[i].mean()) ** 2
        sum_exp += (dataZ_exp[i][j] - dataZ_exp[i].mean()) ** 2
    dataT_nor[i] = math.sqrt(sum_nor / n) / (dataZ_nor[i].mean() - dataZ_nor[i].min())
    dataT_exp[i] = math.sqrt(sum_exp / n) / (dataZ_exp[i].mean() - dataZ_exp[i].min())
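As a side note (an alternative, not part of the original solution), the same observation values of $T$ can be computed without explicit Python loops by using NumPy's row-wise operations; np.std with its default ddof=0 corresponds exactly to the $\sqrt{\frac{1}{n}\sum_{j}(x_{j}-\bar{x})^{2}}$ term above. A minimal sketch, assuming the dataZ_nor and dataZ_exp arrays from the code above:

# Vectorized computation of the same statistic T (equivalent to the double loop above,
# with std(ddof=0) playing the role of sqrt(sum / n)).
dataT_nor_vec = dataZ_nor.std(axis=1) / (dataZ_nor.mean(axis=1) - dataZ_nor.min(axis=1))
dataT_exp_vec = dataZ_exp.std(axis=1) / (dataZ_exp.mean(axis=1) - dataZ_exp.min(axis=1))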
3.2 Plotting the approximate distribution function

We can write a program function that represents the approximate distribution function $F(x)$: its inputs are $x$ and the data set, its output is $F(x)$, and the computation follows the definition of the empirical distribution. The relevant code is as follows:

# Use programs to represent approximate distribution functions, input independent variables, and output function values.
def F(x, data):
    k = len(data)
    u = 0
    for index in range(k):
        if x > data[index]:
            u += 1
    return u/k
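As a minor aside (an alternative, not the original implementation), the same empirical distribution value can be computed in one line with NumPy, matching the strict inequality used above:

# Vectorized alternative to F(x): the fraction of observations strictly below x.
def F_vec(x, data):
    return np.mean(np.asarray(data) < x)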

Once we have the approximate distribution function, we can plot its graph. The relevant code is as follows:

# Draw function graph
X = np.linspace(0, 1.5, 150) # Create 150 evenly spaced points from 0 to 1.5
Y_nor = np.zeros(len(X), np.float32) # Create an empty array
Y_exp = np.zeros(len(X), np.float32) # Create an empty array
for index in range(len(X)):
    Y_nor[index] = F(X[index], dataT_nor)
    Y_exp[index] = F(X[index], dataT_exp)
plt.plot(X, Y_nor) # Draw function graph
plt.xlabel('x') # Set x-axis label
plt.ylabel('y') # Set the y-axis label
plt.title('Function-Nor') # Set image title
plt.grid(True) #Display grid lines
plt.show() # Display image

With the above steps and code, we obtain the approximate distribution curves shown below. The curve labeled Nor is computed from the normally distributed data, and the curve labeled Exp from the exponentially distributed data; both use $n=10$.
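Since the original figure is not reproduced here, the following minimal sketch (assuming the arrays X, Y_nor and Y_exp computed by the code above; the labels 'Nor' and 'Exp' follow the text) draws both curves in a single figure with a legend:

# Draw both approximate distribution curves (Nor and Exp) in one figure.
plt.plot(X, Y_nor, label='Nor')   # statistic computed from N(0,1) samples
plt.plot(X, Y_exp, label='Exp')   # statistic computed from Exp(1) samples
plt.xlabel('x')
plt.ylabel('F(x)')
plt.title('Approximate distribution functions')
plt.legend()
plt.grid(True)
plt.show()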

3.3 Quantiles of the target statistic

From the approximate distribution curves we can easily see that the desired quantile lies in the interval $(0,1)$. Using the definitions of the empirical distribution and of order statistics, we obtain the formula $t_{\alpha}=T_{(k\alpha)}$: after sorting $t_{1},t_{2},\dots,t_{k}$, the corresponding value is read off according to this formula. The relevant code is as follows:

# Define function to calculate statistic quantile
def A(alpha, data):
    k = len(data)
    data_sort = np.sort(data) # Sort
    return data_sort[int(k * alpha)]
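For a quick cross-check (an aside, not part of the original method), NumPy's built-in quantile routine gives essentially the same value, up to a slightly different interpolation rule:

# Compare the order-statistic formula with NumPy's built-in quantile routine.
print(A(0.05, dataT_nor), np.quantile(dataT_nor, 0.05))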

Taking $n=10$, the program produces the following results:

α        Standard normal distribution    Standard exponential distribution
0.01     0.3936533                       0.56543356
0.05     0.4370795                       0.65696585
0.10     0.4659587                       0.7118661
3.4 Error as the sample size changes

To observe the relationship between the quantile values and the sample size, we take $n=10, n=20, \dots, n=70$ in turn and observe the results.
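A minimal sketch of how tables like those below could be produced is given here; it reuses the grouping and statistic computation from Section 3.1 and the function A from Section 3.3, and the helper name quantile_for_n is hypothetical, introduced only for illustration:

# Hypothetical helper: quantile of the statistic T for a given group size n.
def quantile_for_n(n, alpha, dist='normal'):
    size = n * 10 * 10000                      # same total sample size as above
    if dist == 'normal':
        data = np.random.normal(0, 1, size)
    else:
        data = np.random.exponential(1, size)
    groups = data.reshape(-1, n)               # k groups of n points each
    t = groups.std(axis=1) / (groups.mean(axis=1) - groups.min(axis=1))
    return A(alpha, t)

for n in range(10, 80, 10):
    print(n, [round(float(quantile_for_n(n, a)), 5) for a in (0.01, 0.05, 0.10)])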
For the standard normal distribution, the results are as follows:

α \ n    n=10     n=20     n=30     n=40     n=50     n=60     n=70
0.01     0.39365  0.33851  0.31550  0.30416  0.29674  0.29148  0.28938
0.05     0.43708  0.38108  0.35766  0.34427  0.33517  0.32763  0.32317
0.10     0.46596  0.40843  0.38308  0.36831  0.35759  0.34942  0.34489

For the standard exponential distribution with parameter 1, the results are as follows:

α \ n    n=10     n=20     n=30     n=40     n=50     n=60     n=70
0.01     0.56543  0.64118  0.69019  0.71926  0.74442  0.76087  0.77441
0.05     0.65697  0.72240  0.76146  0.78683  0.80469  0.81802  0.82034
0.10     0.71187  0.76891  0.80247  0.82376  0.83893  0.84743  0.85112

From the above results it is easy to see that, as $n$ increases, the calculated values tend to stabilize and change monotonically (decreasing in the normal case, increasing in the exponential case). From this we may conjecture that the calculation error becomes smaller.

3.5 The relationship between quantiles and randomness

As a side discussion, we add one more question: how do the computed quantiles depend on the initial random numbers? If the initial random numbers are changed, do the results change significantly? In this section we explore whether the quantile results are sensitive to the random numbers used. For this purpose we set $n=100$ and $\alpha=0.05$, then repeatedly change the initial random numbers and observe how the results change. The results are listed in the following table:

                                   Random result 1  Random result 2  Random result 3  Random result 4  Random result 5
Standard normal distribution       0.31474          0.30983          0.31064          0.30978          0.31054
Standard exponential distribution  0.84381          0.84988          0.85204          0.84550          0.85045

From the table above we can see that, although the calculated results fluctuate, they all fall within a small range and are clearly stable.
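A minimal sketch of this repetition experiment is shown below (seeding with np.random.default_rng is an implementation detail assumed here; the original simply re-ran the program):

# Repeat the pipeline with different seeds and compare the resulting 0.05-quantiles.
n, alpha = 100, 0.05
for seed in range(5):
    rng = np.random.default_rng(seed)
    data = rng.normal(0, 1, n * 10 * 10000)
    groups = data.reshape(-1, n)
    t = groups.std(axis=1) / (groups.mean(axis=1) - groups.min(axis=1))
    print(seed, A(alpha, t))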

4 Reference materials and statements related to this article

  1. The introduction to and principles of the Monte Carlo method come from Internet sources, including Baidu Encyclopedia.
  2. The part on using Monte Carlo to calculate the distribution and quantiles of a statistic follows Teacher Wei Wei's courseware.
  3. The programming language is Python, the editor is PyCharm, and the environment is Anaconda + Windows 10.
  4. The mathematical statistics problem in this article is my graduate-course homework, and the code was written by myself.

The complete code is as follows; some parts were modified slightly to obtain the different results above:

import numpy as np
import math
import matplotlib.pyplot as plt

# Generate n*100,000 standard normal distribution random numbers and standard exponential distribution random numbers
n=100
dataX_nor = np.random.normal(0, 1, n * 10 * 10000)
dataX_exp = np.random.exponential(1, n * 10 * 10000)
#Group the generated normal distribution and exponential distribution random numbers into groups of n
dataZ_nor = np.reshape(dataX_nor, (n * 10 * 10000 // n, n))
dataZ_exp = np.reshape(dataX_exp, (n * 10 * 10000 // n, n))
# Use the obtained group random numbers to calculate several observation values of the target statistic T
dataT_nor = np.zeros(n * 10 * 10000 // n, np.float32) # Create an empty array
dataT_exp = np.zeros(n * 10 * 10000 // n, np.float32) # Create an empty array
for i in range(n * 10 * 10000 // n):
    sum_nor, sum_exp = 0, 0
    for j in range(n):
        sum_nor += (dataZ_nor[i][j] - dataZ_nor[i].mean()) ** 2
        sum_exp += (dataZ_exp[i][j] - dataZ_exp[i].mean()) ** 2
    dataT_nor[i] = math.sqrt(sum_nor / n) / (dataZ_nor[i].mean() - dataZ_nor[i].min())
    dataT_exp[i] = math.sqrt(sum_exp / n) / (dataZ_exp[i].mean() - dataZ_exp[i].min())


# Use programs to represent approximate distribution functions, input independent variables, and output function values.
def F(x, data):
    k = len(data)
    u = 0
    for index in range(k):
        if x > data[index]:
            u += 1
    return u/k


#Define function to calculate statistic quantiles
def A(alpha, data):
    k = len(data)
    data_sort = np.sort(data) # Sort
    return data_sort[int(k * alpha)]


print(A(0.05, dataT_nor))
print(A(0.05, dataT_exp))
# Draw function graph
X = np.linspace(0, 2, 200) # Create 200 evenly spaced points from 0 to 2
Y_nor = np.zeros(len(X), np.float32) # Create an empty array
Y_exp = np.zeros(len(X), np.float32) # Create an empty array
for index in range(len(X)):
    Y_nor[index] = F(X[index], dataT_nor)
    Y_exp[index] = F(X[index], dataT_exp)
plt.plot(X, Y_exp) # Draw function graph
plt.xlabel('x') # Set x-axis label
plt.ylabel('y') # Set the y-axis label
plt.title('Function-Exp') # Set image title
plt.grid(True) #Display grid lines
plt.show() # Display image