Question A: In-depth explanation of earthquake source attribute identification model construction and magnitude prediction with complete code attached – specific modeling process and source code

Question A: Construction of earthquake source attribute identification model and magnitude prediction

Problem background:

Earthquakes are a complex form of crustal movement, and countless earthquake disasters occur around the world every year. Earthquake early warning and forecasting technology, which aims to reduce these disasters, must reliably identify natural earthquake events in routine monitoring and eliminate artificial earthquake records and abnormal interference signals before any subsequent processing. Accurate identification of seismic signals is therefore an important part of seismological research and seismic observation technology. However, with the rapid increase in urban construction projects and the expansion of seismic monitoring networks, unnatural seismic events such as blasting, mining earthquakes, weapons tests, and collapses have become increasingly common. They interfere with the recording of recent earthquake events, emergency response to major earthquakes, and the daily management of earthquake catalogs, so the reliability and accuracy of identification models need to be improved. Magnitude prediction is one of the main goals of earthquake prediction (alongside epicenter and onset time). Accurate determination of magnitude relies on feature mining over a large number of historical events and on seismic wave energy estimation, and it helps in formulating targeted earthquake emergency plans to reduce losses.

Figure 1: Typical natural seismic signals and artificial blasting signals (from contribution)

With the development of computer technology and artificial intelligence, AI-based seismology has emerged, and machine learning and neural network models are gradually replacing traditional methods for solving conventional seismological problems. This problem focuses on building an earthquake source attribute identification model and predicting earthquake magnitude. Please solve the following:

Question 1: Find a suitable set of indicators and criteria for the seismic wave data in Attachments 1 to 8, construct a seismic source attribute identification model, and accurately distinguish natural earthquake events (Attachments 1 to 7) from unnatural earthquake events (Attachment 8);

Question 2: The amplitude and waveform characteristics of seismic waves are significantly related to the magnitude. Based on the data in Attachments 1 to 7, whose magnitudes are known (4.2, 5.0, 6.0, 6.4, 7.0, 7.4, and 8.0, respectively), select events and samples appropriately, establish a magnitude prediction model, and try to give the magnitude (to one decimal place) of the seismic event in Attachment 9.

Question 3: Reservoir depth, storage capacity, fault type, tectonic activity/basic intensity, lithology, etc. are important factors affecting the magnitude of reservoir-induced earthquakes. Based on the 102 reservoir earthquake samples in Attachment 10, try to establish a relationship model between the basic attribute data of the reservoir and the magnitude, and give a reasonable basis for it.

Note: The signal sampling rate is uniformly 200 Hz. Each of Attachments 1 to 9 represents one independent earthquake event; each sample in an attachment is observation data from a different station for that event, and all data have the same physical meaning (acceleration or velocity). Depth is in m, and storage capacity is in 10^8 m^3.

Question 1: Find a suitable set of indicators and criteria for the seismic wave data in Attachments 1 to 8, build a seismic source attribute identification model, and accurately distinguish natural earthquake events (Attachments 1 to 7) from unnatural earthquake events (Attachment 8).

Problem-solving ideas:

Part 1:

Process the data: for each record, compute summary statistics such as the maximum, minimum, range, and variance. Merge the natural and unnatural seismic data into one table, record which attachment (event) and which station each record comes from, and add a class label.

Save the result as a CSV file.

Part 2:

Run a correlation analysis on the feature data and draw a correlation heat map. Then examine which features are associated with the natural versus unnatural class in order to find the key features.
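
To see which features track the class, one option is to correlate each feature with the 0/1 label directly. The snippet below is a minimal sketch on synthetic data: the feature values are made up, and the column names "amplitude", "kurtosis", and "label" are placeholders for whatever the real feature table contains.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the feature table (assumption: all numbers are made up).
rng = np.random.default_rng(0)
n_natural, n_unnatural = 140, 30
label = np.r_[np.zeros(n_natural), np.ones(n_unnatural)].astype(int)

features = pd.DataFrame({
    # This feature is shifted by class, so it should correlate with the label.
    "amplitude": rng.normal(1.0, 0.3, label.size) + 2.0 * label,
    # This feature is independent of the class.
    "kurtosis": rng.normal(3.0, 1.0, label.size),
})
features["label"] = label

# Pearson correlation of each feature with the binary label
# (for a 0/1 label this equals the point-biserial correlation).
corr_with_label = features.corr()["label"].drop("label")

# Rank features by absolute correlation, strongest first.
ranked = corr_with_label.abs().sort_values(ascending=False)
print(ranked)
```

Features near the top of the ranking are candidates for the classifier; a feature with near-zero correlation may still matter through nonlinear effects, so this is only a first screen.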

Part 3:

Consult information online and draw on previously learned machine learning algorithms, such as decision trees, random forests, and linear regression. Read the data, split it into a training set and a test set, and then build the model.

Part 4: Optimize the algorithm.
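
Parts 3 and 4 can be sketched as a small hyperparameter search. This is one possible way to "optimize the algorithm", not necessarily what was done here: it assumes a scaled SVM pipeline, a hand-picked parameter grid, and synthetic features with the same class sizes as the attachments (140 natural, 30 unnatural).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic features (assumption): two classes with shifted means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (140, 4)), rng.normal(2.0, 1.0, (30, 4))])
y = np.r_[np.zeros(140), np.ones(30)].astype(int)

# Scaling inside the pipeline is refit on each training fold,
# so no information leaks from the validation fold.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical grid; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

# Stratified folds keep the 140:30 class ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Balanced accuracy is less flattering than plain accuracy on imbalanced classes.
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="balanced_accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same search object can wrap a decision tree or random forest by swapping the final pipeline step and the grid keys.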

Summary

Based on the data in the attachments, we first determined that this is a data analysis problem and chose a support vector machine (SVM) classifier. The workflow is: read the data, convert the data and labels into NumPy arrays, split them into a training set and a test set, create the SVM classifier, train it with the fit method, and then predict on the test set.

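Answer content:

For a linear SVM, the decision rule is:

Predicted label = sign(w^T * x + b)

where x is the seismic wave data of one record. The weight vector (w) and bias term (b) are not set by hand; they are obtained by the optimization algorithm during model training and are determined by the input data and labels. Note that SVC() with default settings uses an RBF kernel, for which the decision function is a kernel expansion over the support vectors rather than this explicit linear form.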
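
The linear decision rule can be checked numerically. The sketch below trains scikit-learn's SVC with kernel="linear" on synthetic two-class data (an assumption; the labels are encoded as -1 and +1 for convenience) and rebuilds the predictions by hand from coef_ and intercept_.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (assumption: not real seismic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 3)), rng.normal(2.0, 1.0, (50, 3))])
y = np.r_[-np.ones(50), np.ones(50)].astype(int)

# A linear kernel exposes the weight vector and bias explicitly.
model = SVC(kernel="linear").fit(X, y)
w = model.coef_[0]       # weight vector, one entry per feature
b = model.intercept_[0]  # bias term

# Apply sign(w^T x + b) by hand for every sample.
manual = np.where(X @ w + b > 0, 1, -1)

# Compare with the library's own predictions.
agreement = np.mean(manual == model.predict(X))
print("agreement with model.predict:", agreement)
```

With the default RBF kernel, coef_ is not available, which is why the explicit formula applies only to the linear case.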

Problem 1 code analysis:

Data processing, labeling

# Data processing: one row of summary statistics per station record
import pandas as pd

COLUMNS = ['event', 'observation station', 'mean', 'amplitude', 'standard deviation',
           'minimum value', 'maximum value', 'kurtosis', 'skewness', 'whether it is natural']

def read_wave(path):
    # Read every number in the file; split() handles both space-separated
    # and one-value-per-line layouts. Non-numeric tokens become NaN.
    with open(path, 'r') as fr:
        tokens = fr.read().split()
    return pd.to_numeric(pd.Series(tokens), errors='coerce')

def get_features(path, event, station, label):
    # label: 0 for natural events (attachments 1-7), 1 for the unnatural event (attachment 8)
    wave = read_wave(path)
    amplitude = wave.max() - wave.min()  # peak-to-peak range; named to avoid shadowing built-in range()
    return [event, station, wave.mean(), amplitude, wave.std(), wave.min(),
            wave.max(), wave.kurt(), wave.skew(), label]

rows = []
for i in range(1, 8):  # natural events: 20 station records each
    for j in range(1, 21):
        path = 'A/attachment' + str(i) + '/' + str(j) + '.txt'
        rows.append(get_features(path, i, j, 0))
for i in range(8, 9):  # unnatural event: 30 station records
    for j in range(1, 31):
        path = 'A/attachment' + str(i) + '/' + str(j) + '.txt'
        rows.append(get_features(path, i, j, 1))

df_feature = pd.DataFrame(rows, columns=COLUMNS)
print(df_feature)
df_feature.to_csv('Feature construction.csv', index=False)

Filter data and generate heat maps

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Read the feature file
data = pd.read_csv('Feature construction.csv')

# Select the feature columns
columns = ['mean', 'amplitude', 'standard deviation', 'minimum value', 'maximum value', 'kurtosis', 'skewness']
data_selected = data[columns]

# Calculate the correlation coefficient matrix
corr_matrix = data_selected.corr()

# Draw correlation map (heat map)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', square=True,
            xticklabels=columns, yticklabels=columns)

# Set axis label font
font_properties = {'family': 'Arial Unicode MS', 'size': 12}

plt.xticks(fontproperties=font_properties)
plt.yticks(fontproperties=font_properties)

#Set graphic title
plt.title('Correlation Heatmap')

# Display graphics
plt.show()

Generate a model and draw a line chart

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Folder that holds the attachments; adjust the folder names below to the local data layout
folder_path = "A/"

def read_wave(file_path):
    # Read every number in the file as a float (values separated by spaces or newlines)
    with open(file_path, "r") as file:
        return [float(num) for num in file.read().split()]

# Read the seismic wave data and labels
data = []
labels = []
for i in range(1, 8):  # Attachments 1-7: natural earthquakes, 20 station records each
    for j in range(1, 21):
        data.append(read_wave(folder_path + f"Attachment{i}/{j}.txt"))
        labels.append("natural earthquake")
for j in range(1, 31):  # Attachment 8: unnatural earthquake, 30 station records
    data.append(read_wave(folder_path + f"Attachment8/{j}.txt"))
    labels.append("Unnatural earthquake")

# Convert to NumPy arrays; records are cut to the shortest length so the matrix is rectangular
min_len = min(len(wave) for wave in data)
data = np.array([wave[:min_len] for wave in data])
labels = np.array(labels)

# Divide training set and test set
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

# Create SVM classifier (default RBF kernel)
model = SVC()

# Train model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Draw a line chart of true and predicted labels
sns.set(style="darkgrid")
plt.rcParams['font.family'] = 'Arial Unicode MS'
plt.plot(y_test, label='True Labels')
plt.plot(y_pred, label='Predicted Labels')
plt.xlabel("Sample Index")
plt.ylabel("Label")
plt.title("True and Predicted Labels")
plt.legend()

plt.tight_layout()
plt.show()
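
Because the classes are imbalanced (140 natural records against 30 unnatural ones), plain accuracy can look good even when the minority class is missed. The sketch below is a hedged alternative evaluation on synthetic data: it assumes a stratified split and reports a confusion matrix and per-class metrics. The feature values are made up; only the class sizes and label strings follow the code above.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic features (assumption) with the same class sizes as the attachments.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (140, 5)), rng.normal(2.5, 1.0, (30, 5))])
y = np.array(["natural earthquake"] * 140 + ["Unnatural earthquake"] * 30)

# stratify=y keeps the class ratio the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes, in this order.
class_names = ["natural earthquake", "Unnatural earthquake"]
cm = confusion_matrix(y_test, y_pred, labels=class_names)
print(cm)

# Precision, recall, and F1 for each class.
print(classification_report(y_test, y_pred, labels=class_names, zero_division=0))
```

The second row of the matrix shows how many unnatural records were caught and how many were mistaken for natural earthquakes, which a single accuracy number hides.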

Question 2: The amplitude and waveform characteristics of seismic waves are significantly related to the magnitude. Based on the data in Attachments 1 to 7, whose magnitudes are known (4.2, 5.0, 6.0, 6.4, 7.0, 7.4, and 8.0, respectively), select events and samples appropriately, establish a magnitude prediction model, and try to give the magnitude (to one decimal place) of the seismic event in Attachment 9.

Problem-solving ideas:

Part 1:

Question 1 built a model to classify earthquake types. For this question, the starting point is that the amplitude and waveform characteristics of seismic waves are significantly related to the magnitude, so a linear regression model is used to predict the magnitude of the earthquake event. First, the training data are read and processed into X_train and y_train, and then the linear regression model is trained on them.

Part 2:

Next, read the test data (Attachment 9) and process it into the feature matrix X_test. Finally, the trained model predicts on X_test, giving one magnitude prediction in y_pred for each of the n records.

Answer content:

This code uses a linear regression model to predict the magnitude of earthquake events. The formula of the linear regression model is:

y = w^T * x + b

Where, y is the predicted earthquake magnitude, x is the input seismic wave data, w is the weight vector of the model, and b is the bias term.
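
Magnitude scales are logarithmic in ground-motion amplitude, so a linear model may fit better when x holds log-amplitude features instead of raw samples. The sketch below illustrates that idea on synthetic waveforms only; the helper names synthetic_record and log_features, the chosen features (log10 of peak and RMS amplitude), and the noise model are all assumptions, not the method used in the code below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
magnitudes = np.array([4.2, 5.0, 6.0, 6.4, 7.0, 7.4, 8.0])

def synthetic_record(magnitude, n_samples=2000):
    """Hypothetical waveform whose amplitude grows tenfold per magnitude unit."""
    scale = 10.0 ** (magnitude - 6.0) * rng.lognormal(0.0, 0.3)  # station-to-station scatter
    return scale * rng.normal(0.0, 1.0, n_samples)

def log_features(wave):
    """Two log-amplitude features: log10 of the peak and of the RMS amplitude."""
    wave = np.asarray(wave, dtype=float)
    peak = np.max(np.abs(wave))
    rms = np.sqrt(np.mean(wave ** 2))
    return [np.log10(peak), np.log10(rms)]

# Build a training set: 20 synthetic station records per known magnitude.
X_train, y_train = [], []
for m in magnitudes:
    for _ in range(20):
        X_train.append(log_features(synthetic_record(m)))
        y_train.append(m)

model = LinearRegression().fit(np.array(X_train), np.array(y_train))

# Predict the magnitude of a new synthetic record generated at magnitude 6.7.
pred = model.predict([log_features(synthetic_record(6.7))])[0]
print("predicted magnitude:", round(float(pred), 1))
```

On real records the relationship also depends on epicentral distance and site conditions, which this sketch ignores.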

Use the LinearRegression() function to create a linear regression model, use the fit() function to train the training set data, and then use the predict() function to predict the test set data.

The line chart shows the predicted magnitude for each station record of the seismic event in Attachment 9, with the record number on the x-axis and the predicted magnitude on the y-axis.

Code display:

import os
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import matplotlib
matplotlib.rcParams['font.family'] = 'Arial Unicode MS'

# Training set data path (adjust folder names to match the local data layout)
train_folder = 'A/'
train_labels = [4.2, 5.0, 6.0, 6.4, 7.0, 7.4, 8.0]
# Number of leading samples used as features; it must be the same for
# training and prediction, otherwise predict() raises a shape error
N_SAMPLES = 20

def read_first_samples(file_path, n):
    # Return the first n numbers in the file (values separated by spaces or newlines)
    with open(file_path, 'r') as f:
        return [float(tok) for tok in f.read().split()[:n]]

def load_folder(folder_path):
    # One feature row per .txt file (station record), in sorted file-name order
    rows = []
    for file in sorted(os.listdir(folder_path)):
        if file.endswith('.txt'):
            rows.append(read_first_samples(os.path.join(folder_path, file), N_SAMPLES))
    return rows

# Read training set data: every record of attachment i gets that event's magnitude
X_train, y_train = [], []
for i, label in enumerate(train_labels):
    rows = load_folder(os.path.join(train_folder, f"attachment{i + 1}"))
    X_train.extend(rows)
    y_train.extend([label] * len(rows))

X_train = np.array(X_train)
y_train = np.array(y_train)

# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Read test set data (Attachment 9)
X_test = np.array(load_folder(os.path.join(train_folder, 'attachment9')))

# Predict one magnitude per station record
y_pred = model.predict(X_test)

# Generate line chart
x_axis = np.arange(1, len(y_pred) + 1)
plt.plot(x_axis, y_pred, marker='o')

# Set horizontal axis ticks
plt.xticks(x_axis)

# Set chart title and axis labels
plt.title('Predicted magnitude for each record in Attachment 9')
plt.xlabel('Record number')
plt.ylabel('Earthquake magnitude')

plt.show()
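
The question asks for a single magnitude to one decimal place, while y_pred holds one value per station record. One simple way to combine them, shown here as an assumption and not as the original method, is to take the median of the per-record predictions, which is less sensitive to an outlying station than the mean. The function name event_magnitude and the example numbers are made up.

```python
import numpy as np

def event_magnitude(station_predictions):
    """Combine per-station predictions into one event magnitude (one decimal place)."""
    preds = np.asarray(station_predictions, dtype=float)
    preds = preds[np.isfinite(preds)]  # drop NaN or infinite predictions
    return round(float(np.median(preds)), 1)

# Made-up per-station predictions with one outlier.
example = [6.1, 6.4, 6.3, 9.8, 6.2]
print(event_magnitude(example))  # prints 6.3
```

In the code above, the call would be event_magnitude(y_pred).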

Question 3: Reservoir depth, storage capacity, fault type, tectonic activity/basic intensity, lithology, etc. are important factors affecting the magnitude of reservoir-induced earthquakes. Based on the 102 reservoir earthquake samples in Attachment 10, try to establish a relationship model between the basic reservoir attribute data and the earthquake magnitude, and give a reasonable basis for it.

Problem solving ideas:

Part 1:

First, use the pandas library to read the data in the CSV file, then use LabelEncoder to label-encode the categorical variables.

Part 2:

Next, prepare the independent variables X and the dependent variable y, and use the train_test_split function to split the dataset into training and test sets.

Part 3:

Then build a linear regression model and fit it on the training set.

Part 4:

Finally, predict on the test set and compute the root mean square error (RMSE) between the predictions and the true values as the model evaluation metric.

Answer content:

Magnitude = intercept + w1 * reservoir depth/m + w2 * reservoir capacity + w3 * fault type + w4 * tectonic activity/basic intensity + w5 * lithology

where the intercept is the value of model.intercept_, and w1 to w5 are the coefficients in model.coef_, in the same order as the columns of X. Fault type, tectonic activity/basic intensity, and lithology enter the formula as their label-encoded integer codes.
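
Label encoding assigns the integers 0, 1, 2, ... to the categories, so a linear model treats them as ordered and evenly spaced. For unordered categories such as fault type or lithology, one-hot encoding is a common alternative. The sketch below shows it on a synthetic table: the category names ("normal", "granite", and so on), the value ranges, and the generated magnitudes are all made up, and only the column names follow the code in this section.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the 102 reservoir samples (assumption: all values are made up).
rng = np.random.default_rng(0)
n = 102
data = pd.DataFrame({
    "reservoir depth/m": rng.uniform(20, 200, n),
    "reservoir capacity": rng.uniform(0.1, 100, n),
    "fault type": rng.choice(["normal", "reverse", "strike-slip"], n),
    "lithology": rng.choice(["granite", "limestone", "sandstone"], n),
})
# Made-up ground truth: magnitude rises by 0.01 per metre of depth, plus noise.
data["magnitude"] = 2.0 + 0.01 * data["reservoir depth/m"] + rng.normal(0, 0.3, n)

# One-hot encode the unordered categories; drop_first avoids a redundant
# column per category, which would be collinear with the intercept.
X = pd.get_dummies(
    data.drop(columns="magnitude"),
    columns=["fault type", "lithology"],
    drop_first=True,
    dtype=float,
)
y = data["magnitude"]

model = LinearRegression().fit(X, y)

# One coefficient per numeric column and per remaining category level.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
```

Each dummy coefficient is then the estimated magnitude offset of that category relative to the dropped baseline category, which is easier to justify than a slope on arbitrary integer codes.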

Code Analysis:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Read data (a forward slash avoids the invalid '\A' escape sequence)
data = pd.read_csv('A/Attachment 10.csv')

# Label-encode categorical variables (each column is fitted separately)
for column in ['fault type', 'tectonic activity/basic intensity', 'lithology']:
    data[column] = LabelEncoder().fit_transform(data[column])

# Prepare independent variables and dependent variable
X = data[['reservoir depth/m', 'reservoir capacity', 'fault type', 'tectonic activity/basic intensity', 'lithology']]
y = data['magnitude']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a linear regression model and fit the data
model = LinearRegression()
model.fit(X_train, y_train)

# Print regression coefficients and intercept
print('regression coefficient:', model.coef_)
print('intercept:', model.intercept_)

# Predict and evaluate the model
y_pred = model.predict(X_test)
# Take the square root explicitly; the squared=False argument is deprecated in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print('root mean square error:', rmse)
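
With only 102 samples, an RMSE from a single 80/20 split depends heavily on which 20 or so rows land in the test set. A hedged alternative is k-fold cross-validation, sketched below on synthetic data; the five-fold setting and the generated features and coefficients are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for 102 samples with 5 attributes (assumption: made-up numbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(102, 5))
y = 3.0 + X @ np.array([0.5, 0.2, 0.0, 0.1, -0.3]) + rng.normal(0, 0.4, 102)

# Five shuffled folds: every sample is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn reports negated MSE so that larger is better; flip the sign back.
neg_mse = cross_val_score(LinearRegression(), X, y, cv=cv,
                          scoring="neg_mean_squared_error")
rmse_per_fold = np.sqrt(-neg_mse)

print("RMSE per fold:", np.round(rmse_per_fold, 3))
print("mean RMSE:", round(float(rmse_per_fold.mean()), 3))
```

The spread across folds gives a rough sense of how stable the reported error is.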
