Question A: Construction of an earthquake source attribute identification model and magnitude prediction: code analysis

Question 1:
For the seismic wave data in Attachments 1 to 8, find a suitable set of indicators and criteria, build a seismic source attribute identification model, and accurately distinguish the natural earthquake events (Attachments 1-7) from the non-natural earthquake events (Attachment 8).

Question one:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Read the seismic wave data and labels in the folder
data = []
labels = []
folder_path = "A/"

# Read all data files in the folder
for i in range(1, 8):
    for j in range(1, 21):
        file_path = folder_path + f"Attachment {i}/{j}.txt"
        with open(file_path, "r") as file:
            lines = file.readlines()
        wave_data = []
        for line in lines:
            line_data = line.strip().split()  # split the line into individual tokens
            line_data = np.array([float(num) for num in line_data])  # convert tokens to floats
            wave_data.extend(line_data)
        data.append(wave_data)
        labels.append("natural earthquake")

for i in range(1, 31):
    file_path = folder_path + f"Attachment 8/{i}.txt"
    with open(file_path, "r") as file:
        lines = file.readlines()
    wave_data = []
    for line in lines:
        line_data = line.strip().split()  # split the line into individual tokens
        line_data = np.array([float(num) for num in line_data])  # convert tokens to floats
        wave_data.extend(line_data)
    data.append(wave_data)
    labels.append("non-natural earthquake")

# Convert data to numpy arrays (assumes every file yields the same number of values)
data = np.array(data)
labels = np.array(labels)

# Divide training set and test set
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

# create SVM classifier
model = SVC()

# Train model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Draw a line chart
sns.set(style="darkgrid")
plt.rcParams['font.family'] = 'Arial Unicode MS'
# Plot the true and predicted labels
plt.plot(y_test, label='True Labels')
plt.plot(y_pred, label='Predicted Labels')
plt.xlabel("Sample Index")
plt.ylabel("Label")
plt.title("True and Predicted Labels")
plt.legend()

plt.tight_layout()
plt.show()

When analyzing this code in chunks, it can be divided into the following parts:
Part 1:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

This part of the code imports the Python libraries that will be used.
Part 2:

# Read seismic wave data and labels in the folder
data = []
labels = []
folder_path = "A/"

# Read all data files in the folder
for i in range(1, 8):
    for j in range(1, 21):
        file_path = folder_path + f"Attachment {i}/{j}.txt"
        with open(file_path, "r") as file:
            lines = file.readlines()
        wave_data = []
        for line in lines:
            line_data = line.strip().split()  # split the line into individual tokens
            line_data = np.array([float(num) for num in line_data])  # convert tokens to floats
            wave_data.extend(line_data)
        data.append(wave_data)
        labels.append("natural earthquake")

for i in range(1, 31):
    file_path = folder_path + f"Attachment 8/{i}.txt"
    with open(file_path, "r") as file:
        lines = file.readlines()
    wave_data = []
    for line in lines:
        line_data = line.strip().split()  # split the line into individual tokens
        line_data = np.array([float(num) for num in line_data])  # convert tokens to floats
        wave_data.extend(line_data)
    data.append(wave_data)
    labels.append("non-natural earthquake")

This part of the code reads the seismic wave data and labels from the folder, storing the data in the data list and the labels in the labels list.
Nested loops visit each data file in turn: the file path is assembled from the attachment and file numbers, the file is opened with open, and readlines returns its lines. Each line is split into tokens, the tokens are converted to floats, the flattened values are appended to data, and the corresponding class label ("natural earthquake" or "non-natural earthquake") is appended to labels.
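
Since each file is just a whitespace-separated table of numbers, the same flattening can be done more compactly with numpy's loadtxt. A minimal sketch (read_wave_file is a hypothetical helper, not part of the original code):

import numpy as np

def read_wave_file(file_path):
    # np.loadtxt parses the whitespace-separated numbers directly;
    # ravel() flattens the rows into one long 1-D sample vector
    return np.loadtxt(file_path).ravel()
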
Part 3:

# Convert data to numpy array
data = np.array(data)
labels = np.array(labels)

This part of the code converts the data and labels lists into numpy arrays.
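
One caveat: np.array(data) only yields a regular 2-D array when every file contributes the same number of values; with ragged records, modern numpy raises an error instead. A minimal sketch that truncates every sample to the shortest record before the plain np.array call (the truncation strategy is an assumption, not part of the original code):

# truncate all samples to the length of the shortest record
min_len = min(len(wave) for wave in data)
data = np.array([wave[:min_len] for wave in data])
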
Part 4:

# Divide training set and test set
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

This part of the code uses the train_test_split function to split the data and labels into a training set and a test set. The test_size parameter sets the fraction held out for testing (here 20%), and random_state fixes the random seed so the split is reproducible.
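
Since the classes are imbalanced (140 natural samples versus 30 non-natural ones), passing stratify keeps the class ratio the same in both subsets; a hedged variant of the same call:

# stratify preserves the natural / non-natural ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=42, stratify=labels
)
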
Part 5:

# Create SVM classifier
model = SVC()
# Train model
model.fit(X_train, y_train)

This part of the code creates a support vector machine classifier and trains it with the fit method.
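
SVC() is used here with its defaults (RBF kernel, C=1.0). SVMs are sensitive to feature scale, so standardizing the inputs first is a common refinement; a minimal sketch of that variant (an alternative, not the original model):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scale every feature to zero mean and unit variance before the SVM
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
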
Part 6:

# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

This part of the code uses the trained model to predict on the test set, computes the model's accuracy with the accuracy_score function, and prints the result.
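
Because the test set is small and the classes are imbalanced, accuracy alone can hide per-class errors; a confusion matrix and a per-class report are easy to add:

from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))       # rows: true class, columns: predicted class
print(classification_report(y_test, y_pred))  # precision, recall and F1 per class
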
Part 7:

# Draw a line chart
sns.set(style="darkgrid")
plt.rcParams['font.family'] = 'Arial Unicode MS'
# Plot the true and predicted labels
plt.plot(y_test, label='True Labels')
plt.plot(y_pred, label='Predicted Labels')
plt.xlabel("Sample Index")
plt.ylabel("Label")
plt.title("True and Predicted Labels")
plt.legend()

plt.tight_layout()
plt.show()

This part of the code uses matplotlib.pyplot and the seaborn library to draw a line chart comparing the true and predicted labels. It sets the axis labels, adds a chart title, and adds a legend with the legend function.
Finally, tight_layout adjusts the chart layout and show displays the chart.

Question 2:
The amplitude and waveform characteristics of seismic waves are significantly related to the magnitude. Based on the data in Attachments 1 to 7, whose magnitudes are known (4.2, 5.0, 6.0, 6.4, 7.0, 7.4, 8.0), select appropriate events and samples, establish a magnitude prediction model, and try to give the magnitude (to one decimal place) of each seismic event in Attachment 9.

Question two:

import os
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import matplotlib
matplotlib.rcParams['font.family'] = 'Arial Unicode MS'
# Training set data path
train_folder = 'A/'
train_labels = [4.2, 5.0, 6.0, 6.4, 7.0, 7.4, 8.0]
X_train = []
y_train = []

# Read training set data
for i, label in enumerate(train_labels):
    folder_path = os.path.join(train_folder, f"Attachment {i + 1}")
    for file in os.listdir(folder_path):
        if file.endswith('.txt'):
            file_path = os.path.join(folder_path, file)
            with open(file_path, 'r') as f:
                lines = f.readlines()
                magnitudes = []
                for line in lines[:20]:  # only read the first 20 values of each file
                    magnitude = float(line.strip().split()[0])
                    magnitudes.append(magnitude)
                X_train.append(magnitudes)
                y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

# Create a linear regression model
model = LinearRegression()
# Model training
model.fit(X_train, y_train)

# Test set data path
test_folder = os.path.join(train_folder, 'Attachment 9')
X_test = []

# Read test set data
for file in os.listdir(test_folder):
    if file.endswith('.txt'):
        file_path = os.path.join(test_folder, file)
        with open(file_path, 'r') as f:
            lines = f.readlines()
            magnitudes = []
            for line in lines[:20]:  # read 20 values so the feature count matches training
                magnitude = float(line.strip().split()[0])
                magnitudes.append(magnitude)
            X_test.append(magnitudes)

X_test = np.array(X_test)

# Predict earthquake magnitude
y_pred = model.predict(X_test)

# Print the predicted magnitude of each event, rounded to one decimal place
print('Predicted magnitudes for the seismic events:')
for i, pred in enumerate(y_pred):
    print(f'Event {i + 1}: {round(pred, 1)}')

# Generate line chart
x_axis = np.arange(1, len(y_pred) + 1)
plt.plot(x_axis, y_pred, marker='o')

# Set horizontal axis ticks
plt.xticks(x_axis)

# Set chart title and axis labels
plt.title('Predicted magnitudes of seismic events')
plt.xlabel('Event number')
plt.ylabel('Magnitude')

plt.show()

When analyzing this code in chunks, it can be divided into the following parts:
Part 1:

import os
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import matplotlib

matplotlib.rcParams['font.family'] = 'Arial Unicode MS'

This part of the code imports the Python libraries that will be used and sets the plotting font to 'Arial Unicode MS'.
Part 2:

# Training set data path
train_folder = 'A/'
train_labels = [4.2, 5.0, 6.0, 6.4, 7.0, 7.4, 8.0]
X_train = []
y_train = []

# Read training set data
for i, label in enumerate(train_labels):
    folder_path = os.path.join(train_folder, f"Attachment {i + 1}")
    for file in os.listdir(folder_path):
        if file.endswith('.txt'):
            file_path = os.path.join(folder_path, file)
            with open(file_path, 'r') as f:
                lines = f.readlines()
                magnitudes = []
                for line in lines[:20]:  # only read the first 20 values of each file
                    magnitude = float(line.strip().split()[0])
                    magnitudes.append(magnitude)
                X_train.append(magnitudes)
                y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

This part of the code defines the training-set folder path, the list of magnitude labels, and the empty X_train and y_train lists.
Nested loops walk through the training-set folders and files: os.path.join assembles each file path, open opens the file, and readlines returns its lines. The first value on each of the first 20 lines is converted to a float, the resulting 20-value list is appended to X_train, and the corresponding magnitude label is appended to y_train.
Finally, the X_train and y_train lists are converted to numpy arrays.
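
Using the raw first 20 samples of each record as features is a simple choice. Since the problem statement says that amplitude and waveform characteristics are significantly related to magnitude, summary features of the whole record may generalize better; a hypothetical alternative feature extractor (amplitude_features is not part of the original code, and it assumes each file is a plain numeric table):

import numpy as np

def amplitude_features(file_path):
    # summary statistics of the full record instead of its first 20 raw values
    wave = np.loadtxt(file_path).ravel()
    return [np.max(np.abs(wave)),   # peak amplitude
            np.mean(np.abs(wave)),  # mean absolute amplitude
            np.std(wave)]           # spread of the signal
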
Part 3:

# Build a linear regression model
model = LinearRegression()
# Model training
model.fit(X_train, y_train)

This part of the code creates a linear regression model and trains it with the fit method.
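
With a modest number of samples and only seven distinct magnitude values, it is worth checking how stable this fit is before trusting the predictions; a quick sketch using cross-validation:

from sklearn.model_selection import KFold, cross_val_score

# the files are read folder by folder, so shuffle before splitting into folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X_train, y_train, cv=cv)
print('Mean cross-validated R^2:', scores.mean())
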
Part 4:

# Test set data path
test_folder = os.path.join(train_folder, 'Attachment 9')
X_test = []

# Read test set data
for file in os.listdir(test_folder):
    if file.endswith('.txt'):
        file_path = os.path.join(test_folder, file)
        with open(file_path, 'r') as f:
            lines = f.readlines()
            magnitudes = []
            for line in lines[:20]:  # read 20 values so the feature count matches training
                magnitude = float(line.strip().split()[0])
                magnitudes.append(magnitude)
            X_test.append(magnitudes)

X_test = np.array(X_test)

This part of the code defines the test-set folder path and an empty X_test list.
os.path.join assembles each file path, open opens the file, and readlines returns its lines. The first value on each of the first 20 lines is converted to a float and the resulting list is appended to X_test; reading 20 values per file keeps the feature count identical to the training set, which LinearRegression requires.
Finally, the X_test list is converted to a numpy array.
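
LinearRegression.predict requires exactly the same number of features as were seen during fit, so a quick sanity check catches any mismatch early:

# the feature counts must agree, otherwise predict raises a ValueError
assert X_test.shape[1] == X_train.shape[1], 'test features must match training features'
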
Part 5:

# Predict earthquake magnitude
y_pred = model.predict(X_test)

# Print the predicted magnitude of each event, rounded to one decimal place
print('Predicted magnitudes for the seismic events:')
for i, pred in enumerate(y_pred):
    print(f'Event {i + 1}: {round(pred, 1)}')

This part of the code uses the trained linear regression model to make predictions on the test set, stores the results in the y_pred variable, and prints each event's predicted magnitude rounded to one decimal place, as the question requires.
Part 6:

# Generate a line chart
x_axis = np.arange(1, len(y_pred) + 1)
plt.plot(x_axis, y_pred, marker='o')
# Set horizontal axis ticks
plt.xticks(x_axis)
# Set chart title and axis labels
plt.title('Predicted magnitudes of seismic events')
plt.xlabel('Event number')
plt.ylabel('Magnitude')

plt.show()

This part of the code uses the Matplotlib library to generate a line chart: x_axis defines the horizontal positions, plt.plot draws the line with circular markers, plt.xticks sets the horizontal axis ticks, and plt.title, plt.xlabel, and plt.ylabel set the chart title and axis labels. Finally, plt.show displays the chart.

Question 3:
Reservoir depth, storage capacity, fault type, tectonic activity/basic intensity, lithology, and similar attributes are important factors affecting the magnitude of reservoir-induced earthquakes. Based on the 102 reservoir earthquake samples in Attachment 10, try to establish a relationship model between the basic attribute data of the reservoirs and the magnitude, and give reasonable grounds for it.

Question three:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Read data
data = pd.read_csv('A/Attachment 10.csv')

# Label-encode categorical variables
label_encoder = LabelEncoder()
data['fault type'] = label_encoder.fit_transform(data['fault type'])
data['tectonic activity/basic intensity'] = label_encoder.fit_transform(data['tectonic activity/basic intensity'])
data['lithology'] = label_encoder.fit_transform(data['lithology'])

# Prepare independent variables and dependent variables
X = data[['reservoir depth/m', 'reservoir capacity', 'fault type', 'tectonic activity/basic intensity', 'lithology']]
y = data['magnitude']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a linear regression model and fit the data
model = LinearRegression()
model.fit(X_train, y_train)

# Print regression coefficients and intercept
print('Regression coefficients:', model.coef_)
print('Intercept:', model.intercept_)

# Predict and evaluate the model
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred, squared=False)  # squared=False returns the RMSE
print('Root mean square error:', rmse)

When chunking this code, it can be broken down into the following sections:
Part 1:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

This part of the code imports the Python libraries that will be used.
Part 2:

# Read data
data = pd.read_csv('A/Attachment 10.csv')

This part of the code uses the read_csv function of the pandas library to read the CSV file named “Attachment 10.csv” and store the data in the data variable.
Part 3:

# Label-encode categorical variables
label_encoder = LabelEncoder()
data['fault type'] = label_encoder.fit_transform(data['fault type'])
data['tectonic activity/basic intensity'] = label_encoder.fit_transform(data['tectonic activity/basic intensity'])
data['lithology'] = label_encoder.fit_transform(data['lithology'])

This part of the code uses the LabelEncoder class of the sklearn.preprocessing library to encode the categorical variables in the data as integers so they can be used in the regression.
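
Note that LabelEncoder maps each category to an arbitrary integer, so the regression treats nominal variables such as fault type as if they were ordered. One-hot encoding avoids imposing that order; a minimal sketch using pandas (an alternative, not the original preprocessing):

# one indicator column per category; drop_first avoids perfect collinearity
data_encoded = pd.get_dummies(
    data,
    columns=['fault type', 'tectonic activity/basic intensity', 'lithology'],
    drop_first=True,
)

The encoded frame could then replace data when assembling X below.
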
Part 4:

# Prepare independent variables and dependent variables
X = data[['reservoir depth/m', 'reservoir capacity', 'fault type', 'tectonic activity/basic intensity', 'lithology']]
y = data['magnitude']

This part of the code stores the encoded independent variables in X and the dependent variable (the magnitude) in y.
Part 5:

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

This part of the code uses the train_test_split function of the sklearn.model_selection library to split the data set into a training set and a test set, of which the training set accounts for 80% and the test set accounts for 20%.
Part 6:

# Build a linear regression model and fit the data
model = LinearRegression()
model.fit(X_train, y_train)

This part of the code uses the LinearRegression class of the sklearn.linear_model library to build a linear regression model and uses the training set data to fit the model.
Part 7:

# Print regression coefficients and intercept
print('Regression coefficients:', model.coef_)
print('Intercept:', model.intercept_)

This part of the code prints the regression coefficients and the intercept of the linear regression model.
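
The raw coef_ array is easier to interpret when each coefficient is paired with its column name, for example:

# pair each regression coefficient with the feature it belongs to
for name, coef in zip(X.columns, model.coef_):
    print(f'{name}: {coef:.4f}')
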
Part 8:

# Predict and evaluate the model
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred, squared=False)  # squared=False returns the RMSE
print('Root mean square error:', rmse)

This part of the code uses the trained regression model to predict on the test set, computes the root mean square error between the predictions and the actual values, and prints it.
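
Alongside the RMSE, the coefficient of determination gives a scale-free sense of how much of the magnitude variance the model explains, for example:

# R^2 on the held-out test set (1.0 is a perfect fit)
r2 = model.score(X_test, y_test)
print('R^2 on the test set:', r2)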