Practical implementation of a CSP-based motor imagery EEG classification task

Public motor imagery data set: Data set IVa (BCI Competition III) [1]
For data description, please refer to the previous article: https://blog.csdn.net/qq_43811536/article/details/134224005?spm=1001.2014.3001.5501
For time-frequency spatial domain analysis of EEG signals, please refer to the previous article: https://blog.csdn.net/qq_43811536/article/details/134273470?spm=1001.2014.3001.5501
For CSP-based motor imagery EEG feature extraction and visualization, please refer to the previous article: https://blog.csdn.net/qq_43811536/article/details/134296308?spm=1001.2014.3001.5501
CSP (Common Spatial Patterns) – for a detailed explanation of this EEG feature-extraction method, please refer to the previous article: https://blog.csdn.net/qq_43811536/article/details/134296840?spm=1001.2014.3001.5501

This article uses data from one subject of the public Data set IVa. The data have been made public and can be obtained from the cloud drive:
Link: https://pan.quark.cn/s/5425ee5918f4
Extraction code: hJFz

Contents

  • 1. Experiment introduction
  • 2. Motor imagery classification
    • 2.1 Classification performance
    • 2.2 Conclusion
  • 3. Core Python code

1. Experiment introduction

The experimental data for this task come from one healthy subject, code-named "al". Within 3.5 s after a visual cue appeared, the subject performed one of three motor imagery tasks: (L) left hand, (R) right hand, (F) right foot. The classification task uses only the right-hand and right-foot trials, 280 in total. During the experiment, an EEG cap recorded 118 channels of EEG; the electrode positions are shown in Figure 1. The signals were band-pass filtered (0.05-200 Hz), digitized, and down-sampled to a sampling rate of 100 Hz.
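For reference, here is a minimal loading sketch (the file name and MATLAB field layout are assumed from the competition's 100 Hz MATLAB format, in which cnt stores the continuous EEG in units of 0.1 µV); the variable names match those used in the code of Section 3:

import numpy as np
from scipy.io import loadmat

mat = loadmat("data_set_IVa_al.mat")            # hypothetical file name
eeg_data = mat["cnt"].astype(np.float64) * 0.1  # continuous EEG, shape (samples, 118), in µV
cues_pos = mat["mrk"]["pos"][0, 0].ravel()      # cue onsets (sample indices)
target_label = mat["mrk"]["y"][0, 0]            # class label per trial
ch_name = mat["nfo"]["clab"][0, 0][0]           # 118 channel labels (assumed field layout)
eeg_fs = 100                                    # sampling rate after down-sampling (Hz)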


Figure 1 Electrode locations

2. Motor imagery classification

Based on CSP features, we use an LDA classifier and evaluate performance with ten-fold cross-validation. The evaluation metric is test-set accuracy: the proportion of correctly classified trials among all trials.
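Concretely, the classifier is a CSP-then-LDA pipeline scored with scikit-learn's cross-validation utilities. A minimal preview (the full code is in Section 3; x, y, and cv are the epoch data, labels, and splitter prepared there):

from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# x: (n_trials, n_channels, n_times), y: trial labels, cv: cross-validation splitter
clf = Pipeline([("CSP", CSP(n_components=10, log=True)),
                ("LDA", LinearDiscriminantAnalysis())])
scores = cross_val_score(clf, x, y, cv=cv)
print("mean test accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))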

2.1 Classification performance

We compared classification results across different band-pass filter settings and time windows.

  • Figure 2 shows the time-window search: the horizontal axis is the start time of the window relative to cue onset, and each curve corresponds to a different window length. A window length of 3 s achieves higher classification accuracy, and windows starting 0.5 s after the cue work best, reaching an accuracy of 1. (A sketch of both search loops is given after Figure 3.)
  • Figure 3 shows the effect of the filter cutoff frequency on accuracy: with the bandwidth fixed at 16 Hz, accuracy reaches 1 when the low-frequency cutoff is 10-12 Hz.
  • We also compared the LDA classifier with logistic regression (LR) and random forest (RF); the results are shown in Table 1. The accuracy of LDA is slightly higher than that of LR and RF, but all three classifiers perform well on CSP features.
  • Finally, we removed the CSP feature-extraction module and applied the LDA classifier directly to the raw signal; the results are shown in Figure 4. Without CSP, classification accuracy drops from 1 to about 0.6.


Figure 2 Time-window search results. The horizontal axis is the start time of the window relative to cue onset; different curves correspond to different window lengths.


Figure 3 Filter parameter search results. The horizontal axis is the low-frequency cutoff; the bandwidth is fixed at 16 Hz.
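The parameter searches behind Figures 2 and 3 are not included in Section 3. A minimal sketch follows; the grids are illustrative, and raw (here assumed to be the unfiltered RawArray), events, event_id, picks, clf, cv, y, tMin, and tMax are built as in Section 3:

import numpy as np
import mne
from sklearn.model_selection import cross_val_score

# Time-window search: vary window start (relative to cue onset) and window length
for t0 in np.arange(0.0, 1.25, 0.25):                # candidate start times (s)
    for length in (1.0, 2.0, 3.0):                   # candidate window lengths (s)
        ep = mne.Epochs(raw, events, event_id, t0, t0 + length,
                        proj=True, picks=picks, baseline=None, preload=True)
        acc = cross_val_score(clf, ep.get_data(), y, cv=cv).mean()
        print("start=%.2f s, len=%.1f s, acc=%.3f" % (t0, length, acc))

# Band search: slide a fixed 16 Hz band; filter a fresh copy of the unfiltered raw each time
for lo in range(4, 17, 2):                           # candidate low cutoffs (Hz)
    raw_f = raw.copy().filter(float(lo), float(lo + 16),
                              fir_design="firwin", skip_by_annotation="edge")
    ep = mne.Epochs(raw_f, events, event_id, tMin, tMax,
                    proj=True, picks=picks, baseline=None, preload=True)
    acc = cross_val_score(clf, ep.get_data(), y, cv=cv).mean()
    print("band=%d-%d Hz, acc=%.3f" % (lo, lo + 16, acc))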




Table 1 Classification results of different classifiers

Method   Accuracy
LDA      1.00
LR       0.99 ± 0.01
RF       0.99 ± 0.01


Figure 4 Ablation experiment. Applying LDA directly to the raw signal performs poorly.

2.2 Conclusion

The experiments show that the EEG differences between right-hand and right-foot motor imagery are concentrated in the μ rhythm (8-15 Hz) and the β rhythm (18-24 Hz), and appear mainly at channels C3 and C4, i.e., over the sensorimotor area. The features extracted by the CSP algorithm are highly linearly separable: an LDA classifier reaches an accuracy of 1 and effectively distinguishes the two classes of motor imagery. The experiments also show that the time window and band-pass filter range strongly affect classification accuracy; the optimal time window is 0.5-3.5 s after the cue, and the optimal frequency band is 12-28 Hz.

3. Core Python code

  • Some variable descriptions:
    • raw: created by the mne.io.RawArray() function, representing the raw EEG data
    • epochs: created by the mne.Epochs() function, representing all data corresponding to an event; in this data set, an event is a "right hand" or "right foot" motor imagery trial
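The code below additionally assumes the following imports (all standard mne / scikit-learn / matplotlib APIs):

import matplotlib.pyplot as plt
import numpy as np
import mne
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline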
# Band-pass filter edges (Hz) and epoch window (s, relative to cue onset)
l_fr, h_fr = 12.0, 28.0
tMin, tMax = 0.5, 3.5

# MNE object
info = mne.create_info(
    ch_names=[i[0] for i in ch_name],
    sfreq=eeg_fs,
    ch_types='eeg')
pos_dic = dict(zip(info.ch_names, ch_pos))
montage = mne.channels.make_dig_montage(pos_dic)

info.set_montage(montage)
raw = mne.io.RawArray(eeg_data.T, info)
# Apply band-pass filter
raw.filter(l_fr, h_fr, fir_design="firwin", skip_by_annotation="edge")

# Decoding: build the MNE events array [sample, 0, label] and pick EEG channels
events = np.vstack((cues_pos, np.zeros(len(cues_pos)), target_label[0, :])).T.astype(int)
picks = mne.pick_types(raw.info, meg=False, eeg=True, stim=False, eog=False, exclude="bads")

# Epochs: one epoch per cue, covering tMin-tMax s after cue onset
event_id = dict(right_hand=1, foot=2)  # assumed mapping between class names and label codes
epochs = mne.Epochs(
    raw,
    events,
    event_id,
    tMin,
    tMax,
    proj=True,
    picks=picks,
    baseline=None,
    preload=True,
)

# Prepare data for training; x has shape (n_trials, n_channels, n_times)
x = epochs.get_data()
y = target_label[0, :]

# Ten-fold cross-validation via ten random shuffle splits
test_r = 0.1  # fraction of trials held out per split (value assumed)
cv = ShuffleSplit(10, test_size=test_r, random_state=42)

# Classification with LDA on CSP features
lda = LinearDiscriminantAnalysis()
csp = CSP(n_components=10, reg=None, log=True, norm_trace=False)
clf = Pipeline([("CSP", csp), ("LDA", lda)])

from sklearn.metrics import accuracy_score

# Hold-out split: first 224 trials for training, remaining 56 for testing
train_x, test_x = x[:224], x[224:]
train_y, test_y = y[:224], y[224:]

clf.fit(train_x, train_y)

pred1 = clf.predict(train_x)
accuracy1 = accuracy_score(train_y, pred1)
print('Accuracy on training set: %.4f' % accuracy1)

pred2 = clf.predict(test_x)
accuracy2 = accuracy_score(test_y, pred2)
print('Accuracy on test set: %.4f' % accuracy2)
# Model comparison
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

lda = LinearDiscriminantAnalysis()
csp = CSP(n_components=10, reg=None, log=True, norm_trace=False)
clf_lda = Pipeline([("CSP", csp), ("LDA", lda)])
scores_lda = cross_val_score(clf_lda, x, y, cv=cv, n_jobs=None)

lr = LogisticRegression()
csp = CSP(n_components=10, reg=None, log=True, norm_trace=False)
clf_lr = Pipeline([("CSP", csp), ("LR", lr)])
scores_lr = cross_val_score(clf_lr, x, y, cv=cv, n_jobs=None)

rfc = RandomForestClassifier()
csp = CSP(n_components=10, reg=None, log=True, norm_trace=False)
clf_rfc = Pipeline([("CSP", csp), ("RFC", rfc)])
scores_rfc = cross_val_score(clf_rfc, x, y, cv=cv, n_jobs=None)
print("LDA: %.3f  LR: %.3f  RF: %.3f"
      % (scores_lda.mean(), scores_lr.mean(), scores_rfc.mean()))
# Ablation: LDA directly on the flattened raw epochs, without CSP
lda = LinearDiscriminantAnalysis()
# each epoch is 118 channels x 301 samples (0.5-3.5 s at 100 Hz), flattened to one vector
scores_lda_only = cross_val_score(lda, x.reshape(len(x), -1), y, cv=cv, n_jobs=None)
print(scores_lda_only)

plt.plot(scores_lda, '-o', linewidth=2)
plt.plot(scores_lda_only, '-d', linewidth=2)
plt.xlabel('Folds', fontsize=16)
plt.ylabel('Accuracy', fontsize=16)
plt.legend(['CSP + LDA', 'LDA'], fontsize=16)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.ylim([0, 1.1])
plt.show()


  [1] https://bbci.de/competition/iii/desc_IVa.html