Python: How to analyze the correlation coefficient between climate factors based on sliding window? (pixel by pixel)

Directory

01 A brief description of the conventional correlation coefficient

02 Correlation coefficient analysis under sliding window


I recently dealt with the statistical analysis of some climate factors and encountered some problems, so please record them.

01 Simple description of conventional correlation coefficient

Before studying the sliding window, let’s study the conventional correlation coefficient analysis. In order to simplify the problem, suppose we now have the monthly precipitation data and NDVI images in Sichuan Province in 2019. We want to study the precipitation and NDVI in different regions The correlation, that is, to study the spatial correlation between precipitation and NDVI.

The basic idea is: for each pixel, there are 12 values of precipitation and NDVI index from January to December, then based on these two columns of data, we can calculate the correlation coefficient between precipitation and NDVI at the pixel location (here I recommend using the Spearman coefficient, because the Pearson coefficient requires that our two columns of data satisfy a normal distribution, but in fact we often assume that our data satisfy a normal distribution, so the calculated correlation coefficient is actually biased ), we can calculate a correlation coefficient for each pixel, so that the spatial distribution map of the correlation coefficient within Sichuan Province can be obtained.

After the idea is finished, we can get the following image.

02 Correlation coefficient analysis under sliding window

Here comes the point, we want to see the correlation coefficient within a certain range, what should we do?

Explanation: I currently have 4 excel files, which are precipitation data, NDVI data, surface temperature data, and soil moisture data. For each excel file, the first 5 columns represent topographic factor data, and the next 13 columns are precipitation/NDVI/surface temperature/soil moisture data from 2003 to 2015, and each row represents the topographic factor value and climate factor of a pixel value.

Roughly as follows:

Idea: Assume that the sliding window is 3*3 (think of the spatial analysis tools in ArcGIS – regional statistics – focal statistics, a little bit similar), starting from the first pixel position, for the moving window of the cell (including itself and surrounding 8 pixels), we do the following processing for each pixel in the window: get the precipitation value of the window pixel (a total of 13 values from 2003 to 2015) and NDVI value, and calculate the pixel Correlation coefficient for location. Then we calculated a total of 9 correlation coefficients for this window, and then we calculated the average value of these 9 correlation coefficients as the correlation coefficient value of the central pixel. Similarly, this is done for precipitation and surface temperature, precipitation and soil moisture.

Here is the code implemented using python:

# @fried eggplant 2023-05-18

import numpy as np
import pandas as pd
from scipy.stats import spearmanr
# Note: The Pearson coefficient is used here to calculate the correlation coefficient, but the Pearson coefficient requires the data to meet the normal distribution, but in fact many data do not meet the normal distribution, so try to use the Spearman coefficient for correlation here Calculation of coefficients
# Note: The Spearman coefficient requires that the data must be ordered, that is, the data has a comparable size relationship
"""
The current program uses the calculation of the correlation coefficient between two variables under different spatial windows to explore whether there is a certain relationship between the two variables in space.
"""
# Read annual average precipitation and NDVI data, surface temperature data, soil moisture data
precip_data_year = pd.read_excel(r'E:\PRCP\table\kinds_aver\precipitation_sc_year.xlsx', sheet_name='precipitation annual average data')
ndvi_data_year = pd.read_excel(r'E:\PRCP\table\kinds_aver\
dvi_sc_year.xlsx', sheet_name='NDVI annual average data')
temperature_data_year = pd.read_excel(r'E:\PRCP\table\kinds_aver\temperature_sc_year.xlsx', sheet_name='surface temperature annual mean data')
soil_moisture_data_year = pd.read_excel(r'E:\PRCP\table\kinds_aver\soil_moisture_sc_year.xlsx', sheet_name='soil moisture annual average data')
# Read precipitation data and NDVI and other data (do not read terrain factors and longitude and latitude columns)
_precip_data_year = precip_data_year.iloc[:, 5:]
_ndvi_data_year = ndvi_data_year.iloc[:, 5:]
_temperature_data_year = temperature_data_year.iloc[:, 5:]
_soil_moisture_data_year = soil_moisture_data_year.iloc[:, 5:]
# Create an empty three-dimensional array for the transformed result (83, 113, number_of_years)
precip_data_year = np.zeros((83, 113, len(_precip_data_year.columns)), dtype=np.float32)
ndvi_data_year = np.zeros((83, 113, len(_ndvi_data_year.columns)), dtype=np.float32)
temperature_data_year = np.zeros((83, 113, len(_temperature_data_year.columns)), dtype=np.float32)
soil_moisture_data_year = np.zeros((83, 113, len(_soil_moisture_data_year.columns)), dtype=np.float32)
# Convert a 2D array to a 3D array
for i in range(len(_precip_data_year.columns)):
    precip_data_year[:, :, i] = _precip_data_year.iloc[:, i].values.reshape(83, 113)
    ndvi_data_year[:, :, i] = _ndvi_data_year.iloc[:, i].values.reshape(83, 113)
    temperature_data_year[:, :, i] = _temperature_data_year.iloc[:, i].values.reshape(83, 113)
    soil_moisture_data_year[:, :, i] = _soil_moisture_data_year.iloc[:, i].values.reshape(83, 113)

# List of window sizes
window_sizes = [5, 7, 9, 11, 13, 15, 17, 19, 21]
# Initialize the correlation coefficient DataFrame
correlation_dict = {}

for window_size in window_sizes:
    # Initialize the correlation coefficient matrix (np.nan initialization)
    correlation_matrix = np.zeros((precip_data_year.shape[0], precip_data_year.shape[1], 3), dtype=np.float32)
    correlation_matrix[:] = np.nan

    # For each window, calculate the Spearman correlation coefficient
    for i in range(precip_data_year.shape[0] - window_size):
        for j in range(precip_data_year.shape[1] - window_size):
            # Extract the data in the window
            window_precipitation = precip_data_year[i:i + window_size, j:j + window_size, :]
            window_ndvi = ndvi_data_year[i:i + window_size, j:j + window_size, :]
            window_temperature = temperature_data_year[i:i + window_size, j:j + window_size, :]
            window_soil_moisture = soil_moisture_data_year[i:i + window_size, j:j + window_size, :]

            # Compute the correlation coefficient on each pixel in the window
            corr_per_pixel = np.zeros((window_size, window_size, 3), dtype=np.float32)
            for k in range(window_size):
                for l in range(window_size):
                    corr_per_pixel[k, l, 0], _ = spearmanr(window_precipitation[k, l, :], window_ndvi[k, l, :])
                    corr_per_pixel[k, l, 1], _ = spearmanr(window_precipitation[k, l, :], window_temperature[k, l, :])
                    corr_per_pixel[k, l, 2], _ = spearmanr(window_precipitation[k, l, :], window_soil_moisture[k, l, :])

            # Calculate the average correlation coefficient within the window
            correlation_matrix[i, j, 0] = np.mean(corr_per_pixel[:, :, 0])
            correlation_matrix[i, j, 1] = np.mean(corr_per_pixel[:, :, 1])
            correlation_matrix[i, j, 2] = np. mean(corr_per_pixel[:, :, 2])

    # Add the correlation coefficient matrix to correlation_matrices
    correlation_dict['precipitation_ndvi_' + str(window_size)] = pd.Series(
        correlation_matrix[:, :, 0][~np.isnan(correlation_matrix[:, :, 0])].flatten())
    correlation_dict['precipitation_temperature_' + str(window_size)] = pd.Series(
        correlation_matrix[:, :, 1][~np.isnan(correlation_matrix[:, :, 1])].flatten())
    correlation_dict['precipitation_soil_moisture_' + str(window_size)] = pd.Series(
        correlation_matrix[:, :, 2][~np.isnan(correlation_matrix[:, :, 2])].flatten())
# Save the correlation coefficient matrix to an Excel file
df = pd. concat(correlation_dict, axis=1)
df.to_excel(r'E:\PRCP\table\window_corr\correlation_matrices.xlsx', index=False, sheet_name='Annual average correlation coefficient')

Note that it is recommended to edit and modify the value of window_sizes during runtime. Since I am not using parallel processing, so many window size loops will cause long runtime.

Finally, we draw the violin plot of the output excel data (only the data with a window size of 5 is used for drawing):