Flush Supermind Quantitative Trading Financial Analysis Modeling-Statistical Arbitrage: Using Correlation Coefficients for Paired Trading

In this article, you will learn how to use correlation coefficients to find stock targets suitable for arbitrage, write arbitrage strategies, and conduct arbitrage transactions.

Part 4: Statistical Arbitrage: Using Correlation Coefficients for Pair Trading

What is Pairs Trading?

Pairs trading is a market-neutral investment strategy first developed in the mid-1980s by a quantitative analysis team assembled by Nunzio Tartaglia, a quantitative trader at the Wall Street investment bank Morgan Stanley. In his book “Pairs Trading: Quantitative Methods and Analysis”, Ganapathy Vidyamurthy divides pairs trading into two types: pairs trading based on statistical arbitrage and pairs trading based on risk arbitrage.

Pairs trading based on statistical arbitrage is a market-neutral strategy. It pairs stocks whose historical prices have moved similarly. When the price difference (spread) of a pair deviates from its historical mean, the strategy shorts the stock whose price is relatively high and buys the stock whose price is relatively low, then waits for the pair to return to its long-term equilibrium relationship, earning a profit from the convergence of the two prices.

Pair Trading Principle

The pairs trading strategy rests on two stocks or other securities that are highly correlated. If they remain well correlated in the future, then whenever the two diverge and that divergence is later corrected, an arbitrage opportunity arises. In practice, when two highly correlated stocks or other securities diverge, you buy the one that has performed relatively poorly and sell the one that has performed relatively well. When the divergence is corrected, you close both positions with the opposite trades and take the profit.

Because pairs trading exploits short-term mispricing within a pair by holding the relatively undervalued stock and shorting the relatively overvalued one, it is essentially a reversal strategy. Its core idea is what the academic literature calls mean reversion of stock prices. Although the strategy is very simple, it is widely used, for three main reasons: first, its returns are independent of the market, that is, market neutral, so they do not depend on whether the market rises or falls; second, the volatility of its returns is relatively small; third, its returns are relatively stable.
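
The market-neutral claim can be illustrated with a small simulation. This is only a sketch under a simplifying assumption that is not part of the original article: each stock's daily return is the same market return plus an independent stock-specific term. The names `market`, `idio1`, `idio2` and `long_short_return` are hypothetical and used for illustration only.

```python
import numpy as np

def long_short_return(ret_long, ret_short):
    """Daily return of holding one stock long and an equal amount of the other short."""
    return np.asarray(ret_long) - np.asarray(ret_short)

rng = np.random.default_rng(0)
n_days = 250
market = rng.normal(0.0, 0.02, n_days)   # assumed common market return
idio1 = rng.normal(0.0, 0.005, n_days)   # assumed stock-specific return of stock 1
idio2 = rng.normal(0.0, 0.005, n_days)   # assumed stock-specific return of stock 2

stock1 = market + idio1
stock2 = market + idio2
pair = long_short_return(stock1, stock2)  # the market term cancels out

print('Std of stock1 returns:', stock1.std())
print('Std of pair returns:', pair.std())
print('Correlation of pair returns with the market:', np.corrcoef(pair, market)[0, 1])
```

Under this assumption the pair's return equals the difference of the two stock-specific terms, so it carries no market component and fluctuates less than either stock alone. Real pairs only approximate this, since their market exposures are rarely identical.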

Statistical arbitrage: using correlation coefficients for pair trading

1. When you first consider statistical arbitrage, a natural question is whether a high correlation coefficient between two stocks implies a specific relationship between their price trends.

In [1]:

# Import the required Python libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

First, the correlation coefficient is defined as the covariance standardized by the two standard deviations: r(X, Y) = Cov(X, Y) / (std(X) * std(Y)). Its value always lies between -1 and 1.
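
The definition can be checked numerically. The sketch below computes the coefficient directly from the covariance and the standard deviations and compares it with `np.corrcoef`. The helper name `correlation` and the sample arrays are made up for illustration.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return cov_xy / (x.std() * y.std())                # np.std is the population std by default

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

print('From the definition:', correlation(x, y))
print('From np.corrcoef:   ', np.corrcoef(x, y)[0, 1])  # agrees up to floating-point error
```

Using the population or the sample version of the covariance gives the same coefficient, as long as the standard deviations use the same convention, because the normalizing factors cancel.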

Let’s take a look at what a data set with a high correlation coefficient looks like.

In [2]:

X = np.random.rand(50)
Y = X + np.random.normal(0, 0.1, 50)

plt.scatter(X,Y,alpha=0.7)
plt.xlabel('X Value')
plt.ylabel('Y Value')

print('Correlation coefficient: ' + str(np.corrcoef(X, Y)[0, 1]))

Correlation coefficient: 0.963266983769

As the plot shows, when the data points fall roughly on a straight line, the correlation between the two variables is very high. Next, let’s look at what a high correlation between two stock prices looks like, taking Air China (601111.SH) and China Southern Airlines (600029.SH) as examples.

In [3]:

start = '20140101'  # The research period should match the backtest period, because the correlation can differ across periods.
end = '20180101'
stock1='601111.SH'
stock2='600029.SH'
a1 = get_price(stock1, start, end, '1d', ['close'], False)['close']
a2 = get_price(stock2, start, end, '1d', ['close'], False)['close']
# Plot the two price series against each other
plt.scatter(a1,a2,alpha=0.7)
plt.xlabel(stock1)
plt.ylabel(stock2)
plt.title('Stock prices from ' + start + ' to ' + end)
print('Correlation coefficient between ' + stock1 + ' and ' + stock2 + ': ', np.corrcoef(a1, a2)[0, 1])

Correlation coefficient between 601111.SH and 600029.SH: 0.952849080194

Here too, most of the data points are concentrated along a straight line.

Having found a highly correlated pair, we need to study the price difference between the two stocks, because it is the key to the arbitrage strategy.

In [5]:

a3=a1-a2
a3.plot(figsize=(14,7))

Out[5]:

<matplotlib.axes._subplots.AxesSubplot at 0x7fcd9d062898>

As the figure shows, the correlation coefficient of the two stocks is high, and the price difference between them fluctuates roughly around a constant. To determine whether the price difference is really stationary, we apply the ADF unit root test to the series.

In [7]:

from statsmodels.tsa.stattools import adfuller
adftest = adfuller(a3)#Use adf unit root to test stationarity
result = pd.Series(adftest[0:4], index=['Test Statistic','p-value','Lags Used','Number of Observations Used'])
for key,value in adftest[4].items():
        result['Critical Value (%s)'%key] = value
print(result)

Test Statistic                  -3.049276
p-value                          0.030542
Lags Used                       22.000000
Number of Observations Used    954.000000
Critical Value (10%)            -2.568386
Critical Value (5%)             -2.864574
Critical Value (1%)             -3.437223
dtype: float64

According to the ADF unit root test, the price difference is stationary at the 5% significance level: the p-value is about 0.03, and the test statistic of -3.05 lies below the 5% critical value of -2.86, although not below the 1% critical value of -3.44. This is broadly favorable for an arbitrage strategy.
Tips for judging ADF unit root test results:
    A. Look at the p-value. The series is generally considered stationary if it is less than 0.05.
    B. Look at the test statistic. The series is generally considered stationary if it is smaller (more negative) than the Critical Value (5%).
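
The two tips can be wrapped in a small helper. This is a sketch rather than part of the original notebook: the function name `is_stationary` is hypothetical, and requiring both rules at once is a choice made here for illustration. It expects the tuple returned by `adfuller` with default arguments, where index 0 is the test statistic, index 1 the p-value, and index 4 the dictionary of critical values.

```python
def is_stationary(adf_result, level='5%', alpha=0.05):
    """Apply rules A and B to the tuple returned by statsmodels' adfuller."""
    test_stat, p_value = adf_result[0], adf_result[1]
    critical_values = adf_result[4]
    return p_value < alpha and test_stat < critical_values[level]

# Values copied from the test output above
adf_output = (-3.049276, 0.030542, 22, 954,
              {'1%': -3.437223, '5%': -2.864574, '10%': -2.568386})

print(is_stationary(adf_output))                          # True at the 5% level
print(is_stationary(adf_output, level='1%', alpha=0.01))  # False at the stricter 1% level
```

For this pair the conclusion depends on the chosen significance level, so it is worth stating the level whenever the spread is described as stationary.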

Next, let's take the mean price difference plus and minus one standard deviation of the price difference as the upper and lower rails for arbitrage, and check whether the band between them contains most of the price difference values.


It can be seen that the band between the upper and lower rails contains most of the price difference values. During the bull market, the price difference fluctuated widely, which is unfavorable for an arbitrage strategy. After the bull market, however, the price difference fluctuated steadily between the upper and lower rails, which makes the pair well suited to an arbitrage strategy.
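
The band check described above can be sketched as follows. Because `get_price` is only available on the platform, the example uses a synthetic spread (noise around a constant level) as a stand-in. The helper name `band_coverage` and the synthetic data are assumptions for illustration, not the article's data.

```python
import numpy as np

def band_coverage(spread, n_std=1.0):
    """Return (lower rail, upper rail, fraction of observations inside the band)."""
    spread = np.asarray(spread, dtype=float)
    mean, std = spread.mean(), spread.std()
    lower, upper = mean - n_std * std, mean + n_std * std
    inside = (spread >= lower) & (spread <= upper)
    return lower, upper, inside.mean()

# Synthetic stand-in for the price difference a1 - a2
rng = np.random.default_rng(42)
spread = 1.5 + rng.normal(0.0, 0.4, 1000)

lower, upper, frac = band_coverage(spread)
print('Lower rail:', lower)
print('Upper rail:', upper)
print('Share of days inside the band:', frac)
```

For normally distributed data, roughly 68% of observations fall within one standard deviation of the mean. Wider rails (for example `n_std=2.0`) contain more of the spread but trigger trades less often.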

Let us calculate the 60-day rolling correlation between the two stocks and see how it changes over time.

In [8]:

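# pd.rolling_corr is deprecated and has been removed from newer pandas versions;
# Series.rolling(...).corr(...) is the current equivalent
rolling_correlation_cn = a1.rolling(60).corr(a2)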
rolling_correlation_cn.plot()
plt.xlabel('Day')
plt.ylabel('New 60-day Rolling Correlation')

Out[8]:

<matplotlib.text.Text at 0x7fcd813898d0>

Judging from the 60-day rolling correlation, the correlation between these two stocks is indeed very high. Except for a period during the bull market when it drops, it stays above 0.7 the rest of the time.

Using a simple arbitrage strategy, we backtested the Air China and China Southern Airlines pair. The backtest results are as follows:

The arbitrage strategy code is as follows:

In [ ]:

import numpy as np

def init(context):
    #Select stock pairs with higher correlation coefficients after research
    g.s1 = '601111.SH'
    g.s2 = '600029.SH'
    
def handle_bar(context, bar_dict):
    # Get the closing prices of the two selected stocks for one year and calculate the difference
    price_stock1=history(g.s1,['close'],250,'1d',True)['close'].values
    price_stock2=history(g.s2,['close'],250,'1d',True)['close'].values
    diff=price_stock1-price_stock2
    #Use the mean plus one standard deviation as the upper opening line
    up=np.mean(diff) + np.std(diff)
    #Use the mean minus one standard deviation as the lower opening line
    down=np.mean(diff)-np.std(diff)
    #Get the closing price of the stock the day before the trade was made and calculate the difference
    yesterday_price1=history(g.s1,['close'],1,'1d',True)['close'][0]
    yesterday_price2=history(g.s2,['close'],1,'1d',True)['close'][0]
    yesterday_diff=yesterday_price1-yesterday_price2
    #If the price difference reaches the upper opening line the day before, sell stock s1 and go long s2
    if yesterday_diff>up:
        order_target_percent(g.s1,0)
        order_target_percent(g.s2,1)
    #If the price difference of the previous day reaches the lower opening line, sell stock s2 and go long s1
    if yesterday_diff<down:
        order_target_percent(g.s2,0)
        order_target_percent(g.s1,1)
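
The decision rule in `handle_bar` can also be tested outside the platform. The sketch below reproduces only the signal logic with NumPy. The function name `pair_signal` and its return labels are hypothetical, and it assumes that the last element of each price array is the previous day's close, as in the `history` calls above.

```python
import numpy as np

def pair_signal(prices1, prices2, window=250):
    """Return 'long_s2', 'long_s1' or 'hold' based on the last `window` closes."""
    p1 = np.asarray(prices1, dtype=float)[-window:]
    p2 = np.asarray(prices2, dtype=float)[-window:]
    diff = p1 - p2
    up = diff.mean() + diff.std()      # upper opening line
    down = diff.mean() - diff.std()    # lower opening line
    latest = diff[-1]                  # previous day's price difference
    if latest > up:
        return 'long_s2'   # s1 looks relatively expensive: sell s1, hold s2
    if latest < down:
        return 'long_s1'   # s2 looks relatively expensive: sell s2, hold s1
    return 'hold'

s2_prices = [10.0] * 10
print(pair_signal([11.0] * 9 + [13.0], s2_prices))  # 'long_s2'
print(pair_signal([11.0] * 9 + [9.0], s2_prices))   # 'long_s1'
print(pair_signal([11.0] * 10, s2_prices))          # 'hold'
```

Note that, like the strategy above, this rule rotates the whole position between the two stocks rather than holding a long and a short leg at the same time.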

For full details of the strategy above, see the Supermind quantitative trading official website: Financial Analysis Modeling-Statistical Arbitrage: Using Correlation Coefficients for Paired Trading