Algorithmic Trading: Finding the Alpha Factor

Financial Feature Engineering: How to research Alpha Factors

This article describes algorithmic trading strategies in finance, where signals indicate when to buy or sell assets to generate superior returns relative to a benchmark, such as an index.
The portion of an asset's return that is not explained by the benchmark is known as alpha, so signals designed to generate such uncorrelated returns are also known as alpha factors.
This article describes how to conduct financial feature engineering to study alpha factors, including using the NumPy, pandas, and TA-Lib libraries to process data, smoothing techniques such as wavelets and Kalman filters to reduce noise, and the trading simulator Zipline to evaluate predictive performance.
In addition, the article provides code samples and references for factor research and algorithmic trading platforms in the financial domain. Technical terms used in it include ML (machine learning), alpha factor, NumPy, pandas, TA-Lib, Kalman filter, wavelet, trading simulator, etc.

Algorithmic trading strategies are driven by indicators that indicate when to buy or sell an asset to generate excess returns relative to a benchmark, such as an index. The portion of asset returns that cannot be explained by benchmark-related exposures is known as alpha, so signals designed to generate such uncorrelated returns are also known as alpha factors.

If you are already familiar with machine learning, you probably know that feature engineering is a key ingredient of successful predictions. Trading is no exception. The field of investing, however, is particularly rich in decades of research on how markets work and which features may explain or predict price movements better than others. This chapter provides an overview as a starting point for your own search for alpha factors.

This chapter also introduces some key tools to help calculate and test alpha factors. We will show how the NumPy, pandas, and TA-Lib libraries make it easy to process data, and present common smoothing techniques such as wavelets and Kalman filters that help reduce noise in data.

We also preview how to use the trading simulator Zipline to evaluate the predictive performance of (traditional) alpha factors. We discuss key alpha factor metrics such as the information coefficient and factor turnover. An in-depth introduction to backtesting trading strategies that use machine learning follows in Chapter 6, which covers the ML4T workflow we use throughout the book for evaluating trading strategies.

See Appendix – Alpha Factor Library, which contains additional material on this topic, including extensive code examples for computing various alpha factors.

Content

  1. Alpha Factors in Practice: From Data to Signals
  2. Building on decades of factor research
    • References
  3. Alpha factor engineering to construct predictive returns
    • Code Example: How to do factor engineering with pandas and NumPy
    • Code Example: How to Create a Technical Alpha Factor Using TA-Lib
    • Code Example: How to Denoise the Alpha Factor Using a Kalman Filter
    • Code Example: How to Preprocess Noisy Signals Using Wavelets
    • Resources
  4. From Signals to Trades: Backtesting with Zipline
    • Code Example: How to Backtest a One-Factor Strategy Using Zipline
    • Code Example: Combining Factors from Different Data Sources on the Quantopian Platform
    • Code Example: Separating Signal and Noise – How to use alphalens
  5. Other Algorithmic Trading Libraries and Platforms

Alpha Factors in Practice: From Data to Signals
Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture the risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.

There are also factors that explain price movements based on the economics or institutional setting of financial markets, or on investor behavior, including known biases in this behavior. The economic theory behind a factor may be rational, where the factor earns higher returns in the long run to compensate for its lower returns in bad times, or behavioral, where the factor risk premium results from the possibly biased or not entirely rational behavior of agents that is not eliminated by arbitrage.

Based on decades of factor research:
In an idealized world, classes of risk factors should be independent of each other (orthogonal), generate positive risk premiums, and form a complete set that covers all risk dimensions for a given class of assets and accounts for systematic risk. In practice, these requirements can only be met approximately.

References:

  • Dissecting Anomalies by Eugene Fama and Ken French (2008)
  • Explaining Stock Returns: A Literature Review by James L. Davis (2001)
  • Market Efficiency, Long-Term Returns, and Behavioral Finance by Eugene Fama (1997)
  • The Efficient Market Hypothesis and Its Critics by Burton Malkiel (2003)
  • The New Palgrave Dictionary of Economics by Steven Durlauf and Lawrence Blume (2008), 2nd Edition
  • Anomalies and Market Efficiency by G. William Schwert (Chapter 15 of the Handbook of the Economics of Finance, by Constantinides, Harris, and Stulz, 2003)
  • Investor Psychology and Asset Pricing by David Hirshleifer (2001)
  • Practical Advice for Analysis of Large, Complex Data Sets by Patrick Riley, The Unofficial Google Data Science Blog

Alpha factor engineering to construct predictive returns:
Based on a conceptual understanding of key factor categories, their rationale, and popular metrics, a key task is to identify new factors that might better capture the risks embodied by the previously described return drivers, or to find new drivers. In either case, the performance of a new factor should be compared to that of known factors to determine its incremental signal gain.

Code example: How to do factor engineering with pandas and NumPy:
The notebook feature_engineering.ipynb in the data directory demonstrates how to perform engineering of fundamental factors.

Code example: How to create technical alpha factors using TA-Lib:
The notebook how_to_use_talib demonstrates how to use TA-Lib, including various common technical indicators. What these indicators have in common is that they use only market data, namely price and volume information.

The notebook common_alpha_factors in the appendix contains dozens of additional examples.

Code example: How to use a Kalman filter to denoise your alpha factors:
The notebook kalman_filter_and_wavelets demonstrates the use of the Kalman filter for smoothing using the PyKalman package; we will also use it when developing the pair trading strategy in Chapter 9.

Code example: How to preprocess a noisy signal using wavelets:
The notebook kalman_filter_and_wavelets also demonstrates how to use the PyWavelets package to process wavelets.

Resources:

  • Fama French Data Library
  • numpy website
    • Quickstart Tutorial
  • pandas website
    • User Guide
    • 10 minutes to pandas
    • Python Pandas Tutorial: A Complete Introduction for Beginners
  • alphatools – Quantitative finance research tools in Python
  • mlfinlab – Package based on the research of Dr Marcos Lopez de Prado in Advances in Financial Machine Learning
  • PyKalman documentation
  • Tutorial: The Kalman Filter
  • Understanding and Applying Kalman Filtering
  • How a Kalman filter works, in pictures
  • PyWavelets – Wavelet Transforms in Python
  • An Introduction to Wavelets
  • The Wavelet Tutorial
  • The Barra Equity Risk Model Handbook
  • Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk by Richard Grinold and Ronald Kahn, 1999
  • Modern Investment Management: An Equilibrium Approach by Bob Litterman, 2003
  • Quantitative Equity Portfolio Management: Modern Techniques and Applications by Edward Qian, Ronald Hua, and Eric Sorensen
  • Spearman Rank Correlation

From Signals to Trading: Backtesting with Zipline:
The open-source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.

Code Example: How to Backtest a One-Factor Strategy Using Zipline
The notebook single_factor_zipline shows how to use Zipline to backtest a simple alpha factor.

Code Example: Combining Factors from Different Data Sources on the Quantopian Platform
The notebook multiple_factors_quantopian_research demonstrates how to compute alpha factors from market, fundamental, and alternative data in the Quantopian research environment, which builds on Zipline.

Code Example: Separating Signal and Noise – How to use alphalens
The notebook performance_eval_alphalens demonstrates how to evaluate and analyze alpha factors using the alphalens library. The library provides a useful set of tools for evaluating the predictive power of alpha factors, their turnover, and their performance during specific events.

Other algorithmic trading libraries and platforms:

  • pyfolio – Python financial analysis tool for evaluating performance and risk of trading strategies.
  • bt – Flexible backtesting framework for Python.
  • vnpy – Python framework for developing quantitative trading strategies, supports multiple trading interfaces.
  • backtrader – A Python framework for systematic and offline backtesting with great flexibility and extensibility.
  • quantconnect – Quantitative finance research platform with cloud-based backtesting and trading support.
  • TradeStation – Provides a graphical interface and programming tools for developing and executing trading strategies.
  • Quantopian – Browser-based platform for developing, backtesting and executing trading strategies.

Platforms such as Backtrader, QuantConnect, TradeStation, and Quantopian also provide some quantitative financial data for strategy development and backtesting. These platforms typically provide access to historical prices, volumes, financial data, and more. You can use this data to build and test alpha factors, and to develop and evaluate trading strategies.

Algorithmic trading strategies are driven by signals that indicate when to buy or sell assets to generate superior returns relative to a benchmark such as an index. The portion of an asset’s return that is not explained by exposure to this benchmark is called alpha, and hence the signals that aim to produce such uncorrelated returns are also called alpha factors.
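
As a rough illustration of this definition, alpha can be estimated as the intercept of a linear regression of asset returns on benchmark returns, with the slope (beta) measuring the benchmark exposure. The sketch below uses NumPy least squares on synthetic returns; the function name alpha_beta and all numbers are made up for illustration and are not taken from the book's notebooks.

```python
import numpy as np

def alpha_beta(asset_returns, benchmark_returns):
    """Estimate alpha and beta by OLS: r_asset = alpha + beta * r_bench + noise."""
    x = np.asarray(benchmark_returns, dtype=float)
    y = np.asarray(asset_returns, dtype=float)
    # Design matrix with an intercept column; the intercept is alpha
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha, beta

# Synthetic daily returns (illustrative numbers only)
rng = np.random.default_rng(0)
bench = rng.normal(0.0005, 0.01, size=250)
# The asset is an exact linear function of the benchmark here,
# so the regression recovers the inputs up to floating-point error
asset = 0.0002 + 1.2 * bench

alpha, beta = alpha_beta(asset, bench)
print(f"alpha={alpha:.6f}, beta={beta:.3f}")
```

With real data the residual noise makes both estimates uncertain, so the intercept should be read together with its standard error rather than as a point value.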

If you are already familiar with ML, you may know that feature engineering is a key ingredient for successful predictions. This is no different in trading. Investment, however, is particularly rich in decades of research into how markets work and which features may work better than others to explain or predict price movements. This chapter provides an overview as a starting point for your own search for alpha factors.

This chapter also presents key tools that facilitate computing and testing alpha factors. We will highlight how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data and present popular smoothing techniques like wavelets and the Kalman filter that help reduce noise in data.

We also preview how you can use the trading simulator Zipline to evaluate the predictive performance of (traditional) alpha factors. We discuss key alpha factor metrics like the information coefficient and factor turnover. An in-depth introduction to backtesting trading strategies that use machine learning follows in Chapter 6, which covers the ML4T workflow that we will use throughout the book to evaluate trading strategies.

Please see the Appendix – Alpha Factor Library for additional material on this topic, including numerous code examples that compute a broad range of alpha factors.

Content

  1. Alpha Factors in practice: from data to signals
  2. Building on Decades of Factor Research
    • References
  3. Engineering alpha factors that predict returns
    • Code Example: How to engineer factors using pandas and NumPy
    • Code Example: How to use TA-Lib to create technical alpha factors
    • Code Example: How to denoise your Alpha Factors with the Kalman Filter
    • Code Example: How to preprocess your noisy signals using Wavelets
    • Resources
  4. From signals to trades: backtesting with Zipline
    • Code Example: How to use Zipline to backtest a single-factor strategy
    • Code Example: Combining factors from diverse data sources on the Quantopian platform
    • Code Example: Separating signal and noise – how to use alphalens
  5. Alternative Algorithmic Trading Libraries and Platforms

Alpha Factors in practice: from data to signals

Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.

There are also factors that explain price movements based on the economics or institutional setting of financial markets, or investor behavior, including known biases of this behavior. The economic theory behind factors can be rational, where the factors have high returns over the long run to compensate for their low returns during bad times, or behavioral, where factor risk premiums result from the possibly biased or not entirely rational behavior of agents that is not arbitraged away.

Building on Decades of Factor Research

In an idealized world, categories of risk factors should be independent of each other (orthogonal), yield positive risk premia, and form a complete set that spans all dimensions of risk and explains the systematic risks for assets in a given class. In practice, these requirements will hold only approximately.

References

  • Dissecting Anomalies by Eugene Fama and Ken French (2008)
  • Explaining Stock Returns: A Literature Review by James L. Davis (2001)
  • Market Efficiency, Long-Term Returns, and Behavioral Finance by Eugene Fama (1997)
  • The Efficient Market Hypothesis and Its Critics by Burton Malkiel (2003)
  • The New Palgrave Dictionary of Economics (2008) by Steven Durlauf and Lawrence Blume, 2nd ed.
  • Anomalies and Market Efficiency by G. William Schwert (Ch. 15 in Handbook of the Economics of Finance, by Constantinides, Harris, and Stulz, 2003)
  • Investor Psychology and Asset Pricing, by David Hirshleifer (2001)
  • Practical advice for analysis of large, complex data sets, Patrick Riley, Unofficial Google Data Science Blog

Engineering alpha factors that predict returns

Based on a conceptual understanding of key factor categories, their rationale, and popular metrics, a key task is to identify new factors that may better capture the risks embodied by the return drivers laid out previously, or to find new ones. In either case, it will be important to compare the performance of innovative factors to that of known factors to identify incremental signal gains.

Code Example: How to engineer factors using pandas and NumPy

The notebook feature_engineering.ipynb in the data directory illustrates how to engineer basic factors.
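
To give a flavor of what such factor engineering can look like, here is a minimal sketch (not the notebook's code) of a momentum-style factor in pandas: trailing returns over a lookback window, standardized across assets on each date. The function name momentum_factor, the 12-period lookback, and the synthetic prices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def momentum_factor(prices: pd.DataFrame, lookback: int = 12) -> pd.DataFrame:
    """Cross-sectional z-score of trailing `lookback`-period returns."""
    trailing = prices.pct_change(lookback)
    # Standardize across assets (columns) on each date (row)
    mean = trailing.mean(axis=1)
    std = trailing.std(axis=1)
    return trailing.sub(mean, axis=0).div(std, axis=0)

# Synthetic monthly prices for five hypothetical tickers
rng = np.random.default_rng(42)
dates = pd.date_range("2020-01-01", periods=24, freq="MS")
log_returns = rng.normal(0.01, 0.05, size=(24, 5))
prices = pd.DataFrame(100 * np.exp(np.cumsum(log_returns, axis=0)),
                      index=dates, columns=list("ABCDE"))

factor = momentum_factor(prices, lookback=12)
print(factor.tail(3).round(2))
```

Standardizing per date makes factor values comparable across time, which matters when they are later ranked or combined with other factors.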

Code Example: How to use TA-Lib to create technical alpha factors

The notebook how_to_use_talib illustrates the usage of TA-Lib, which includes a broad range of common technical indicators. These indicators have in common that they only use market data, i.e., price and volume information.

The notebook common_alpha_factors in the appendix contains dozens of additional examples.
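
As an illustration of an indicator built only from price data, the sketch below computes Bollinger Bands with plain pandas rather than TA-Lib, so it runs without the TA-Lib C library installed. TA-Lib offers a comparable indicator as talib.BBANDS. The function name bollinger_bands, the 20-period window, the choice of population standard deviation (ddof=0), and the synthetic price series are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20,
                    n_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean with bands n_std rolling standard deviations away."""
    middle = close.rolling(window).mean()
    # ddof=0 (population std) is an assumption of this sketch
    sd = close.rolling(window).std(ddof=0)
    return pd.DataFrame({"lower": middle - n_std * sd,
                         "middle": middle,
                         "upper": middle + n_std * sd})

# Synthetic random-walk closing prices (illustrative only)
rng = np.random.default_rng(1)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 100)))

bands = bollinger_bands(close, window=20)
# Position of the price within the band: 0 = lower band, 1 = upper band
percent_b = (close - bands["lower"]) / (bands["upper"] - bands["lower"])
print(percent_b.dropna().tail(3).round(2))
```

The relative position within the band is one simple way to turn the indicator into a numeric factor value that can be compared across assets.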

Code Example: How to denoise your Alpha Factors with the Kalman Filter

The notebook kalman_filter_and_wavelets demonstrates the use of the Kalman filter using the PyKalman package for smoothing; we will also use it in Chapter 9 when we develop a pairs trading strategy.
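
To show the idea without depending on PyKalman, the following is a hand-rolled, one-dimensional Kalman filter for a local-level (random walk plus noise) model in NumPy. It is a simplified stand-in for the notebook's approach, not a reproduction of it: it runs only the forward filtering pass, and the function name kalman_filter_level and the variance values are assumptions chosen for illustration.

```python
import numpy as np

def kalman_filter_level(observations, process_var=1e-3, obs_var=1.0):
    """Forward Kalman filter for a local-level model.

    State:        x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation:  z_t = x_t + v_t,      v_t ~ N(0, obs_var)
    """
    obs = np.asarray(observations, dtype=float)
    estimates = np.empty_like(obs)
    state, state_var = obs[0], 1.0  # initial guess and its uncertainty
    for t, z in enumerate(obs):
        # Predict: the mean is unchanged, the uncertainty grows
        state_var += process_var
        # Update: blend prediction and observation using the Kalman gain
        gain = state_var / (state_var + obs_var)
        state += gain * (z - state)
        state_var *= 1.0 - gain
        estimates[t] = state
    return estimates

# Noisy sine wave as a stand-in for a noisy price or factor series
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 500)
noisy = np.sin(t) + rng.normal(0, 0.5, t.size)
filtered = kalman_filter_level(noisy, process_var=1e-2, obs_var=0.25)
print(np.std(np.diff(noisy)), np.std(np.diff(filtered)))
```

The ratio of process_var to obs_var controls the trade-off: a smaller ratio yields a smoother but more lagging estimate. PyKalman's KalmanFilter class additionally provides a smooth method that also uses later observations.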

Code Example: How to preprocess your noisy signals using Wavelets

The notebook kalman_filter_and_wavelets also demonstrates how to work with wavelets using the PyWavelets package.
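
The basic mechanics of wavelet denoising (transform, shrink the detail coefficients, transform back) can be sketched with a single-level Haar transform in NumPy. This is a deliberately simplified, hand-rolled stand-in; PyWavelets provides multi-level transforms and many wavelet families, for example through pywt.wavedec, pywt.threshold, and pywt.waverec. The function name haar_denoise and the default threshold rule used here (the universal threshold with a median-based noise estimate) are assumptions of this sketch.

```python
import numpy as np

def haar_denoise(signal, threshold=None):
    """Single-level Haar wavelet denoising with soft thresholding."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    # Forward Haar transform: pairwise averages and differences
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    if threshold is None:
        # Universal threshold with a median-based noise estimate
        sigma = np.median(np.abs(detail)) / 0.6745
        threshold = sigma * np.sqrt(2 * np.log(x.size))
    # Soft thresholding: shrink detail coefficients toward zero
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Inverse Haar transform
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 512)
noisy = np.sin(t) + rng.normal(0, 0.3, t.size)
denoised = haar_denoise(noisy)
```

With a threshold of zero the function returns the input unchanged, which is a quick check that the forward and inverse transforms are consistent.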

Resources

  • Fama French Data Library
  • numpy website
    • Quickstart Tutorial
  • pandas website
    • User Guide
    • 10 minutes to pandas
    • Python Pandas Tutorial: A Complete Introduction for Beginners
  • alphatools – Quantitative finance research tools in Python
  • mlfinlab – Package based on the research of Dr Marcos Lopez de Prado in Advances in Financial Machine Learning
  • PyKalman documentation
  • Tutorial: The Kalman Filter
  • Understanding and Applying Kalman Filtering
  • How a Kalman filter works, in pictures
  • PyWavelets – Wavelet Transforms in Python
  • An Introduction to Wavelets
  • The Wavelet Tutorial
  • Wavelets for Kids
  • The Barra Equity Risk Model Handbook
  • Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk by Richard Grinold and Ronald Kahn, 1999
  • Modern Investment Management: An Equilibrium Approach by Bob Litterman, 2003
  • Quantitative Equity Portfolio Management: Modern Techniques and Applications by Edward Qian, Ronald Hua, and Eric Sorensen
  • Spearman Rank Correlation

From signals to trades: backtesting with Zipline

The open-source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm’s reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.

  • Chapter 8 contains a more comprehensive introduction to Zipline.
  • Please follow the instructions in the installation folder, including those that address known issues.

Code Example: How to use Zipline to backtest a single-factor strategy

The notebook single_factor_zipline develops and tests a simple mean-reversion factor that measures how much recent performance has deviated from the historical average. Short-term reversal is a common strategy that takes advantage of the weakly predictive pattern that stock price increases are likely to mean-revert back down over horizons from less than a minute to one month.
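
The factor logic itself can be sketched outside Zipline with pandas: express each asset's most recent monthly return as a z-score against its own history. This is an illustrative approximation under assumed inputs, not the notebook's Zipline code; the function name mean_reversion_factor, the tickers, and the returns are made up.

```python
import pandas as pd

def mean_reversion_factor(monthly_returns: pd.DataFrame) -> pd.Series:
    """Latest monthly return as a z-score against each asset's own history."""
    latest = monthly_returns.iloc[-1]
    return (latest - monthly_returns.mean()) / monthly_returns.std()

# Made-up monthly returns for three hypothetical tickers
monthly_returns = pd.DataFrame({
    "A": [0.01, 0.01, 0.01, 0.05],   # latest month well above average
    "B": [0.02, 0.02, 0.02, -0.02],  # latest month well below average
    "C": [0.00, 0.02, 0.00, 0.01],
})

factor = mean_reversion_factor(monthly_returns)
# A reversal strategy would favor the lowest scores (recent underperformers)
# and avoid or short the highest scores (recent outperformers)
print(factor.sort_values())
```

In a backtest, such scores would be recomputed on each rebalancing date using only data available at that time, which is what an event-driven simulator enforces.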

Code Example: Combining factors from diverse data sources on the Quantopian platform

The Quantopian research environment is tailored to the rapid testing of predictive alpha factors. The process is very similar to the Zipline workflow above because it builds on zipline, but it offers much richer access to data sources.

The notebook multiple_factors_quantopian_research illustrates how to compute alpha factors not only from market data as previously but also from fundamental and alternative data.

Code Example: Separating signal and noise – how to use alphalens

The notebook performance_eval_alphalens introduces the alphalens library for the performance analysis of predictive (alpha) factors, open-sourced by Quantopian. It demonstrates how it integrates with the backtesting library zipline and the portfolio performance and risk analysis library pyfolio that we will explore in the next chapter.

alphalens facilitates the analysis of the predictive power of alpha factors concerning the:

  • Correlation of the signals with subsequent returns
  • Profitability of an equal or factor-weighted portfolio based on (a subset of) the signals
  • Turnover of factors to indicate the potential trading costs
  • Factor-performance during specific events
  • Breakdowns of the preceding by sector

The analysis can be conducted using tearsheets or individual computations and plots. The tearsheets are illustrated in the online repo to save some space.

  • See here for a detailed alphalens tutorial by Quantopian
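
The first item in the list above, the correlation of the signals with subsequent returns, is commonly summarized by the information coefficient (IC), the Spearman rank correlation between factor values and forward returns on each date. The sketch below computes it with plain pandas by ranking across assets and then taking the row-wise Pearson correlation of the ranks. It is not the alphalens API; the function name information_coefficient and the data are assumptions for illustration.

```python
import pandas as pd

def information_coefficient(factor: pd.DataFrame,
                            forward_returns: pd.DataFrame) -> pd.Series:
    """Per-date Spearman rank correlation of factor values and forward returns."""
    # Rank across assets on each date; Pearson on ranks equals Spearman
    factor_ranks = factor.rank(axis=1)
    return_ranks = forward_returns.rank(axis=1)
    return factor_ranks.corrwith(return_ranks, axis=1)

dates = pd.to_datetime(["2020-01-31", "2020-02-29"])
tickers = ["A", "B", "C", "D"]
# Made-up factor values and the returns realized over the following period
factor = pd.DataFrame([[0.1, 0.2, 0.3, 0.4],
                       [0.4, 0.3, 0.2, 0.1]], index=dates, columns=tickers)
forward_returns = pd.DataFrame([[0.00, 0.01, 0.02, 0.05],
                                [0.00, 0.01, 0.02, 0.05]],
                               index=dates, columns=tickers)

ic = information_coefficient(factor, forward_returns)
print(ic)
```

On the first date the factor orders the assets exactly as their subsequent returns do, and on the second date exactly in reverse, so the two IC values sit at the extremes of the possible range. Real factors typically produce values much closer to zero, which is why the IC is usually averaged over many dates.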

Alternative Algorithmic Trading Libraries and Platforms

  • QuantConnect
  • Alpha Trading Labs
    • Alpha Trading Labs is no longer active
  • WorldQuant
  • Python Algorithmic Trading Library PyAlgoTrade
  • pybacktest
  • Trading with Python
  • Interactive Brokers