The Art of Backtesting
26 August 2015 / Dr Matthew Killeya and Claus Simon
Backtesting is at the heart of systematic investment. Done correctly, it can recreate reality closely enough to identify systematic patterns which are likely to persist in the future. Patterns discovered by a robust backtest can be exploited to generate returns. But there are many subtle pitfalls to be avoided, and this is where the best researchers earn their salt.
Systematic firms combine three key pillars: data, technology and people.
Historical data on financial instruments is critical to discovering and refining hypotheses. State-of-the-art technology is essential to extract meaningful information from decades of such data: no serious statistical analysis can take place without serious computing. The third pillar is sometimes emphasised less than the first two, but people are just as crucial. High-powered computing and reams of data are worth little without the skill and expertise to interpret them. Confusing data with information is a common mistake. Data is not information: data becomes information only once it has been processed, aggregated, transformed and visualised, and if any of those steps is flawed, the resulting information is wrong.
Any successful investment firm requires exceptional people in every area of its business, from operations and finance
to research and software development. The best systematic
firms hire teams of PhD scientists or quantitative researchers to analyse data.
Computing skills aside, scientific training is essential to extract information from data: an abundance of computing power and software makes it easy to do statistical analysis badly. The subtleties involved in analysing data correctly are so important that much of the intellectual property of a systematic firm centres on the creation of clean and bias-free research pipelines.
Let's make this concrete through an example. Consider the general problem of building a portfolio of systematic
strategies. Ultimately we want to discover stable relationships which have predicted financial markets in the past and, most importantly, which are likely to persist into the future.
But there are many nuances before we
even attempt to build statistical models. In fact there are many more than we could
ever hope to cover here. So let us focus on three examples.
The Price is Right (…or maybe not)
Many systematic firms have their origins in trading futures and forwards markets. In recent times,
some — ourselves included — have broadened their horizons to single-stock cash equities.
Even with basic instruments such as cash equities, choices that appear straightforward at first turn out to be surprisingly subtle. For example, how hard can it be to find the price of Apple shares? Let's take a look at the market price of Apple since its December 1980 IPO.
One might assume that the historical share price is the correct price to analyse.
Most of the time that is true. However, raw price series such as the one above are often characterised by large, sudden jumps (or drops) like the one seen in mid-2014. Such drops do not necessarily correspond to corporate scandals or investor losses; they often simply reflect a corporation issuing new shares to existing shareholders (effectively decreasing the price of the company's minimum investible unit). In Apple's recent 7-for-1 stock split, Apple Inc. distributed 6 additional shares for each share held by investors. Since the total equity didn't change, the price per share fell by a factor of 7, and each shareholder held the same amount of equity as before. The example also demonstrates why the raw share price is misleading: taking it at face value would falsely imply a dramatic reduction in portfolio value.
Thus the “adjusted price series”, in which these splits are factored out as below, is the correct series from which to compute returns. Stock splits are not the only events that need to be factored out. Stocks often pay dividends, and when they do, cash leaves the business and goes to investors. The business' equity has fallen, and therefore its shares, which are a claim on the business' assets, must fall in price. Dividends can be factored out similarly to splits, so raw prices may end up carrying multiple layers of adjustment.
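The mechanics of split adjustment can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the dates mirror Apple's 7-for-1 split, but the price values are rounded and illustrative.

```python
import pandas as pd

# Hypothetical daily close prices around a 7-for-1 split (values illustrative).
raw = pd.Series([644.0, 645.5, 92.2, 93.0],
                index=pd.to_datetime(["2014-06-05", "2014-06-06",
                                      "2014-06-09", "2014-06-10"]))

# Adjustment factor: every price *before* the split date is divided by the
# split ratio, so the series becomes continuous in return space.
split_ratio = 7.0
split_date = pd.Timestamp("2014-06-09")
factor = pd.Series(1.0, index=raw.index)
factor[raw.index < split_date] = 1.0 / split_ratio

adjusted = raw * factor
# Returns on the adjusted series no longer show a spurious -86% "loss".
returns = adjusted.pct_change()
```

Dividends can be handled with the same machinery: on each ex-dividend date, all earlier prices are scaled by a factor such as `(price - dividend) / price`, and the factors compound back through history.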
Even though corporate actions are rare in futures contracts, and futures do not entitle the
holder to dividends, systematic firms typically still need to adjust historical futures prices
in a similar way. Since futures contracts expire shortly after they cease to trade, firms must
splice together individual expiring contracts if they wish to have a long, uninterrupted series
of prices. Additional adjustments can be necessary too. Although futures exchanges establish
clear specifications which outline the quantity, quality and type of a future's deliverable,
certain conditions may lead any two expirations to vary substantially. One recent example of
this happened in the U.S. Treasury bond futures market, due to a lack of bond auctions
15 years prior.
In rare cases the actual underlying market may change, for example when the US gasoline market transitioned from regular gasoline via unleaded gasoline to the current reformulated gasoline contract.
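Splicing expiring futures contracts into one continuous series can be done in several ways; a common one is "back-adjustment", where the expiring contract's history is shifted by the price gap observed on the roll date. The prices and dates below are entirely hypothetical, and this sketch shows only one of several reasonable roll conventions.

```python
import pandas as pd

# Illustrative settlement prices for two hypothetical expiries of one future.
mar = pd.Series([100.0, 101.0, 102.0],
                index=pd.to_datetime(["2015-03-10", "2015-03-11", "2015-03-12"]))
jun = pd.Series([105.0, 106.0, 104.5],
                index=pd.to_datetime(["2015-03-12", "2015-03-13", "2015-03-16"]))

# Back-adjustment: on the roll date, shift the expiring contract's history by
# the inter-contract gap so the spliced series has no artificial jump.
roll_date = pd.Timestamp("2015-03-12")
gap = jun[roll_date] - mar[roll_date]  # 105.0 - 102.0 = 3.0
spliced = pd.concat([(mar + gap)[mar.index < roll_date], jun])
```

Note that a back-adjusted series preserves day-to-day price *changes*, not historical price levels, which is exactly the property a backtest of futures returns needs.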
Concentrating for the moment on stocks and shares, a second seemingly innocuous question is which stocks we should analyse. What is the “universe”? A reasonable starting point might be to pick an index such as the FTSE 100 or the Dow Jones Industrial Average and use all the stocks in that index.
Given that index providers
regularly alter the components of their indices, if we wish to have a static set of
stocks, we must compile the constituents as of a particular date. Having fixed the set
of stocks, one can proceed to build statistical models for that investible universe. However, the
simple act of having fixed a date for the constituents can lead to deleterious biases, as the figure below demonstrates.
To understand the bias, imagine, for example, two hypothetical portfolios during the 2008 financial crisis:
- Portfolio A contains only companies that were part of the S&P 100 in 2007.
- Portfolio B contains only companies that were part of the same index in 2009.
Portfolio A contains — among others — companies that went bust during the 2008 financial
crisis. However, portfolio B does not. Clearly then, since A contains companies
that “went to zero” in 2008, but B does not, A's returns will have been much worse
than B's during the crisis. This thought experiment is exactly what the chart above
demonstrates: the backtest with constituents chosen as of the most recent time performs
the best, and the one fixed at the start of the backtest performs worst. Why is this? For our long-only strategy, using the most recent index composition excludes companies that underperformed, or even went bust, over the course of history. The problem is that we wouldn't have known which stocks would be in the index at the start of the backtest. A nasty forward-looking statistical bias has crept into the analysis: we chose a universe that we couldn't have known at the time, and because of the way the index is constructed, we end up with stocks that have gone up.
The figure also shows the same strategy with a dynamic universe where the stock universe
changes each month to reflect index changes. This is a fairer and truer selection and
is essential for clean and bias-free backtests. We can see in the chart above that this effect is actually quite strong: over the course of a 15-year period, the '09 portfolio finishes over 10 percentage points above the '07 portfolio.
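The dynamic-universe idea boils down to one rule: at any backtest date, use only the membership snapshot that was knowable at that date. A minimal sketch, with made-up tickers and dates ('LEH' stands in for a firm that failed in 2008):

```python
# Hypothetical monthly snapshots of index membership, as a point-in-time
# data vendor would supply them (all names and dates illustrative).
membership = {
    "2007-12": {"AAA", "BBB", "LEH"},
    "2009-12": {"AAA", "BBB", "CCC"},
}

def universe_for(date, snapshots):
    """Return the most recent membership snapshot known at `date`,
    never looking forward to later index revisions."""
    known = [d for d in sorted(snapshots) if d <= date]
    if not known:
        raise ValueError("no snapshot available yet")
    return snapshots[known[-1]]

# In mid-2008 only the 2007 snapshot was knowable; using the 2009 snapshot
# would silently drop the failed firm from the backtest.
assert "LEH" in universe_for("2008-06", membership)
assert "LEH" not in membership["2009-12"]
```

In a real pipeline the snapshots would come from a point-in-time constituents database, but the look-up rule is the same.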
There are other seemingly plausible ways to choose a universe that introduce pernicious biases. For example, stocks with short histories can be a problem for statistical models, which need enough data to learn patterns. A simple solution would be to discard them from our universe prior to the backtest. However, we can't do this, because again we wouldn't have known, at the time of entering positions, which stocks were going to have short histories. Such stocks tend to have fallen in price on average, while those with long track records tend to have risen. A stock that keeps falling will exit the index as its market capitalisation becomes too low; if the price keeps falling, it eventually reaches zero and the firm ceases to exist.
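The safe version of a history-length filter is again point-in-time: require a minimum history *as known on the backtest date*, rather than filtering on each stock's eventual full-sample history. A small sketch with hypothetical listing dates:

```python
import pandas as pd

# Hypothetical listing dates; "today" inside the backtest is 2005-01-01.
listing = {"OLD": pd.Timestamp("1990-01-01"), "NEW": pd.Timestamp("2004-06-01")}
as_of = pd.Timestamp("2005-01-01")

# Point-in-time filter: require at least 3 years of history known at as_of.
eligible = {s for s, d in listing.items() if (as_of - d).days >= 3 * 365}

# A forward-looking filter ("keep only stocks that survive until 2015")
# cannot be evaluated at as_of without peeking into the future.
```

Here only the hypothetical ticker "OLD" survives the filter; "NEW" may join the universe later, once it has accumulated enough history.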
In practice, systematic firms use historical data to discover and harvest more exotic sources of alpha than traditional long-only investment, such as momentum, value and quality. These approaches often increase returns, but the additional complexity can make the biases above still subtler to detect.
You might assume the arguments above don't apply to macro contracts such as futures and forwards. Anyone trading currencies in the early nineties would disagree: positions in French Francs, German Marks, Italian Lira and many others converged into positions in the Euro. Those currencies no longer exist, but they would have been in the portfolio over that period. And during mid-2015, it seemed for a while as if the Greek Drachma was about to make a reappearance.
The Price is Right II: Buying Low and Selling High
Let's now backtest something slightly more complex than the long only example. Let's consider a very simple
daily rebalancing buy low/sell high strategy. The idea of buying undervalued assets and selling
over-valued assets is as old as investment itself.
However, as we shall see, the old aphorism is true: the devil is in the detail. For this experiment we will:
- Use a universe of 100 stocks
- Buy 1 unit of the lowest priced stocks (those below the median price)
- Sell 1 unit of the highest priced stocks (those above the median price)
- Re-balance positions every day
- Give the strategy a fixed risk allocation (in this case 10% volatility)
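The rules above can be sketched in a few lines of Python on synthetic data. Everything here is illustrative: the prices are random walks, the tickers are made up, and the 10% volatility targeting step is omitted for brevity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic price panel: 100 hypothetical stocks, one row per trading day.
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 100)), axis=0)),
    columns=[f"S{i:03d}" for i in range(100)],
)

def positions_for_day(px_row):
    """Buy 1 unit below the cross-sectional median price, sell 1 unit above."""
    median = px_row.median()
    return np.where(px_row < median, 1.0,
                    np.where(px_row > median, -1.0, 0.0))

positions = prices.apply(
    lambda row: pd.Series(positions_for_day(row), index=row.index), axis=1)

# Daily P&L: yesterday's positions times today's price changes.
# (Scaling the result to a 10% volatility target is left out here.)
pnl = (positions.shift(1) * prices.diff()).sum(axis=1)
```

With 100 continuously distributed prices per day, exactly 50 stocks sit below the median and 50 above, so the book is long 50 units and short 50 units each day.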
The figure below shows the performance of the strategy.
The performance of the strategy looks rather good.
A key working
assumption of a quantitative analyst is that if something simple (and frankly, in this case, nonsense) looks surprisingly good, there has to be a
mistake. So what have we done wrong? Returns are from the adjusted prices we established as
necessary in the first section. There is no pollution from splits and dividends in the evaluation of the
strategy's performance. What about positions in these stocks? Buy low/sell high, based on price. What
price? The same price we used for returns. Herein lies the problem. Adjustments make sense for returns, but not
for signals based on absolute price level. In fact, stocks that have gone up through history will tend to have had several splits, as directors try to keep them within the ideal investible range. Factoring those splits back through history results in a low adjusted price early in the series. Thus our strategy tends to go long stocks that subsequently go up. This would be fine, except that we can never know about future splits, and the low prices are entirely a consequence of these splits. The money we make in our backtest is therefore an artefact of look-ahead bias.
Cheap, unlimited computing power and vast databases of historical data have kick-started a technological revolution in investment that mirrors the one we see in many other aspects of our lives. But computers are only as smart as the people who program them and, as we have seen, there are many subtle pitfalls to avoid. Knowledge of these pitfalls is part of the collection of intellectual property that firms like ours have: it is where our “edge” lies. Investors should be reassured that highly qualified scientists, complete with a healthy scepticism for the sensational, concern themselves with these issues and address them on their behalf.
We have just scratched the surface in terms of backtest biases. Overfitting, inclusion of the right contracts, and cognitive biases are other areas in which researchers can obtain superficially plausible results that fail to materialise when the models contact reality. The list is long, and no doubt we will write about these topics in future posts.