7 min read

Introduction to backtesting in R

Introduction to backtesting

In this post I’ll go over a simple example of how to backtest a strategy in R using the packages: quantmod, xts and PerformanceAnalytics.

Reading financial data

I’ll start explaining how we can get financial data into R using getSymbols. In this case the source (as defined by the src argument) is yahoo finance.

library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(xts)
library(dygraphs)
library(PerformanceAnalytics)
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend

options("getSymbols.warning4.0"=FALSE)

df = getSymbols('QQQ', src='yahoo', auto.assign=FALSE)
head(df)
##            QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03    43.46    44.06   42.52     43.24  167689500     38.32703
## 2007-01-04    43.30    44.21   43.15     44.06  136853500     39.05387
## 2007-01-05    43.95    43.95   43.48     43.85  138958800     38.86772
## 2007-01-08    43.89    44.12   43.64     43.88  106401600     38.89431
## 2007-01-09    44.01    44.29   43.63     44.10  121577500     39.08933
## 2007-01-10    43.96    44.66   43.82     44.62  121070100     39.55023

It’s also possible to get data of multiple tickers and store them with a list. getSymbols also allows the user to define a date range with the from and to parameters.

symbols = c("SPY", "QQQ", "IEF", "VNQ")

data = lapply(symbols, function(symbol) {
  getSymbols(symbol, src='yahoo', from='2007-10-01', to='2020-06-19', auto.assign=FALSE)
})

lapply(data, tail, n=2)
## [[1]]
##            SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2020-06-17   314.07   314.39  310.86    311.66   82954600     310.2945
## 2020-06-18   310.01   312.30  309.51    311.78   80828700     310.4140
## 
## [[2]]
##            QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2020-06-17   244.40   245.50  242.73    243.62   34560100     243.1971
## 2020-06-18   243.29   244.53  242.49    244.28   30578600     243.8559
## 
## [[3]]
##            IEF.Open IEF.High IEF.Low IEF.Close IEF.Volume IEF.Adjusted
## 2020-06-17   121.16   121.27  120.97    121.23    2308300       121.23
## 2020-06-18   121.51   121.56  121.42    121.50    1694600       121.50
## 
## [[4]]
##            VNQ.Open VNQ.High VNQ.Low VNQ.Close VNQ.Volume VNQ.Adjusted
## 2020-06-17    83.87    83.95   82.16     82.17    5041500        82.17
## 2020-06-18    81.19    81.72   80.82     81.27    5005900        81.27

data is a list and each element has an xts object.

Compute returns

Let’s compute the returns of each ticker and create one xts object.

R = lapply(data, function(df) Cl(df)/Lag(Cl(df), k=1) - 1)
lapply(R, tail, n=2)
## [[1]]
##                SPY.Close
## 2020-06-17 -0.0041538441
## 2020-06-18  0.0003850189
## 
## [[2]]
##              QQQ.Close
## 2020-06-17 0.003170636
## 2020-06-18 0.002709154
## 
## [[3]]
##              IEF.Close
## 2020-06-17 0.001569746
## 2020-06-18 0.002227147
## 
## [[4]]
##              VNQ.Close
## 2020-06-17 -0.01451186
## 2020-06-18 -0.01095291

Now we have a list where each element has the simple returns of each security. This could be a perfectly valid format to work with the data but in this case I’ll convert it to a 2D format. In order to do this we use the do.call function that basically calls a function multiple times, in the case cbind.

R = do.call(cbind, R)
R = R[-1, ]
names(R) = gsub("\\.Close", "", names(R))
head(R, 3)
##                     SPY           QQQ           IEF          VNQ
## 2007-10-02 -0.001361030  0.0001922692  0.0021513088  0.018169427
## 2007-10-03 -0.002011792 -0.0069216692 -0.0007155277 -0.004830283
## 2007-10-04  0.001560704  0.0023232913  0.0017902017  0.010246758

Evalute buy-hold strategies

Next let’s see how each of the securities we have selected did from 2007 until 2020. The plot shows the cumulative return as if we would start with one dolar invested on 2007 and held it until 2020.

charts.PerformanceSummary(R, wealth.index=TRUE, 
                          geometric=FALSE)

Here is a brief description of what each security is in order to provide some context.

  • SPY: Is an ETF that tracks the S&P 500 index.
  • QQQ: Is the ETF that tracks the Nasdaq-100 Index
  • IEF: Intermediate term (7 to 10 years) US treasuries
  • VNQ: A REITs index ETF

So we have 3 asset classes. Stocks (SPY and QQQ), Bonds (IEF) and REITs (VNQ). The results show IEF has less volatility and smaller Drawdowns in crisis. A tipycal balanced portfolio is investing 60% in stocks and 40% in bonds. Let’s add this option and plot the returns, I’ll refer to it as the balanced portfolio from now on. In these examples, by multiplying the returns by a constant weight we are assuming that the rebalancing is done daily.

R$Balanced = R$SPY * 0.6 + R$IEF * 0.4

The green line is the balanced portfolio. This is of course a trade-off every investor can make but this seems like a simple and effective passive investment strategy. Even in the GFC (2008) the maximum drawdown of this was close to -40% whereas for stocks it was lower than -60%.

charts.PerformanceSummary(R[, c("SPY", "IEF", "Balanced")],
                          wealth.index=TRUE, geometric=FALSE)

This leads us to the next question. How do the risk adjusted returns are for these different portfolios. Does the balanced porfolio have a lower risk adjusted return? I’ll use the table.AnnualizedReturns function to answer this question.

table.AnnualizedReturns(R[, c("SPY", "IEF", "Balanced")])
##                              SPY    IEF Balanced
## Annualized Return         0.0569 0.0298   0.0539
## Annualized Std Dev        0.2111 0.0693   0.1171
## Annualized Sharpe (Rf=0%) 0.2698 0.4300   0.4602

The balanced portfolio achieves a return close to stocks with almost half the volatility. The Sharpe ratio summarizes this result. Considering we have started at a bad time (before the GFC) a 5.3% annualized return is not a bad outcome. Had we started our long position in 2009 (after the crisis) the annualized results would be much higher, 8% for the balanced portfolio and 11% for the SPY ETF. Still the sharpe ratio of the balanced portfolio is higher.

table.AnnualizedReturns(R["2009/2020", c("SPY", "IEF", "Balanced")])
##                              SPY    IEF Balanced
## Annualized Return         0.1144 0.0185   0.0813
## Annualized Std Dev        0.1834 0.0659   0.1014
## Annualized Sharpe (Rf=0%) 0.6237 0.2803   0.8019

During the GFC the IEF ETF did a good job reducing the Drawdown. We can see this by plotting the period between 2007 and 2012.

charts.PerformanceSummary(R["2007/2012", c("SPY", "IEF", "Balanced")],
                          wealth.index=TRUE, geometric=FALSE)

Something similar happened during the COVID crisis. However the drawdown of stocks was much more extreme during 2008.

charts.PerformanceSummary(R["2020-01/2020", c("SPY", "IEF", "Balanced")],
                          wealth.index=TRUE, geometric=FALSE)

In order to compute the cumulative returns we can use the cumsum function. As we did in the previous chapter it’s also possible to use an interactive plot with dygraphs.

R_cumulative = 1 + cumsum(R)
dygraph(R_cumulative[, c("SPY", "IEF", "Balanced")]) %>%
  dyRangeSelector(dateWindow = c("2007-10-01", "2013-01-01")) %>%
  dyAxis("y", 
         valueFormatter = 'function(d){return Math.round(d*100) + "%"}',
         axisLabelFormatter = 'function(d){return Math.round(d*100) + "%"}')

Evaluating other strategies

Let’s start by looking at the returns of each asset. It’s clear that QQQ had the best results during the 2007-2020 period. Compared to the S&P 500 it had more than 2 times the return with almost the same variability. How would the strategy work if instead of using the SPY ETF we use the QQQ for the balanced portfolio.

R$Balanced_QQQ = R$QQQ * 0.6 + R$IEF * 0.4
table.AnnualizedReturns(R)
##                              SPY    QQQ    IEF    VNQ Balanced Balanced_QQQ
## Annualized Return         0.0569 0.1295 0.0298 0.0083   0.0539       0.0974
## Annualized Std Dev        0.2111 0.2236 0.0693 0.3375   0.1171       0.1251
## Annualized Sharpe (Rf=0%) 0.2698 0.5793 0.4300 0.0245   0.4602       0.7789

It’s also interesting to look at what happened during the GFC and COVID crisis.

During the GFC the balanced portfolio using the Nasdaq-100 did ok. It generated an annualized 3.7% return while the S&P 500 index had negative returns.

table.AnnualizedReturns(R["2007/2010"])
##                               SPY    QQQ    IEF     VNQ Balanced Balanced_QQQ
## Annualized Return         -0.0609 0.0143 0.0358 -0.0822  -0.0095       0.0372
## Annualized Std Dev         0.2955 0.2923 0.0905  0.5671   0.1652       0.1631
## Annualized Sharpe (Rf=0%) -0.2062 0.0489 0.3959 -0.1450  -0.0573       0.2281

During the COVID-19 crisis (2020-01 to 2020-06) the Balanced-QQQ did much better than the Balanced portfolio. This is not surprising considering technology stock businesses were less affected by the crisis.

table.AnnualizedReturns(R["2020-01/2020-06"])
##                               SPY    QQQ    IEF     VNQ Balanced Balanced_QQQ
## Annualized Return         -0.0662 0.3486 0.2335 -0.2484   0.0782       0.3442
## Annualized Std Dev         0.4581 0.4564 0.0976  0.5771   0.2553       0.2536
## Annualized Sharpe (Rf=0%) -0.1446 0.7638 2.3922 -0.4304   0.3063       1.3576

I’ll include the plot of both periods. The plots show a more complete picture. During the GFC the Balanced-QQQ portfolio recovered faster than the original balanced portfolio.

names(R)[5] = 'Balanced_SPY'
charts.PerformanceSummary(R["2007/2010", c("Balanced_SPY", "Balanced_QQQ")],
                          wealth.index=TRUE, geometric=FALSE)

During the COVID-19 crisis something similar happened. The Balanced-QQQ portfolio recovered much faster. The correction lasted a bit more than a month.

charts.PerformanceSummary(R["2020-01/2020", c("Balanced_SPY", "Balanced_QQQ")],
                          wealth.index=TRUE, geometric=FALSE)