Friday, November 17, 2017

Optimizing trading strategies without overfitting

By Ernest Chan and Ray Ng

===

Optimizing the parameters of a trading strategy via backtesting has one major problem: there are typically not enough historical trades to achieve statistical significance. Whatever optimal parameters one found are likely to suffer from data snooping bias, and there may be nothing optimal about them in the out-of-sample period. That's why parameter optimization of trading strategies often adds no value. On the other hand, optimizing the parameters of a time series model (such as a maximum likelihood fit to an autoregressive or GARCH model) is more robust, since the input data are prices, not trades, and we have plenty of prices. Fortunately, it turns out that there are clever ways to take advantage of the ease of optimizing time series models in order to optimize parameters of a trading strategy.

One elegant way to optimize a trading strategy is to utilize the methods of stochastic optimal control theory - elegant, that is, if you are mathematically sophisticated and able to analytically solve the Hamilton-Jacobi-Bellman (HJB) equation (see Cartea et al.). Even then, this will only work when the underlying time series is a well-known one, such as the continuous Ornstein-Uhlenbeck (OU) process that is the standard model for mean-reverting price series. This OU process is neatly represented by a stochastic differential equation. Furthermore, the HJB equations can typically be solved exactly only if the objective function is of a simple form, such as a linear function. If your price series happens to be neatly represented by an OU process, and your objective is profit maximization, which happens to be a linear function of the price series, then stochastic optimal control theory will give you the analytically optimal trading strategy, with exact entry and exit thresholds given as functions of the parameters of the OU process. There is no more need to find such optimal thresholds by trial and error during a tedious backtest process, a process that invites overfitting to a sparse number of trades. As we indicated above, the parameters of the OU process can be fitted quite robustly to prices, and in fact there is an analytical maximum likelihood solution to this fit given in Leung et al.
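To make the "fit the OU process to prices" step concrete, here is a minimal Python sketch. It is not the closed-form solution from Leung et al.; it assumes the OU process dX = theta*(mu - X)dt + sigma*dW is sampled at a fixed interval dt, in which case it is exactly an AR(1) process and the conditional maximum likelihood fit reduces to an ordinary least-squares regression of x[t+1] on x[t]. The function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, n, dt, rng):
    """Simulate an OU path using its exact AR(1) discretization."""
    b = np.exp(-theta * dt)                          # AR(1) coefficient
    sd = sigma * np.sqrt((1 - b**2) / (2 * theta))   # std of each innovation
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = mu + b * (x[t - 1] - mu) + sd * rng.standard_normal()
    return x

def fit_ou(x, dt):
    """Conditional ML fit (equivalent to OLS of x[t+1] on x[t]).

    Assumes the fitted AR(1) slope lies strictly between 0 and 1,
    i.e. the series is actually mean reverting.
    """
    x_lag, x_next = x[:-1], x[1:]
    b, a = np.polyfit(x_lag, x_next, 1)              # slope, intercept
    resid = x_next - (a + b * x_lag)
    theta = -np.log(b) / dt
    mu = a / (1 - b)
    sigma = np.sqrt(resid.var() * 2 * theta / (1 - b**2))
    return theta, mu, sigma

rng = np.random.default_rng(0)
path = simulate_ou(theta=2.0, mu=1.0, sigma=0.5, x0=1.0, n=20000, dt=0.01, rng=rng)
print(fit_ou(path, dt=0.01))   # estimates of (theta, mu, sigma)
```

The point of the sketch is that every price observation contributes to the fit, which is why these parameters are estimated far more robustly than thresholds tuned on a handful of trades.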

But what if you want something more sophisticated than the OU process to model your price series or require a more sophisticated objective function? What if, for example, you want to include a GARCH model to deal with time-varying volatility and optimize the Sharpe ratio instead? In many such cases, there is no representation as a continuous stochastic differential equation, and thus there is no HJB equation to solve. Fortunately, there is still a way to optimize without overfitting.

In many optimization problems, when an analytical optimal solution does not exist, one often turns to simulations. Examples of such methods include simulated annealing and Markov Chain Monte Carlo (MCMC). Here we shall do the same: if we couldn't find an analytical solution to our optimal trading strategy, but could fit our underlying price series quite well to a standard discrete time series model such as ARMA, then we can simply simulate many instances of the underlying price series. We shall backtest our trading strategy on each instance of the simulated price series, and find the trading parameters that most frequently generate the highest Sharpe ratio. This process is much more robust than applying a backtest to the real time series, because there is only one real price series, but we can simulate as many price series (all following the same ARMA process) as we want. That means we can simulate as many trades as we want and obtain optimal trading parameters with as high a precision as we like. This is almost as good as an analytical solution. (See the flow chart below that illustrates this procedure.)

Optimizing a trading strategy using simulated time series

Here is a somewhat trivial example of this procedure. We want to find an optimal strategy that trades AUDCAD on an hourly basis. First, we fit an AR(1)+GARCH(1,1) model to the data using log midprices. The maximum likelihood fit is done using a one-year moving window of historical prices, and the model is refitted every month. We use MATLAB's Econometrics Toolbox for this fit. Once the sequence of monthly models is found, we can use them to predict both the log midprice at the end of the hourly bars, as well as the expected variance of log returns. So a simple trading strategy can be tested: if the expected log return in the next bar is higher than K times the expected volatility (square root of variance) of log returns, buy AUDCAD and hold for one bar, and vice versa for shorts. But what is the optimal K?

Following the procedure outlined above, each time after we fit a new AR(1)+GARCH(1, 1) model, we use this to simulate the log prices for the next month's worth of hourly bars. In fact, we simulate this 1,000 times, generating 1,000 time series, each with the same number of hourly bars in a month. Then we simply iterate through all reasonable values of K and remember which K generates the highest Sharpe ratio for each simulated time series. We pick the K that most often results in the best Sharpe ratio among the 1,000 simulated time series (i.e. we pick the mode of the distribution of optimal K's across the simulated series). This is the sequence of K's (one for each month) that we use for our final backtest. Below is a sample distribution of optimal K's for a particular month, and the corresponding distribution of Sharpe ratios:
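The simulate-then-pick-the-mode loop described above can be sketched in Python as follows. This is an illustration only, not the MATLAB code used for the results in this post. It assumes the AR(1) term applies to the log returns, that the model parameters have already been fitted elsewhere, and that a simulated series with no trades is ranked last; the parameter values and function names are hypothetical.

```python
import numpy as np

def simulate_ar1_garch(c, phi, omega, alpha, beta, n, rng):
    """Simulate r[t] = c + phi*r[t-1] + e[t], with e[t] ~ N(0, h[t]) and
    h[t] = omega + alpha*e[t-1]**2 + beta*h[t-1].
    Returns the returns plus the one-step-ahead mean and variance forecasts."""
    r, mean, h = np.zeros(n), np.zeros(n), np.zeros(n)
    h_t = omega / (1 - alpha - beta)     # start at the unconditional variance
    r_prev = c / (1 - phi)               # start at the unconditional mean
    for t in range(n):
        mean[t] = c + phi * r_prev       # forecast made before bar t
        h[t] = h_t
        e = np.sqrt(h_t) * rng.standard_normal()
        r[t] = mean[t] + e
        h_t = omega + alpha * e**2 + beta * h_t
        r_prev = r[t]
    return r, mean, h

def sharpe_for_k(r, mean, h, k):
    """Per-bar Sharpe ratio: long if forecast > k*vol, short if forecast < -k*vol."""
    vol = np.sqrt(h)
    pos = np.where(mean > k * vol, 1.0, np.where(mean < -k * vol, -1.0, 0.0))
    pnl = pos * r
    sd = pnl.std()
    return pnl.mean() / sd if sd > 0 else -np.inf   # no trades: rank last

def optimal_k_by_simulation(params, k_grid, n_bars, n_sims, seed=0):
    """Return the mode of the per-simulation best K, and all the best K's."""
    rng = np.random.default_rng(seed)
    best = np.empty(n_sims)
    for i in range(n_sims):
        r, mean, h = simulate_ar1_garch(*params, n_bars, rng)
        sharpes = [sharpe_for_k(r, mean, h, k) for k in k_grid]
        best[i] = k_grid[int(np.argmax(sharpes))]
    values, counts = np.unique(best, return_counts=True)
    return values[np.argmax(counts)], best

params = (0.0, -0.1, 1e-7, 0.05, 0.90)   # hypothetical (c, phi, omega, alpha, beta)
k_grid = np.linspace(0.0, 1.0, 11)
k_star, best_ks = optimal_k_by_simulation(params, k_grid, n_bars=500, n_sims=200)
print("mode of optimal K:", k_star)
```

In a monthly walk-forward setting, this function would be called once per refit, with `params` replaced by that month's fitted model and `n_bars` set to the number of hourly bars in the month.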

Histogram of optimal K and corresponding Sharpe ratio for 1,000 simulated price series

Interestingly, the mode of the optimal K is 0 for any month. That certainly makes for a simple trading strategy: just buy whenever the expected log return is positive, and vice versa for shorts. The CAGR is about 4.5% assuming zero transaction costs and midprice executions. Here is the cumulative returns curve:


You may exclaim: "This can't be optimal, because I am able to trade AUDCAD hourly bars with much better returns and Sharpe ratio!" Of course, optimal in this case only means optimal within a certain universe of strategies, and assuming an underlying AR(1)+GARCH(1, 1) price series model. Our universe of strategies is a pretty simplistic one: just buy or sell based on whether the expected return exceeds a multiple of the expected volatility. But this procedure can be extended to whatever price series model you assume, and whatever universe of strategies you can come up with. In every case, it greatly reduces the chance of overfitting.

P.S. We invented this procedure for our own use a few months ago, borrowing similar ideas from Dr. Ng’s computational research in condensed matter physics systems (see Ng et al.). But later on, we found that a similar procedure had already been described in a paper by Carr et al.

===

About the authors: Ernest Chan is the managing member of QTS Capital Management, LLC. Ray Ng is a quantitative strategist at QTS. He received his Ph.D. in theoretical condensed matter physics from McMaster University. 

===

Upcoming Workshops by Dr. Ernie Chan

November 18 and December 2:  Cryptocurrency Trading with Python

I will be moderating this online workshop for Nick Kirk, a noted cryptocurrency trader and fund manager, who taught this widely acclaimed course here and at CQF in London.

February 24 and March 3: Algorithmic Options Strategies

This online course focuses on backtesting intraday and portfolio option strategies. No pesky options pricing theories will be discussed, as the emphasis is on arbitrage trading.



Thursday, September 07, 2017

StockTwits Sentiment Analysis


By Colton Smith
===

Exploring alternative datasets to augment financial trading models is currently the hot trend among the quantitative community. With so much social media data out there, its place in financial models has become a popular research discussion. Surely the stock market’s performance influences the reactions from the public but if the converse is true, that social media sentiment can be used to predict movements in the stock market, then this would be a very valuable dataset for a variety of financial firms and institutions.

When I began this project as a consultant for QTS Capital Management, I did an extensive literature review of the social media sentiment providers and academic research. The main approach is to take the social media firehose, filter it down by source credibility, apply natural language processing (NLP), and create a variety of metrics that capture sentiment, volume, dispersion, etc. The best results have come from using Twitter or StockTwits as the source. A feature of StockTwits that distinguishes it from Twitter is that in late 2012 the option to label your tweet as bullish or bearish was added. If these labels accurately capture sentiment and are used frequently enough, then it would be possible to avoid using NLP. Most tweets are not labeled as seen in Figure 1 below, but the percentage is increasing.

Figure 1: Percentage of Labeled StockTwits Tweets by Year

This blog post will compare the use of just the labeled tweets versus the use of all tweets with NLP. To begin, I did some basic data analysis to better understand the nature of the data. In Figure 2 below, the number of labeled tweets per hour is shown. As expected there are spikes around market open and close.

Figure 2: Number of Tweets Per Hour of the Day

The overall market sentiment can be estimated by aggregating the number of bullish and bearish labeled tweets each day. Based on the previous literature, I expected a significant bullish bias. This is confirmed in Figure 3 below, with the daily mean percentage of bullish tweets being 79%.

Figure 3: Percentage of Bullish Tweets Each Day

When writing a StockTwits tweet, users can tag multiple symbols so it is possible that the sentiment label could apply to more than one symbol. Tagging more than one symbol would likely indicate less specific sentiment and predictive potential so I hoped to find that most tweets only tag a single symbol. Looking at Figure 4 below, over 90% of the tweets tag a single symbol and a very small percentage tag 5+.

Figure 4: Relative Frequency Histogram of the Number of Symbols Mentioned Per Tweet

The time period of data used in my analysis is from 2012-11-01 to 2016-12-31. In Figure 5 below, the top symbols, industries, and sectors by total labeled tweet count are shown. By far the most tweeted about industries were biotechnology and ETFs. This makes sense because of how volatile these industries are which hopefully means that they would be the best to trade based on social media sentiment data.

Figure 5: Top Symbols, Industries, and Sectors by Total Tweet Count

Now I needed to determine how I would create the sentiment score to best encompass the predictive potential of the data. Though there are obstacles to trading an open to close strategy including slippage, liquidity, and transaction costs, analyzing how well the sentiment score immediately before market open predicts open to close returns is a valuable sanity check to see if it would be useful in a larger factor model. The sentiment score for each day was calculated using the tweets from the previous market day’s open until the current day’s open:

S-Score =  (#Bullish-#Bearish)/(#Bullish+#Bearish)

This S-Score then needs to be normalized to detect the significance of a specific day’s sentiment with respect to the symbol’s historic sentiment trend. To do this, a rolling z-score is applied to the series. By changing the length of the lookback window the sensitivity can be adjusted. Additionally, since the data is quite sparse, days without any tweets for a symbol are given an S-Score of 0. At the market open each day, symbols with an S-Score above the positive threshold are entered long and symbols with an S-Score below the negative threshold are entered short. Equal dollar weight is applied to the long and short legs. These positions are assumed to be liquidated at the day’s market close. The first test is on the universe of equities with previous day closing prices > $5. With a relatively small long-short portfolio of ~250 stocks, its performance can be seen in Figure 6 below.
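As an illustration of the scoring steps just described, here is a minimal pandas sketch for a single symbol. It assumes the daily bullish and bearish counts have already been aggregated from open to open, and that the rolling window includes the current day; the function names and thresholds are hypothetical, not the ones used for the results in this post.

```python
import numpy as np
import pandas as pd

def s_score(bullish, bearish):
    """(#Bullish - #Bearish) / (#Bullish + #Bearish); days with no tweets get 0."""
    total = bullish + bearish
    raw = (bullish - bearish) / total.where(total > 0)   # NaN when there are no tweets
    return raw.fillna(0.0)

def rolling_zscore(s, lookback):
    """Normalize each day's score against its own trailing window."""
    mean = s.rolling(lookback).mean()
    std = s.rolling(lookback).std()
    return (s - mean) / std.where(std > 0)               # undefined when std is 0

def signals(z, long_threshold, short_threshold):
    """+1 = enter long at the open, -1 = enter short, 0 = stay out."""
    raw = np.where(z > long_threshold, 1, np.where(z < short_threshold, -1, 0))
    return pd.Series(raw, index=z.index)

bullish = pd.Series([3, 0, 5, 1, 4, 2])
bearish = pd.Series([1, 0, 0, 1, 1, 2])
z = rolling_zscore(s_score(bullish, bearish), lookback=3)
print(signals(z, long_threshold=0.5, short_threshold=-0.5))
```

Days inside the initial lookback window have no z-score and therefore generate no signal, which is one reason sparse symbols trade rarely under this scheme.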

Figure 6: Price > $5 Universe Open to Close Cumulative Returns

The thresholds were cherry-picked to show the potential of a 2.11 Sharpe ratio, but the results vary depending on the thresholds used. This sensitivity is likely due to the lack of tweet volume on most symbols. Also, the long and short thresholds are not equal, in an attempt to maintain a roughly equal number of stocks in each leg. The neutral basket contains all of the stocks in the universe that do not have an S-Score extreme enough to generate a long or short signal. Using the same thresholds as above, the test was run on a liquidity universe, defined as the top quartile of stocks by 50-day average dollar volume. As seen in Figure 7 below, the Sharpe ratio drops to 1.24 but is still very encouraging.

Figure 7: Liquidity Universe Open to Close Cumulative Returns

The sensitivity of these results needs to be further inspected by performing analysis on separate train and test sets but I was very pleased with the returns that could be potentially generated from just labeled StockTwits data.

In July, I began working for Social Market Analytics, the leading social media sentiment provider. Here at SMA, we run all the StockTwits tweets through our proprietary NLP engine to determine their sentiment scores. Using sentiment data from 9:10 EST, which looks at an exponentially weighted sentiment aggregation over the last 24 hours, the open to close simulation can be run on the price > $5 universe. Each stock is separated into its respective quintile based on its S-Score in relation to the universe’s percentiles that day. A long-short portfolio is constructed in a similar fashion as previously, with long positions in the top quintile stocks and short positions in the bottom quintile stocks. In Figure 8 below you can see that the results are much better than when only using sentiment labeled data.
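The quintile construction can be sketched for one day's cross-section as follows. This is not SMA's implementation; it assumes one score and one open-to-close return per symbol, equal weights within each leg, and ties broken by order of appearance. The function name and the sample numbers are hypothetical.

```python
import pandas as pd

def quintile_long_short(scores, returns):
    """Top-quintile mean return minus bottom-quintile mean return for one day.

    scores, returns: Series indexed by symbol.
    """
    ranks = scores.rank(method="first")            # break ties deterministically
    quintile = pd.qcut(ranks, 5, labels=False)     # 0 = lowest scores, 4 = highest
    long_ret = returns[quintile == 4].mean()
    short_ret = returns[quintile == 0].mean()
    return long_ret - short_ret

symbols = list("ABCDEFGHIJ")
scores = pd.Series(range(10), index=symbols, dtype=float)
returns = pd.Series([-0.02, -0.01, 0, 0, 0, 0, 0, 0, 0.01, 0.03], index=symbols)
print(quintile_long_short(scores, returns))
```

Because the quintiles are recomputed from that day's percentiles, the number of names in each leg stays roughly constant, unlike the fixed-threshold approach used with the labeled tweets.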

Figure 8: SMA Open to Close Cumulative Returns Using StockTwits Data

The predictive power is there as the long-short boasts an impressive 4.5 Sharpe ratio. Due to having more data, the results are much less sensitive to long-short portfolio construction. To avoid the high turnover of an open-to-close strategy, we have been exploring possible long-term strategies. Deutsche Bank’s Quantitative Research Team recently released a paper about strategies that solely use our SMA data which includes a longer-term strategy. Additionally, I’ve recently developed a strong weekly rebalance strategy that attempts to capture weekly sentiment momentum.

Though it is just the beginning, my dive into social media sentiment data and its application in finance over the course of my time consulting for QTS has been very insightful. It is arguable that by just using the labeled StockTwits tweets, we may be able to generate predictive signals but by including all the tweets for sentiment analysis, a much stronger signal is found. If you have questions please contact me at coltonsmith321@gmail.com.

Colton Smith is a recent graduate of the University of Washington where he majored in Industrial and Systems Engineering and minored in Applied Math. He now lives in Chicago and works for Social Market Analytics. He has a passion for data science and is excited about his developing quantitative finance career. LinkedIn: https://www.linkedin.com/in/coltonfsmith/
===
Upcoming Workshops by Dr. Ernie Chan

September 11-15: City of London workshops

These intense 8-16 hour workshops cover Algorithmic Options Strategies, Quantitative Momentum Strategies, and Intraday Trading and Market Microstructure. Typical class size is under 10. They may qualify for CFA Institute continuing education credits.

Friday, July 21, 2017

Building an Insider Trading Database and Predicting Future Equity Returns

By John Ryle, CFA
===
I’ve long been interested in the behavior of corporate insiders and how their actions may impact their company’s stock. I had done some research on this in the past, albeit in a very low-tech way using mostly Excel. It’s a highly compelling subject, intuitively aligned with a company’s equity performance - if those individuals most in-the-know are buying, it seems sensible that the stock should perform well. If insiders are selling, the opposite is implied. While reality proves more complex than that, a tremendous amount of literature has been written on the topic, and it has been shown to be predictive in prior studies.

In generating my thesis to complete Northwestern’s MS in Predictive Analytics program, I figured applying some of the more prominent machine learning algorithms to insider trading could be an interesting exercise. I was concerned, however, that, as the market had gotten smarter over time, returns from insider trading signals may have decayed as well, as is often the case with strategies exposed to a wide audience over time. Information is more readily available now than at any time in the past. Not too long ago, investors needed to visit SEC offices to obtain insider filings. The standard filing document, the Form 4, has only required electronic submission since 2003. Now anyone can obtain it freely via the SEC’s EDGAR website. If all this data is just sitting out there, can it continue to offer value?

I decided to inquire by gathering the filings directly by scraping the EDGAR site. While there are numerous data providers available (at a cost), I wanted to parse the raw data directly, as this would allow for greater “intimacy” with the underlying data. I’ve spent much of my career as a database developer/administrator, so working with raw text/XML and transforming it into a database structure seemed like fun. Also, since I wanted this to be a true end-to-end data science project, including data wrangling (the often ugly 80% of the real effort) was an important requirement. That being said, mining and cleansing the data was a monstrous amount of work. It took several weekends to work through the code and finally download 2.4 million unique files. I relied heavily on PowerShell scripts to first parse through the files and then shred the XML into database tables in MS SQL Server.

With data from the years 2005 to 2015, the initial 2.4 million records were filtered down to 650,000 Insider Equity Buy transactions. I focused on Buys rather than Sells because the signal can be a bit murkier with sells. Insider selling happens for a great many innocent reasons, including diversification and paying living expenses. Also, I focused on equity trades rather than derivatives for similar reasons: it can be difficult to interpret the motivations behind various derivative trades. Open market buy orders, however, are generally quite clear.

After some careful cleansing, I had 11 years’ worth of useful SEC data, but in addition, I needed pricing and market capitalization data, ideally data that would account for survivorship bias/dead companies. Respectively, Zacks Equity Prices and Sharadar’s Core US Fundamentals data sets did the trick, and I could obtain both via Quandl at reasonable cost (about $350 per quarter).

For exploratory data analysis and model building, I used the R programming language. The models I utilized were linear regression, recursive partitioning, random forest and multivariate adaptive regression splines (MARS). I intended to make use of support vector machine (SVM) models as well, but experienced a great many performance issues when running on my laptop with a mere 4 cores. SVMs have trouble with scaling. I failed to overcome this issue and abandoned the effort after 10-12 crashes, unfortunately.

For the recursive partitioning and random forest models I used functions from Microsoft’s RevoScaleR package, which allows for impressive scalability versus standard tree-based packages such as rpart and randomForest. Similar results can be expected, but the RevoScaleR functions take great advantage of multiple cores. I split my data into a training set for 2005-2011, a validation set for 2012-2013, and a test set for 2014-2015. Overall, performance for each of the algorithms tested was fairly similar, but in the end, the random forest prevailed.

For my response variable, I used 3-month relative returns vs the Russell 3000 index. For predictors, I utilized a handful of attributes directly from the filings and from related company information. The models proved quite predictive in the validation set, as can be seen in Exhibit 4.10 of the paper, reproduced below:

The random forest’s predicted returns were significantly better for quintile 5, the highest predicted return grouping, relative to quintile 1 (the lowest). Quintiles 2 through 4 also lined up perfectly: actual performance correlated nicely with grouped predicted performance. The results in validation seemed very promising!

However, when I ran the random forest model on the test set (2014-2015), the relationship broke down substantially, as can be seen in the paper’s Exhibit 5.2, reproduced below:


Fortunately, the predicted 1st decile was in fact the lowest performing actual return grouping. However, the actual returns on all remaining prediction deciles appeared no better than random. In addition, relative returns were negative for every decile.

While disappointing, it is important to recognize that when modeling time-dependent financial data, as the time-distance moves further away from the training set’s time-frame, performance of the model tends to decay. All market regimes, gradually or abruptly, end. This represents a partial (yet unsatisfying) explanation for this relative decrease in performance. Other effects that may have impaired prediction include the use of price, as well as market cap, as predictor variables. These factors certainly underperformed during the period used for the test set. Had I excluded these, and refined the filing specific features more deeply, perhaps I would have obtained a clearer signal in the test set.

In any event, this was a fun exercise where I learned a great deal about insider trading and its impact on future returns. Perhaps we can conclude that this signal has weakened over time, as the market has absorbed the informational value of insider trading data. However, perhaps further study, additional feature engineering and clever consideration of additional algorithms is worth pursuing in the future.

John J Ryle, CFA lives in the Boston area with his wife and two children. He is a software developer at a hedge fund, a graduate of Northwestern’s Master’s in Predictive Analytics program (2017), a huge tennis fan, and a machine learning enthusiast. He can be reached at john@jryle.com. 

===
Upcoming Workshops by Dr. Ernie Chan

July 29 and August 5: Mean Reversion Strategies

In the last few years, mean reversion strategies have proven to be the most consistent winner. However, not all mean reversion strategies work in all markets at all times. This workshop will equip you with basic statistical techniques to discover mean reverting markets on your own, and describe the detailed mechanics of trading some of them. 

September 11-15: City of London workshops

These intense 8-16 hours workshops cover Algorithmic Options Strategies, Quantitative Momentum Strategies, and Intraday Trading and Market Microstructure. Typical class size is under 10. They may qualify for CFA Institute continuing education credits.

===
Industry updates
  • scriptmaker.net allows users to record order book data for backtesting.
  • Pair Trading Lab offers a web-based platform for easy backtesting of pairs strategies.


Thursday, May 04, 2017

Paradox Resolved: Why Risk Decreases Expected Log Return But Not Expected Wealth

I have been troubled by the following paradox in the past few years. If a stock's log returns (i.e. change in log price per unit time) follow a Gaussian distribution, and if its net returns (i.e. percent change in price per unit time) have mean m and standard deviation s, then many finance students know that the mean log return is $m - s^2/2$. That is, the compound growth rate of the stock is $m - s^2/2$. This can be derived by applying Ito's lemma to the log price process (see e.g. Hull), and is intuitively satisfying because it is saying that the expected compound growth rate is lowered by risk ("volatility"). OK, we get that - risk is bad for the growth of our wealth.

However, let's find out what the expected price of the stock is at time t. If we invest our entire wealth in one stock, that is really asking what our expected wealth is at time t. To compute that, it is easier to first find out what the expected log price of the stock is at time t, because that is just the expected value of the sum of the log returns in each time interval, and is of course equal to the sum of the expected value of the log returns when we assume a geometric random walk. So the expected value of the log price at time t (taking the initial price to be 1) is just $t(m - s^2/2)$. But what is the expected price (not log price) at time t? It isn't correct to say $\exp(t(m - s^2/2))$, because the expected value of the exponential function of a normal variable is not equal to the exponential function of the expected value of that normal variable, or $E[\exp(x)] \neq \exp(E[x])$. Instead, $E[\exp(x)] = \exp(\mu + \sigma^2/2)$, where $\mu$ and $\sigma$ are the mean and standard deviation of the normal variable (see Ruppert). In our case, the normal variable is the log price, and thus $\mu = t(m - s^2/2)$ and $\sigma^2 = t s^2$. Hence the expected price at time t is $\exp(tm)$. Note that it doesn't involve the volatility s. Risk doesn't affect the expected wealth at time t. But we just argued in the previous paragraph that the expected compound growth rate is lowered by risk. What gives?
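Written out step by step, the calculation in this paragraph runs as follows. The only assumptions are the ones already stated: log returns in each unit time interval are independent Gaussians with mean m - s^2/2 and variance s^2, so the variance of the log price after t intervals is t times s^2.

```latex
\begin{aligned}
x_t &= \log\frac{P_t}{P_0} \sim N\!\left(t\left(m - \tfrac{s^2}{2}\right),\; t\,s^2\right) \\
E\!\left[\frac{P_t}{P_0}\right] &= E[\exp(x_t)] = \exp\!\left(\mu + \tfrac{\sigma^2}{2}\right) \\
&= \exp\!\left(t\left(m - \tfrac{s^2}{2}\right) + \tfrac{t\,s^2}{2}\right) = \exp(t\,m)
\end{aligned}
```

The volatility term cancels exactly, which is why s drops out of the expected price even though it lowers the expected log price.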

This brings us to a famous recent paper by Peters and Gell-Mann. (For the physicists among you, this is the Gell-Mann who won the Nobel prize in physics for inventing quarks, the fundamental building blocks of matter.) This happens to be the most read paper in the Chaos Journal in 2016, and basically demolishes the use of the utility function in economics, in agreement with John Kelly, Ed Thorp, Claude Shannon, Nassim Taleb, etc., and against the entire academic economics profession. (See Fortune's Formula for a history of this controversy. And just to be clear which side I am on: I hate utility functions.) To make a long story short, the error we have made in computing the expected stock price (or wealth) at time t, is that the expectation value there is ill-defined. It is ill-defined because wealth is not an "ergodic" variable: its finite-time average is not equal to its "ensemble average". Finite-time average of wealth is what a specific investor would experience up to time t, for large t. Ensemble average is the average wealth of many millions of similar investors up to time t. Naturally, since we are just one specific investor, the finite-time average is much more relevant to us. What we have computed above, unfortunately, is the ensemble average.  Peters and Gell-Mann exhort us (and other economists) to only compute expected values of ergodic variables, and log return (as opposed to log price) is happily an ergodic variable. Hence our average log return is computed correctly - risk is bad. Paradox resolved!
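The gap between the two averages is easy to see in a quick simulation. The sketch below is only an illustration with hypothetical values of m and s, not anything taken from the Peters and Gell-Mann paper. It draws Gaussian log returns with mean m - s^2/2 for many independent investors and compares the growth rate implied by the average wealth across investors with the average growth rate that individual investors experience.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 0.05, 0.30                 # hypothetical mean and volatility per period
t, n_investors = 20, 100_000

# Each row is one investor's sequence of log returns over t periods.
log_ret = rng.normal(m - s**2 / 2, s, size=(n_investors, t))
log_wealth = log_ret.sum(axis=1)  # log of final wealth, starting from 1

# Growth rate implied by the ensemble average of wealth: close to m.
ensemble_growth = np.log(np.exp(log_wealth).mean()) / t

# Average of each investor's own realized growth rate: close to m - s^2/2.
time_avg_growth = (log_wealth / t).mean()

# Fraction of investors who end up below the ensemble-average wealth.
below_average = (log_wealth < t * m).mean()

print(ensemble_growth, time_avg_growth, below_average)
```

The ensemble figure is pulled up by a small number of very lucky paths, so most simulated investors finish below the "expected" wealth.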

===

My Upcoming Workshops

May 13 and 20: Artificial Intelligence Techniques for Traders

I will discuss in detail AI techniques as applied to trading strategies, with plenty of in-class exercises, and with emphasis on nuances and pitfalls of these techniques.

June 5-9: London in-person workshops

I will teach 3 courses there: Quantitative Momentum, Algorithmic Options Strategies, and Intraday Trading and Market Microstructure.

(The London courses may qualify for continuing education credits for CFA Institute members.)


Friday, March 03, 2017

More Data or Fewer Predictors: Which is a Better Cure for Overfitting?

One of the perennial problems in building trading models is the sparseness of data and the attendant danger of overfitting. Fortunately, there are systematic methods of dealing with both ends of the problem. These methods are well-known in machine learning, though most traditional machine learning applications have a lot more data than we traders are used to. (E.g. Google used 10 million YouTube videos to train a deep learning network to recognize cats' faces.)

To create more training data out of thin air, we can resample (perhaps more vividly, oversample) our existing data. This is called bagging. Let's illustrate this using a fundamental factor model described in my new book. It uses 27 factor loadings such as P/E, P/B, Asset Turnover, etc. for each stock. (Note that I call cross-sectional factors, i.e. factors that depend on each stock, "factor loadings" instead of "factors" by convention.) These factor loadings are collected from the quarterly financial statements of S&P 500 companies, and are available from Sharadar's Core US Fundamentals database (as well as more expensive sources like Compustat). The factor model is very simple: it is just a multiple linear regression model with the next quarter's return of a stock as the dependent (target) variable, and the 27 factor loadings as the independent (predictor) variables. Training consists of finding the regression coefficients of these 27 predictors. The trading strategy based on this predictive factor model is equally simple: if the predicted next-quarter return is positive, buy the stock and hold for a quarter. Vice versa for shorts.

Note there is already a step taken in curing data sparseness: we do not try to build a separate model with a different set of regression coefficients for each stock. We constrain the model such that the same regression coefficients apply to all the stocks. Otherwise, the training data that we use from 200701-201112 will only have 1,260 rows, instead of 1,260 x 500 = 630,000 rows.

The result of this baseline trading model isn't bad: it has a CAGR of 14.7% and Sharpe ratio of 1.8 in the out-of-sample period 201201-201401. (Caution: this portfolio is not necessarily market or dollar neutral. Hence the return could be due to a long bias enjoying the bull market in the test period. Interested readers can certainly test a market-neutral version of this strategy hedged with SPY.) I plotted the equity curve below.




Next, we resample the data by randomly picking N (=630,000) data points with replacement to form a new training set (a "bag"), and we repeat this K (=100) times to form K bags. For each bag, we train a new regression model. At the end, we average over the predicted returns of these K models to serve as our official predicted returns. This results in marginal improvement of the CAGR to 15.1%, with no change in Sharpe ratio.
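For readers who want to see the mechanics, here is a minimal Python sketch of bagging a linear regression. It is not the code behind the results quoted here; the synthetic data, the function names, and the use of NumPy's least-squares solver are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, with the intercept as the first element."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def bagged_predict(X_train, y_train, X_test, n_bags=100, seed=0):
    """Average the predictions of n_bags models, each trained on a bootstrap bag."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros(len(X_test))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)     # draw n rows with replacement
        preds += predict(fit_ols(X_train[idx], y_train[idx]), X_test)
    return preds / n_bags

# Synthetic stand-in for the factor loadings and next-quarter returns.
rng = np.random.default_rng(1)
X = rng.standard_normal((600, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(600)
bagged = bagged_predict(X[:500], y[:500], X[500:], n_bags=50)
print(bagged[:5])
```

For a well-conditioned linear regression the bagged predictions stay close to the single-model predictions, which is consistent with the marginal improvement reported above.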

Now, we try to reduce the predictor set. We use a method called "random subspace". We randomly pick half of the original predictors to train a model, and repeat this K=100 times. Once again, we average over the predicted returns of all these models. Combined with bagging, this results in further marginal improvement of the CAGR to 15.1%, again with little change in Sharpe ratio.
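A random subspace ensemble can be sketched in the same style. Again this is an illustration under assumptions (synthetic data, hypothetical function name), with an optional flag to combine it with bagging as described above.

```python
import numpy as np

def random_subspace_predict(X_train, y_train, X_test,
                            n_models=100, frac=0.5, bag=True, seed=0):
    """Average linear models, each trained on a random subset of the predictors.

    If bag is True, each model is also trained on a bootstrap sample of rows.
    """
    rng = np.random.default_rng(seed)
    n, p = X_train.shape
    k = max(1, int(round(frac * p)))              # predictors per model
    preds = np.zeros(len(X_test))
    for _ in range(n_models):
        cols = rng.choice(p, size=k, replace=False)
        rows = rng.integers(0, n, size=n) if bag else np.arange(n)
        A = np.column_stack([np.ones(n), X_train[rows][:, cols]])
        coef, *_ = np.linalg.lstsq(A, y_train[rows], rcond=None)
        preds += coef[0] + X_test[:, cols] @ coef[1:]
    return preds / n_models

# Synthetic stand-in for the factor loadings and next-quarter returns.
rng = np.random.default_rng(2)
X = rng.standard_normal((600, 6))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0, 2.0]) + 0.1 * rng.standard_normal(600)
pred = random_subspace_predict(X[:500], y[:500], X[500:])
print(np.corrcoef(pred, y[500:])[0, 1])
```

Since each predictor is left out of roughly half the models, the averaged prediction is shrunk toward zero. That does not matter for a strategy that trades only on the sign or ranking of the predicted return.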

The improvements from either method may not seem large so far, but at least it shows that the original model is robust with respect to randomization.

But there is another method of reducing the number of predictors. It is called stepwise regression. The idea is simple: we pick one predictor from the original set at a time, and add that to the model only if the BIC (Bayesian Information Criterion) decreases. BIC is essentially the negative log likelihood of the training data based on the regression model, with a penalty term proportional to the number of predictors. That is, if two models have the same log likelihood, the one with the larger number of parameters will have a larger BIC and is thus penalized. Once we reach the minimum BIC, we then try to remove one predictor from the model at a time, until the BIC cannot decrease any further. Applying this to our fundamental factor loadings, we achieve a quite significant improvement of the CAGR over the base model: 19.1% vs. 14.7%, with the same Sharpe ratio.
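Here is a minimal sketch of forward-then-backward stepwise selection by BIC. It makes two assumptions that are not spelled out above: BIC is computed in its Gaussian linear regression form, n*ln(RSS/n) + k*ln(n), which drops an additive constant, and at each step the candidate that lowers BIC the most is chosen. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def bic(X, y, cols):
    """Gaussian BIC (up to a constant) of an OLS fit on the given columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

def stepwise_bic(X, y):
    selected, best = [], bic(X, y, [])
    # Forward pass: add the predictor that lowers BIC the most, if any does.
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: bic(X, y, selected + [j]) for j in candidates}
        if not scores or min(scores.values()) >= best:
            break
        j = min(scores, key=scores.get)
        selected.append(j)
        best = scores[j]
    # Backward pass: drop a predictor whenever that lowers BIC further.
    while selected:
        scores = {j: bic(X, y, [c for c in selected if c != j]) for j in selected}
        j = min(scores, key=scores.get)
        if scores[j] >= best:
            break
        selected.remove(j)
        best = scores[j]
    return sorted(selected), best

# Synthetic data in which only columns 1 and 4 carry any signal.
rng = np.random.default_rng(3)
X = rng.standard_normal((400, 8))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.5 * rng.standard_normal(400)
print(stepwise_bic(X, y))
```

On data like this, the ln(n) penalty per parameter is what keeps most of the noise columns out of the model.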

It is also satisfying that the stepwise regression model picked only two variables out of the original 27. Let that sink in for a moment: just two variables account for all of the predictive power of a quarterly financial report! As to which two variables these are - I will reveal that in my talk at QuantCon 2017 on April 29.

===

My Upcoming Workshops

March 11 and 18: Cryptocurrency Trading with Python

I will be moderating this online workshop for my friend Nick Kirk, who taught a similar course at CQF in London to wide acclaim.

May 13 and 20: Artificial Intelligence Techniques for Traders

I will discuss in detail AI techniques such as those described above, with other examples and in-class exercises. As usual, nuances and pitfalls will be covered.

Wednesday, November 16, 2016

Pre-earnings Announcement Strategies

Much has been written about the Post-Earnings Announcement Drift (PEAD) strategy (see, for example, my book), but less was written about pre-earnings announcement strategies. That changed recently with the publication of two papers. Just as with PEAD, these pre-announcement strategies do not make use of any actual earnings numbers or even estimates. They are based entirely on announcement dates (expected or actual) and perhaps recent price movement.

The first one, by So and Wang 2014, suggests various simple mean reversion strategies for US stocks that enter into positions at the market close just before an expected announcement. Here is my paraphrase of one such strategy:

1) Suppose t is the expected earnings announcement date for a stock in the Russell 3000 index.
2) Compute the pre-announcement return from day t-4 to t-2 (counting trading days only).
3) Subtract a market index return over the same lookback period from the pre-announcement return, and call this market-adjusted return PAR.
4) Pick the 18 stocks with the best PAR and short them (with equal dollars) at the market close of t-1, liquidate at market close of t+1.  Pick the 18 stocks with the worst PAR, and do the opposite. Hedge any net exposure with a market-index ETF or future.
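Steps 1 to 4 above can be sketched in a few lines of Python. This is a simplified illustration, not the backtest code: it assumes every stock in the input shares the same expected announcement day t, that prices are lists of daily closes indexed by trading day, and the function names and toy prices are made up for the example. Hedging of net exposure is left out.

```python
def pre_announcement_return(closes, t):
    """Return from the close of day t-4 to the close of day t-2 (trading-day indices)."""
    return closes[t - 2] / closes[t - 4] - 1.0

def select_trades(stock_closes, market_closes, t, n_each=18):
    """Compute market-adjusted pre-announcement returns (PAR) and pick the trades
    to enter at the close of t-1 and exit at the close of t+1."""
    market_ret = pre_announcement_return(market_closes, t)
    par = {sym: pre_announcement_return(closes, t) - market_ret
           for sym, closes in stock_closes.items()}
    ranked = sorted(par, key=par.get)            # ascending: worst PAR first
    n = min(n_each, len(ranked) // 2)
    longs = ranked[:n]                           # worst PAR: buy
    shorts = ranked[len(ranked) - n:]            # best PAR: short
    return par, longs, shorts

# Toy data: six daily closes per series, announcement expected on day index 5.
market = [100, 101, 102, 103, 104, 105]
stocks = {
    "AAA": [10, 10, 10, 11, 11, 11],
    "BBB": [20, 20, 20, 19, 19, 19],
    "CCC": [30, 30, 30, 30.6, 30.6, 30.6],
    "DDD": [40, 40, 40, 40, 40, 40],
}
par, longs, shorts = select_trades(stocks, market, t=5, n_each=1)
print(longs, shorts)
```

In this toy example AAA has the highest market-adjusted return and is shorted, while BBB has the lowest and is bought.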

I backtested this strategy using Wall Street Horizon (WSH)'s expected earnings dates data, applying it to stocks in the Russell 3000 index, and hedging with IWV. I got a CAGR of 9.1% and a Sharpe ratio of 1 from 2011/08/03-2016/09/30. The equity curve is displayed below.



Note that WSH's data was used instead of  Yahoo! Finance, Compustat, or even Thomson Reuters' I/B/E/S earnings data, because only WSH's data is "point-in-time". WSH captured the expected earnings announcement date on the day before the announcement, just as we would have if we were live trading. We did not use the actual announcement date as captured in most other data sources because we could not be sure if a company changed their expected announcement date on that same date. The actual announcement date can only be known with certainty after-the-fact, and therefore isn't point-in-time. If we were to run the same backtest using Yahoo! Finance's historical earnings data, the CAGR would have dropped to 6.8%, and the Sharpe ratio dropped to 0.8.

The notion that companies do change their expected announcement dates takes us to the second strategy, created by Ekaterina Kramarenko of Deltix's Quantitative Research Team. In her paper "An Automated Trading Strategy Using Earnings Date Movements from Wall Street Horizon", she describes the following strategy that explicitly makes use of such changes as a trading signal:

1) At the market close prior to the earnings announcement  expected between the current close and the next day's open, compute deltaD which is the last change of the expected announcement date for the upcoming announcement, measured in calendar days. deltaD > 0 if the company moved the announcement date later, and deltaD < 0 if the company moved the announcement date earlier.
2) Also, at the same market close, compute deltaU which is the number of calendar days since the last change of the expected announcement date.
3) If deltaD < 0 and deltaU < 45, buy the stock at the market close and liquidate on next day's market open. If deltaD > 0 and deltaU >= 45, do the opposite.
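The signal rule in steps 1 to 3 is simple enough to write down directly. The sketch below is my own illustration, not code from the paper: the function name, argument names, and example dates are hypothetical, and it assumes we already know that the announcement is expected between today's close and the next open.

```python
from datetime import date

def date_change_signal(old_expected, new_expected, change_date, today, recent_days=45):
    """Return +1 (buy at today's close, exit at next open), -1 (the opposite), or 0."""
    delta_d = (new_expected - old_expected).days   # > 0: moved later, < 0: moved earlier
    delta_u = (today - change_date).days           # calendar days since the last change
    if delta_d < 0 and delta_u < recent_days:
        return 1
    if delta_d > 0 and delta_u >= recent_days:
        return -1
    return 0

# Announcement pulled forward by 2 days, 12 days ago: buy signal.
early = date_change_signal(date(2016, 8, 4), date(2016, 8, 2),
                           date(2016, 7, 20), date(2016, 8, 1))
# Announcement pushed back by 9 days, 69 days ago: short signal.
late = date_change_signal(date(2016, 8, 1), date(2016, 8, 10),
                          date(2016, 6, 1), date(2016, 8, 9))
print(early, late)
```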

The intuition behind this strategy is that if a company moves an expected announcement date earlier, especially if that happens close to the expected date, that is an indication of good news, and vice versa. Kramarenko found a CAGR of 14.95% and a Sharpe ratio of 2.08 by applying this strategy to SPX stocks from 2006/1/3 - 2015/9/2.

In order to reproduce this result, one needs to make sure that the capital allocation is based on the following formula: suppose the total buying power is M, and the number of trading signals at the market close is n, then the trading size per stock is M/5 if n <= 5, and is M/n if n > 5.
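The allocation formula amounts to dividing the buying power by the larger of n and 5. A minimal sketch, with a hypothetical function name and a `min_slots` parameter introduced here so the floor of 5 can be varied:

```python
def size_per_stock(buying_power, n_signals, min_slots=5):
    """Dollar size per stock: M/min_slots if n <= min_slots, else M/n."""
    if n_signals <= 0:
        return 0.0
    return buying_power / max(n_signals, min_slots)

print(size_per_stock(1_000_000, 3))    # 3 signals: M/5
print(size_per_stock(1_000_000, 10))   # 10 signals: M/10
```

The effect is that the portfolio is never more than 1/5 invested in any single name, and is only partially invested on days with fewer than 5 signals.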

I backtested this strategy from 2011/8/3-2016/9/30 on the SPX universe fixed as of 2011/7/5, and obtained CAGR=17.6% and Sharpe ratio of 0.6.

Backtesting this on Russell 3000 index universe of stocks yielded better results, with CAGR=17% and Sharpe ratio=1.9.  Here, I adjust the trading size per stock to M/30 if n <=30, and to M/n if n > 30, given that the total number of stocks in Russell 3000 is about 6 times larger than that of SPX. The equity curve is displayed below:


Interestingly, a market neutral version of this strategy (using IWV to hedge any net exposure) does not improve the Sharpe ratio, but does significantly depress the CAGR.

===

Acknowledgement: I thank Michael Raines at Wall Street Horizon for providing the historical point-in-time expected earnings dates data for this research. Further, I thank Stuart Farr and Ekaterina Kramarenko at Deltix for providing me with a copy of their paper and explaining to me the nuances of their strategy.

===

My Upcoming Workshop

January 14 and 21: Algorithmic Options Strategies

This  online course is different from most other options workshops offered elsewhere. It will cover backtesting intraday option strategies and portfolio option strategies.

Wednesday, September 28, 2016

Really, Beware of Low Frequency Data

I wrote in a previous article about why we should backtest even end-of-day (daily) strategies with intraday quote data. Otherwise, the performance of such strategies can be inflated. Here is another brilliant example that I came across recently.

Consider the oil futures ETF USO and its evil twin, the inverse oil futures ETF DNO*. In theory, if USO has a daily return of x%, DNO will have a daily return of -x%. In practice, if we plot the daily returns of DNO against that of USO from 2010/9/27-2016/9/9, using the usual consolidated end-of-day data that you can find on Yahoo! Finance or any other vendor,





















we see that though the slope is indeed -1 (to within a standard error of 0.004), there are many days with significant deviation from the straight line. The trader in us will immediately think "arbitrage opportunities!"

Indeed, if we backtest a simple mean reversion strategy on this pair - just buy equal dollar amounts of USO and DNO when the sum of their daily returns is less than 40 bps at the market close, hold one day, and vice versa - we will find a strategy with a decent Sharpe ratio of 1 even after deducting 5 bps per side as transaction costs. Here is the equity curve:





















Looks reasonable, doesn't it? However, if we backtest this strategy again with BBO data at the market close, taking care to subtract half the bid-ask spread as transaction cost, we find this equity curve:














We can see that the problem is not only that we lose money on practically every trade, but that there was seldom any trade triggered. When the daily EOD data suggests a trade should be triggered, the 1-min bar BBO data tells us that in fact there was no deviation from the mean.
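To make the end-of-day version of the rule concrete, here is a minimal sketch of the backtest logic on daily close-to-close returns. Several details are my assumptions for illustration: the entry rule is read symmetrically (buy both ETFs when the sum of their daily returns is below -40 bps, short both when it is above +40 bps), each trade is charged 5 bps on entry and 5 bps on exit, and the function name and toy returns are made up. Run on consolidated closes, this is exactly the kind of backtest that produces the misleading first equity curve.

```python
def pair_reversion_returns(ret_a, ret_b, threshold=0.004, cost_per_side=0.0005):
    """Daily strategy returns for an equal-dollar position in two inverse ETFs,
    entered at one close and exited at the next."""
    daily = []
    for i in range(len(ret_a) - 1):
        spread = ret_a[i] + ret_b[i]          # should be ~0 if the pair tracks perfectly
        if spread < -threshold:
            position = 1                      # both look cheap: buy both
        elif spread > threshold:
            position = -1                     # both look rich: short both
        else:
            position = 0
        next_day = 0.5 * (ret_a[i + 1] + ret_b[i + 1])   # equal-dollar average return
        daily.append(position * next_day - abs(position) * 2 * cost_per_side)
    return daily

# Toy returns for three days (not real USO/DNO data).
uso = [0.010, -0.002, 0.003]
dno = [-0.016, 0.005, -0.003]
strategy = pair_reversion_returns(uso, dno)
print(strategy)
```

Replacing the close-to-close returns with returns computed from the bid and ask at the close is what separates the two equity curves above.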

(By the way, the returns above were calculated before we even deduct the borrow costs of occasionally shorting these ETFs. The "rebate rate" for USO is about 1% per annum on Interactive Brokers, but a steep 5.6% for DNO.)

In case you think this problem is peculiar to USO vs DNO, you can try TBT vs UBT as well.

Incidentally, we have just verified a golden rule of financial markets: an apparent deviation from market efficiency is allowed when no one can profitably trade on the arbitrage opportunity.

===
*Note: according to www.etf.com, "The issuer [of DNO] has temporarily suspended creations for this fund as of Mar 22, 2016 pending the filing of new paperwork with the SEC. This action could create unusual or excessive premiums— an increase of the market price of the fund relative to its fair value. Redemptions are not affected. Trade with care; check iNAV vs. price." For an explanation of "creation" of ETF units, see my article "Things You Don't Want to Know about ETFs and ETNs".

===

Industry Update
  • Quantiacs.com just recently registered as a CTA and operates a marketplace for trading algorithms that anyone can contribute. They also published an educational blog post for Python and Matlab backtesters: https://quantiacs.com/Blog/Intro-to-Algorithmic-Trading-with-Heikin-Ashi.aspx
  • I will be moderating a panel discussion on "How can funds leverage non-traditional data sources to drive investment returns?" at Quant World Canada in Toronto, November 10, 2016. 

===

Upcoming Workshops
Momentum strategies are for those who want to benefit from tail events. I will discuss the fundamental reasons for the existence of momentum in various markets, as well as specific momentum strategies that hold positions from hours to days.

A senior director at a major bank wrote me: "…thank you again for the Momentum Strategies training course this week. It was very beneficial. I found your explanations of the concepts very clear and the examples well developed. I like the rigorous approach that you take to strategy evaluation."

Friday, June 17, 2016

Things You Don't Want to Know about ETFs and ETNs

Everybody loves trading or investing in ETPs. ETP is the acronym for exchange-traded products, which include both exchange-traded funds (ETF) and exchange-traded notes (ETN). They seem simple, transparent, easy to understand. But there are a few subtleties that you may not know about.

1) The most popular ETN is VXX, the volatility index ETN. Unlike an ETF, an ETN is actually an unsecured bond issued by the issuer. This means that the price of the ETN may not just depend on the underlying assets or index. It could potentially depend on the credit-worthiness of the issuer. Now VXX is issued by Barclays. You may think that Barclays is a big bank, Too Big To Fail, and you may be right. Nevertheless, nobody promises that its credit rating will never be downgraded. Trading the VX future, however, doesn't have that problem.

2) The ETP issuer, together with the "Authorized Participants" (the market makers who can ask the issuer to issue more ETP shares or to redeem such shares for the underlying assets or cash), are supposed to keep the total market value of the ETP shares closely tracking the NAV of the underlying assets. However, there was one notable instance when the issuer deliberately did not do so, resulting in big losses for some investors.

That was when the issuer of TVIX, the leveraged ETN that tracks 2x the daily returns of VXX, stopped all creation of new TVIX shares temporarily on February 22, 2012 (see sixfigureinvesting.com/2015/10/how-does-tvix-work/). That issuer is Credit Suisse, who might have found that the transaction costs of rebalancing this highly volatile ETN were becoming too high. Because of this stoppage, TVIX turned into a closed-end fund (temporarily), and its NAV diverged significantly from its market value. TVIX was trading at a premium of 90% relative to the underlying index. In other words, investors who bought TVIX in the stock market by the end of March were paying 90% more than they would have if they were able to buy the VIX index instead. Right after that, Credit Suisse announced they would resume the creation of TVIX shares. The TVIX market price immediately plummeted to its NAV per share, causing huge losses for those investors who bought just before the resumption.

3) You may be familiar with the fact that a β-levered ETF is supposed to track only β times the daily returns of the underlying index, not its long-term return. But you may be less familiar with the fact that it is also not supposed to track β times the intraday return of that index (although most of the time it actually does, thanks to the many arbitrageurs).

Case in point: during the May 2010 Flash Crash, many inverse levered ETFs experienced a decrease in price as the market was crashing downwards. Many investors thought that, as inverse ETFs, they were supposed to rise in price and act as a hedge against market declines. For example, this comment letter to the SEC pointed out that DOG, the inverse ETF that tracks -1x the Dow 30 index, went down more than 60% from its value at the beginning (2:40 pm ET) of the Flash Crash. This is because various market makers, including the Authorized Participants for DOG, weren't making markets at that time. But an equally important point to note is that at the end of the trading day, DOG did return 3.2%, almost exactly -1x the return of DIA (the ETF that tracks the Dow 30). So it functioned as advertised. Lesson learned: we aren't supposed to use inverse ETFs for either intraday or long-term hedging!
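The point that a β-levered ETF tracks β times the daily return, and not β times the long-term return, can be checked with two made-up days of index returns. The numbers and the function name below are purely illustrative, and the sketch assumes the ETF delivers exactly β times each daily return with no fees.

```python
import math

def compound(daily_returns, beta=1.0):
    """Cumulative return of a product earning beta times each daily return."""
    return math.prod(1.0 + beta * r for r in daily_returns) - 1.0

index_days = [0.10, -0.10]                 # index up 10%, then down 10%
index_total = compound(index_days)         # about -1%
levered_total = compound(index_days, 2)    # about -4%, not 2 x (-1%) = -2%
inverse_total = compound(index_days, -1)   # about -1%, not +1%
print(index_total, levered_total, inverse_total)
```

Over just two volatile days, the 2x product loses twice as much as a naive "2x the index return" calculation would suggest, and the inverse product loses money even though the index is down.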

4) The NAV (not NAV per share) of an ETF does not have to change in the same % as the underlying asset's unit market value. For example, that same comment letter I quoted above stated that GLD, the gold ETF, declined in price by 24% from March 1 to December 31, 2013, tracking the same 24% drop in spot gold price. However, its NAV dropped 52%. Why? The Authorized Participants redeemed many GLD shares, causing the shares outstanding of GLD to decrease from 416 million to 266 million. Is that a problem? Not at all. An investor in that ETF only cares that she experienced the same return as spot gold, and not how many assets the ETF held. The author of that comment letter strangely wrote that "Investors wishing to participate in the gold market would not buy the GLD if they knew that a price decline in gold could result in twice as much underlying asset decline for the GLD." That, I believe, is nonsense.
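The arithmetic behind the GLD example is worth spelling out. Assuming the share price tracks NAV per share, total NAV is NAV per share times shares outstanding, so the two declines compound:

```python
price_return = -0.24          # GLD price change, March 1 to December 31, 2013
shares_start = 416e6          # shares outstanding at the start
shares_end = 266e6            # shares outstanding at the end

nav_per_share_ratio = 1.0 + price_return
share_count_ratio = shares_end / shares_start
nav_change = nav_per_share_ratio * share_count_ratio - 1.0
print(f"Total NAV change: {nav_change:.1%}")
```

With these rounded inputs the total NAV falls by roughly 51%, in line with the 52% quoted in the comment letter, even though each remaining share lost only 24%.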

For further reading on ETP, see www.ici.org/pdf/per20-05.pdf and www.ici.org/pdf/ppr_15_aps_etfs.pdf.

===

Industry Update

Alex Boykov co-developed the WFAToolbox – Walk-Forward Analysis Toolbox for MATLAB, which automates the process of using a moving window to optimize parameters and entering trades only in the out-of-sample period. He also compiled a standalone application from MATLAB that allows any user (having MATLAB or not) to upload quotes in csv format from Google Finance for further import to other programs and for working in Excel. You can download it here: wfatoolbox.com/epchan.

Upcoming Workshop

July 16 and 23, Saturdays: Artificial Intelligence Techniques for Traders

AI/machine learning techniques are most useful when someone gives us newfangled technical or fundamental indicators, and we haven't yet developed the intuition of how to use them. AI techniques can suggest ways to incorporate them into your trading strategy, and quicken your understanding of these indicators. Of course, sometimes these techniques can also suggest unexpected strategies in familiar markets.

My course covers the basic AI techniques useful to a trader, with emphasis on the many ways to avoid overfitting.