I've read both the Google Trends paper and the Wikipedia paper and implemented both of them. Two major things jumped out at me:
1. They tried out 40-50 words in Google Trends and backtested all of them. The word 'debt' is the one that performed best over the 7-year period of the study. Does anyone think it's going to be the most profitable over the next 7 years? Similarly, I could backtest 50 signals from a random number generator. One of them is going to be the best over the past 7 years, but that tells me nothing about its prospects for the next 7.
2. Eyeballing the graph, about half their return comes from the last quarter of 2008. This tells me (i) the signal just got lucky to be short in a period where the market was tanking, and (ii) they don't have any risk controls. You should never be making 50% of your PnL in a period that represents only 1/30 of your sample. I'm willing to bet that the bootstrap value at risk of this strategy is pretty poor.
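To make point #2 concrete, here's a rough sketch (stdlib Python only) of the two checks I'd run on a backtest: what fraction of total PnL comes from one window, and a simple historical bootstrap VaR. The function names, the 20-day horizon and the 5% level are my own assumptions, not anything from the papers, and the bootstrap treats daily PnL as i.i.d., which ignores volatility clustering.

```python
import random


def pnl_share_in_window(daily_pnl, start, end):
    """Fraction of total PnL earned in daily_pnl[start:end]."""
    total = sum(daily_pnl)
    if total == 0:
        raise ValueError("total PnL is zero; the share is undefined")
    return sum(daily_pnl[start:end]) / total


def bootstrap_var(daily_pnl, horizon=20, n_samples=10_000, alpha=0.05, seed=0):
    """Historical bootstrap VaR over `horizon` days.

    Resamples daily PnL with replacement (assumes i.i.d. days), sums each
    path, and returns the loss at the alpha quantile as a positive number
    (a negative result means even that quantile is a gain).
    """
    rng = random.Random(seed)
    sims = sorted(sum(rng.choices(daily_pnl, k=horizon)) for _ in range(n_samples))
    return -sims[int(alpha * n_samples)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: ~7 years of daily noise plus one 63-day windfall.
    pnl = [rng.gauss(0.0005, 0.01) for _ in range(1764)]
    for i in range(1600, 1663):
        pnl[i] += 0.01
    print("share of PnL in window:", pnl_share_in_window(pnl, 1600, 1663))
    print("20-day 95% bootstrap VaR:", bootstrap_var(pnl))
```

On a real backtest you'd pass the indices for Q4 2008; a share near 0.5 from that single quarter is exactly the red flag described above.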
Point #1 is key. If you test enough terms retrospectively, you'll find something amazing. Prospective hypothesis testing is where you actually prove something works (or not).
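Point #1 is easy to see by simulation. The sketch below is a deliberately hypothetical setup where both the market returns and the "signals" are pure noise: it backtests 50 coin-flip signals over roughly 7 years of daily data (7 × 252 = 1764 days), picks the in-sample winner, and then checks that same signal over the following 7 years, i.e. a prospective test.

```python
import random


def signal_pnl(positions, returns):
    """Total PnL of holding positions[t] (+1 long, -1 short) against returns[t]."""
    return sum(p * r for p, r in zip(positions, returns))


def best_of_random_signals(n_signals=50, n_days=1764, seed=0):
    """Backtest n_signals random signals, pick the in-sample best, score it out of sample.

    Returns (best_in_sample_pnl, best_out_of_sample_pnl, best_index, all_results).
    """
    rng = random.Random(seed)
    # Hypothetical market with no predictable structure: i.i.d. Gaussian daily returns.
    past = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    future = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    results = []
    for i in range(n_signals):
        # Each "signal" is just its own seeded random number generator.
        sig = random.Random(seed * 100_003 + i)
        pos_past = [sig.choice((-1, 1)) for _ in range(n_days)]
        pos_future = [sig.choice((-1, 1)) for _ in range(n_days)]
        results.append((signal_pnl(pos_past, past), signal_pnl(pos_future, future), i))
    # Tuples compare on in-sample PnL first, so max() selects the backtest winner.
    best_in, best_out, best_idx = max(results)
    return best_in, best_out, best_idx, results


if __name__ == "__main__":
    for seed in range(5):
        b_in, b_out, idx, _ = best_of_random_signals(seed=seed)
        print(f"seed {seed}: signal #{idx} in-sample {b_in:+.3f}, out-of-sample {b_out:+.3f}")
```

The winner's in-sample PnL is positive essentially every time, while its out-of-sample PnL is as likely to be negative as positive, because there was never anything to find.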
To the authors' credit, there seems to be a semantic pattern to the words that show profitable predictive correlation with the DJIA. Point #2 seems more significant to me... but yeah, I would agree.
Does order(c.security, -c.order_size) account for transaction costs, borrowing costs and vwapping using a symbol-specific profile based on volume and volatility? Probably not a big deal for SPY, but it would make more of a difference for a larger portfolio.
Right now, you have to enter parameters into one of our pre-made models, but soon (next week probably) you'll be able to write a custom model that uses whatever inputs you'd like to calculate commission and slippage.
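For reference, here's the kind of custom cost model the question above has in mind, written as a hypothetical sketch rather than the platform's actual API (the interface for custom models isn't described here). It adds commission, half the bid/ask spread, a square-root market-impact term scaled by daily volatility and participation in average daily volume (ADV), and a borrow fee on shorts. All names and default values are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    # Hypothetical placeholder values, not calibrated to any market.
    commission_per_share: float = 0.005  # dollars per share
    half_spread_bps: float = 1.0         # half the bid/ask spread, in basis points
    impact_coef: float = 1.0             # scales the square-root impact term
    borrow_rate_annual: float = 0.0025   # annual stock-loan fee charged on shorts


def estimate_trade_cost(shares, price, adv_shares, daily_vol, holding_days=1,
                        params=CostParams()):
    """Estimated dollar cost of trading `shares` (negative = short sale).

    Impact uses the common square-root approximation:
    notional * coef * daily_vol * sqrt(|shares| / ADV).
    """
    if adv_shares <= 0:
        raise ValueError("adv_shares must be positive")
    qty = abs(shares)
    notional = qty * price
    commission = qty * params.commission_per_share
    spread = notional * params.half_spread_bps / 1e4
    impact = notional * params.impact_coef * daily_vol * math.sqrt(qty / adv_shares)
    # Borrow cost only applies when the trade opens a short position.
    borrow = notional * params.borrow_rate_annual * holding_days / 365 if shares < 0 else 0.0
    return commission + spread + impact + borrow


if __name__ == "__main__":
    print(estimate_trade_cost(1_000, 100.0, adv_shares=1_000_000, daily_vol=0.02))
    print(estimate_trade_cost(-1_000, 100.0, adv_shares=1_000_000, daily_vol=0.02))
```

The square-root term is why this barely matters for SPY (enormous ADV, so the participation ratio is tiny) but matters more once a larger portfolio trades bigger slices of thinner names.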
> Does order(c.security, -c.order_size) account for transaction costs, borrowing costs and vwapping using a symbol specific profile based on volume and volatility?
What does it mean to account for "vwapping"?
I mean, I know what VWAP is; I write trading algos for a living, but I have no idea what you're trying to ask here :)
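One possible reading of the question (an assumption, not confirmed in the thread): "vwapping" means working an order through the day in proportion to a symbol's expected intraday volume, so fills track VWAP instead of all executing at one price. A minimal sketch of both pieces follows: the VWAP benchmark itself, and a volume-weighted slice schedule. It assumes the volume profile is a list of non-negative integers (e.g. historical average shares per bucket); the U-shaped profile in the demo is hypothetical.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def vwap_schedule(parent_shares, volume_profile):
    """Split a parent order into per-bucket child orders proportional to expected volume.

    Integer arithmetic with largest-remainder rounding, so the slices are
    whole shares that sum exactly to parent_shares.
    """
    if parent_shares < 0:
        raise ValueError("parent_shares must be non-negative")
    total = sum(volume_profile)
    if total <= 0:
        raise ValueError("volume profile must have positive total volume")
    base = [parent_shares * v // total for v in volume_profile]
    remainders = [parent_shares * v % total for v in volume_profile]
    leftover = parent_shares - sum(base)
    # Hand the leftover shares to the buckets that were rounded down the most.
    by_remainder = sorted(range(len(base)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        base[i] += 1
    return base


if __name__ == "__main__":
    # Hypothetical U-shaped profile: heavy at the open and close, light midday.
    profile = [18, 12, 9, 7, 6, 5, 5, 5, 6, 7, 9, 12, 20]
    print(vwap_schedule(10_000, profile))
    print(vwap([10.0, 11.0, 12.0], [100, 200, 100]))
```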
The instant a meaningfully profitable strategy is disclosed, quants incorporate it into their tools, which creates arbitrage pressure that erodes its edge, and the strategy stops working right away.
I wonder whether quants had noticed this kind of correlation before this result was published in Nature. I suppose so; it's pretty basic after all (even Google has been promoting Google Trends on its finance page as a heuristic for trading, so someone must have done a systematic study a while ago already).
Be extremely careful with this type of correlation. I had a colleague who thought that Twitter sentiment would reflect the market at any given point in time. However, this is a fallacy, as any "trend" in the market is already priced in by the time you can react to it. I would strongly caution against expecting this type of strategy to be profitable in the medium or long term.