Stock Price Prediction

A hunt for smart beta


Factor investing has become increasingly popular. The amount of capital managed with smart beta, a form of factor investing, doubled from 510 billion USD in September 2015 to 1 trillion USD at the end of 2017. Both empirical data and qualitative evidence suggest that exposure to stock factors can yield returns greater than those of a market portfolio with the same amount of risk. In other words, positive alpha should be consistently achievable.

In this project, we explore the possibility of quantitatively hunting alpha by utilizing the power of time-series data. Specifically, we explore the value, momentum and profitability (also known as earnings quality) styles, as these factor premiums were considered to have intuitive reasons for existing and persisting, and exposure to all three simultaneously should have a diversifying effect.


To tackle the issues of signal selection and changing market conditions, we wanted a model that could dynamically adjust the emphasis (weights) given to signals as their predictive power changes. To manage downside risk, the model should also take the prevailing macroeconomic conditions into account through top-down signals, and produce signals to buy or sell shares of a given company. Additionally, an equal-weighted long/short strategy was considered, with the flexibility to take long and/or short positions depending on the model output.

Based on the above specifications, we opted for a Recurrent Neural Network (RNN) model with Long Short-Term Memory (LSTM) cells. RNNs are widely used deep learning models for time-series prediction, which matched the nature of the problem at hand. Different stocks respond differently to signals in the same time period, so instead of training one large and complex model for all stocks, we built one model for each stock or cluster to reduce interference among unrelated stocks. The model selects and weighs 9 signals from the three styles described above and 3 top-down signals, and it produces predictions of the next month’s returns, from which a long/short decision is made for a stock at a given time. Rebalancing is done monthly.
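The step from predicted returns to an equal-weighted long/short book can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `equal_weight_long_short`, the `threshold` dead band, and the rule "long if the prediction is positive, short if negative" are assumptions, since the text does not state the exact decision rule.

```python
def equal_weight_long_short(predicted_returns, threshold=0.0):
    """Map predicted next-month returns to equal-weighted long/short weights.

    predicted_returns: dict of ticker -> predicted return (hypothetical input).
    threshold: assumed dead band; predictions within +/- threshold get no position.
    """
    longs = [t for t, r in predicted_returns.items() if r > threshold]
    shorts = [t for t, r in predicted_returns.items() if r < -threshold]

    weights = {t: 0.0 for t in predicted_returns}
    n_positions = len(longs) + len(shorts)
    if n_positions == 0:
        return weights  # no signal strong enough: stay flat this month

    # Every open position gets the same absolute weight; gross exposure sums to 1.
    for t in longs:
        weights[t] = 1.0 / n_positions
    for t in shorts:
        weights[t] = -1.0 / n_positions
    return weights


# Placeholder tickers and predictions, for illustration only.
example = equal_weight_long_short(
    {"AAA": 0.02, "BBB": -0.01, "CCC": 0.0, "DDD": 0.03}
)
```

Under this rule the book may be all long, all short, or mixed in a given month, which matches the flexibility described above. Rerunning the function on each month's predictions corresponds to monthly rebalancing.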


We explored two training approaches.

Cluster-based Training

In cluster-based training, the input of each epoch corresponds to a group of stocks. One potential advantage of this approach is that the firm-specific risk of different stocks can be cancelled out, resulting in more stable predictions. However, the gains of individual stocks are also diluted in this approach.

The result was disappointing: the cumulative return fell below zero in the second half of the testing period.

Figure: Cluster-based training

Single-stock Training

In single-stock training, the input of each epoch corresponds to an individual stock. This keeps the exposure to stock-specific gains and losses that cluster-based training dilutes.

The result was, surprisingly, better than that of cluster-based training. The portfolio achieved a 10.57% annualized return and a Sharpe ratio of 1.64.
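For readers who want to compute comparable figures from their own backtest, the two metrics can be sketched as below. The conventions here are assumptions, since the text does not state which ones were used: monthly returns, geometric annualization, a sample standard deviation, a risk-free rate of zero by default, and scaling the monthly Sharpe ratio by the square root of 12.

```python
import math
import statistics


def annualized_return(monthly_returns):
    """Geometric annualized return from a list of monthly simple returns."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (12.0 / len(monthly_returns)) - 1.0


def annualized_sharpe(monthly_returns, monthly_risk_free=0.0):
    """Annualized Sharpe ratio; needs at least two months of returns.

    monthly_risk_free is an assumed constant monthly risk-free rate.
    """
    excess = [r - monthly_risk_free for r in monthly_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(12.0)


# Made-up monthly returns, for illustration only (not the project's results).
sample = [0.02, 0.00, 0.02, 0.00]
ann_ret = annualized_return(sample)
sharpe = annualized_sharpe(sample)
```

Other conventions (arithmetic annualization, a non-zero risk-free rate) give somewhat different numbers, so figures are only comparable when computed the same way.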

Figure: Single-stock training

Limitations of Model

While transaction costs were considered in Part 5 of this report, they were not taken into account when designing the model. In Part 5 we considered a hypothetical transaction cost and found that a cost of 0.651% would eliminate the strategy’s profits. The actual transaction cost could have been modelled with data on stock prices and bid/ask spreads. This would have given a more realistic picture of performance after implementation costs.
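The idea of a break-even transaction cost can be illustrated with a simple sketch. It assumes a proportional cost charged on each month's turnover and uses simple (non-compounded) sums of returns; the function names and the turnover input are hypothetical, and this is not necessarily how the 0.651% figure was derived.

```python
def net_returns(gross_returns, turnovers, cost_rate):
    """Monthly returns after a proportional cost charged on traded value.

    turnovers: fraction of the portfolio traded each month (hypothetical input).
    cost_rate: assumed cost per unit of turnover, e.g. 0.005 for 0.5%.
    """
    return [g - cost_rate * t for g, t in zip(gross_returns, turnovers)]


def break_even_cost(gross_returns, turnovers):
    """Cost rate at which the summed net return is exactly zero."""
    total_turnover = sum(turnovers)
    if total_turnover == 0:
        raise ValueError("no trading, so no break-even cost exists")
    return sum(gross_returns) / total_turnover


# Made-up numbers, for illustration only.
gross = [0.01, 0.02, -0.005]
turnover = [1.0, 0.5, 1.0]
rate = break_even_cost(gross, turnover)
```

A strategy with high monthly turnover has a low break-even cost, which is why a model of realistic costs based on bid/ask spreads matters for a monthly rebalanced long/short portfolio.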

It was also assumed that there was no restriction on short-selling, and that there was zero borrowing cost. Future work could take the implementation impacts of short-selling into account.

Due to the limited time window used, the model was not tested under turbulent conditions such as a financial crisis. Thus, there may be some hidden downside risk associated with this strategy. Future research could address such risks, for example by including more data from recession periods.