
Using Machine Learning Methods to Predict Stock Prices Based on Indicator Functions and News Analysis

Student: Makhrov Boris

Supervisor: Mariam Mamedli

Faculty: Faculty of Economic Sciences

Educational Programme: Statistical Modelling and Actuarial Science (Master)

Year of Graduation: 2021

The first stage of the work was introductory. It was used to select the types and characteristics of the models used in the study. After comparing various types of moving averages, the exponential moving average (EMA) was found to be the best option for our model. After reviewing machine learning methods for predicting stock prices, the "random forest" model was chosen for predicting stock prices from news reports, and the long short-term memory (LSTM) neural network for predicting stock prices from past prices. The hyperparameters best suited to these two models were also determined. In addition, Apple was selected as the company on whose shares the models are built.

The second stage was a preliminary analysis of Apple as of March-April 2021. The company's main strengths and weaknesses were considered, and it was found that iPhone sales account for 65% of all profits. This suggests that the stock price may react strongly to the release of new Apple phones. The hypothesis was confirmed by analysing the popularity of search queries for new iPhones on the Google Trends platform. On average, per iPhone release, the price grows by 3.916% in the month before the presentation; in the period from the presentation to the start of sales the growth drops to 1.48%, and over a one-month period it falls further to 0.82%. In addition, the valuation multiples indicate that as of March-April 2021 Apple shares were heavily overvalued by the market while still showing a very high return on investment. Finally, technical indicators confirm our concerns that the company's shares are overbought and, together with moving averages, give a sell recommendation for a one-month trading horizon. The third stage of the study consists of building three models.
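Since all EMA-based strategies below depend on how the average is computed, a minimal sketch of the standard recursive EMA may help. It assumes the common smoothing factor alpha = 2 / (N + 1) for an EMA(N) and seeds the average with the first price; the summary does not say which seeding convention the thesis used, and the function name `ema` is hypothetical.

```python
def ema(prices, span):
    """Exponential moving average EMA(span).

    Assumes alpha = 2 / (span + 1) and seeds the average with the
    first price (other conventions seed with a simple average).
    """
    if span < 1:
        raise ValueError("span must be >= 1")
    if not prices:
        return []
    alpha = 2.0 / (span + 1)
    out = [float(prices[0])]  # seed: EMA_0 = P_0
    for p in prices[1:]:
        # EMA_t = alpha * P_t + (1 - alpha) * EMA_{t-1}
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


if __name__ == "__main__":
    print(ema([10, 11, 12], span=3))  # alpha = 0.5 -> [10.0, 10.5, 11.25]
```

With this convention EMA(20) uses alpha = 2/21 (about 0.095) and EMA(200) uses alpha = 2/201 (about 0.00995), so the longer average reacts far more slowly to new prices. The recursion matches pandas' `Series.ewm(span=N, adjust=False).mean()`.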
The first model was built on a portfolio of three stocks and compared two trading strategies: "buy and hold" and an exponential moving average (EMA) strategy. Over a 21-year investment horizon (from 2000 to 2021), the "buy and hold" strategy returned 435.6%, while the best EMA strategy, EMA(200), returned 399.2%. Two shorter investment periods were also considered: on a one-year horizon "buy and hold" still had the best return, but on a one-month horizon the EMA(20) strategy outperformed the others. EMA strategies were also notably profitable during crises and market crashes.

The second model is a "random forest" model intended to predict a price change of more than 0.5% from news analysis. The full sample covered a five-year period (from January 1, 2016 to January 1, 2021). The model achieved an AUC of 0.7 and an accuracy of 0.638, so its performance can be considered "good".

Finally, LSTM models with different configurations were built on closing price data for the period from January 1, 2015 to April 1, 2021. The best-performing model had three hidden layers with 50 neurons and one output layer; it was trained for 100 epochs with a linear activation function, the Adam optimizer, and a batch size of 32. Its error metrics can be considered good: MSE = 0.00015, MAE = 0.0096, RMSE = 4.56. This model was used to forecast the next day's price, which differed from the actual price by 0.34%.
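The comparison between "buy and hold" and an EMA strategy can be sketched on a single price series. The trading rule here is an assumption, since the summary does not spell it out: hold the stock over day t to t+1 whenever the close on day t is above its EMA, otherwise stay in cash with zero return. The thesis used a three-stock portfolio; this sketch uses one toy series, and all function names are hypothetical.

```python
def ema(prices, span):
    """EMA with alpha = 2 / (span + 1), seeded with the first price."""
    alpha = 2.0 / (span + 1)
    out = [float(prices[0])]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def buy_and_hold_return(prices):
    """Total return from buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0


def ema_strategy_return(prices, span):
    """Total return of a hypothetical long/cash rule based on EMA(span).

    The signal observed at close t is applied to the return from t to t+1,
    so no future information is used.
    """
    e = ema(prices, span)
    wealth = 1.0
    for t in range(len(prices) - 1):
        if prices[t] > e[t]:  # price above its EMA: stay invested
            wealth *= prices[t + 1] / prices[t]
        # otherwise hold cash, wealth unchanged
    return wealth - 1.0


if __name__ == "__main__":
    toy = [100, 102, 101, 105, 110]  # toy data, not from the thesis
    print("buy and hold:", buy_and_hold_return(toy))
    print("EMA(3) rule:", ema_strategy_return(toy, 3))
```

On this rising toy series the EMA rule earns about 3.7% against 10% for buy and hold, because it sits in cash on some up days; the rule's advantage shows up mainly when it exits before sustained declines.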
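For the news-based classifier, the target and the two reported metrics can be illustrated without the model itself. This sketch assumes the target is 1 when the next close-to-close return exceeds +0.5% (the summary does not say whether downward moves also count), and that the classifier outputs a probability score per day, as scikit-learn's `RandomForestClassifier.predict_proba` would. AUC is computed as the probability that a random positive day scores higher than a random negative day, with ties counted as one half. Function names and data are hypothetical.

```python
def make_labels(closes, threshold=0.005):
    """1 if the return from close t to close t+1 exceeds threshold, else 0."""
    return [1 if closes[t + 1] / closes[t] - 1.0 > threshold else 0
            for t in range(len(closes) - 1)]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    closes = [100, 101, 101.2, 100, 100.6]   # toy prices
    y = make_labels(closes)                  # -> [1, 0, 0, 1]
    scores = [0.9, 0.3, 0.6, 0.4]            # toy model scores
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy:", accuracy(y, preds), "AUC:", roc_auc(y, scores))
```

Accuracy depends on the 0.5 cut-off applied to the scores, while AUC uses only their ranking, which is why the two metrics can differ (0.638 versus 0.7 in the thesis).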
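The LSTM setup involves two steps that are easy to sketch in plain Python: scaling closing prices and cutting them into sliding windows (each window of past prices predicts the next one), then scoring predictions with MSE, MAE, RMSE and a percentage discrepancy. The min-max scaling, the window length, and all function names are assumptions; the network itself would typically be a stack of Keras `LSTM(50, return_sequences=True)` layers ending in `Dense(1)`, compiled with `optimizer="adam"` and fitted with `epochs=100, batch_size=32`. Because RMSE is the square root of MSE on the same scale (the square root of 0.00015 is about 0.012), the summary's MSE and RMSE were presumably reported on different scales, for example scaled versus original prices.

```python
import math


def min_max_scale(values):
    """Scale values to [0, 1]; also return lo/hi to undo the scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def unscale(value, lo, hi):
    """Map a scaled value back to the original price units."""
    return value * (hi - lo) + lo


def make_windows(series, window):
    """Inputs are `window` consecutive values; target is the next value."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y


def regression_errors(y_true, y_pred):
    """MSE, MAE and RMSE computed on whatever scale the inputs use."""
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse)}


def pct_discrepancy(forecast, actual):
    """Absolute forecast error as a percentage of the actual price."""
    return abs(forecast - actual) / actual * 100.0


if __name__ == "__main__":
    scaled, lo, hi = min_max_scale([10, 20, 15])   # -> [0.0, 1.0, 0.5]
    X, y = make_windows([1, 2, 3, 4, 5], window=3)  # X: [[1,2,3],[2,3,4]], y: [4,5]
    print(scaled, X, y)
    print(regression_errors([1.0, 2.0], [1.0, 4.0]))
```

Errors computed on scaled prices are much smaller than the same errors after `unscale`, so the scale should always be stated alongside MSE, MAE or RMSE.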

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
