NIFTY INDEX PREDICTION WITH THE HELP OF DATA SCIENCE
TANKY (02 Aug 2021)
Note: This is the third blog in the series ‘Data Science in Daily Lives’. It looks at the feasibility of applying data science to equity index prediction, which could support meaningful and informed investment decisions.
Financial markets tend to be highly unpredictable and, at times, even illogical. A case in point is how disconnected the world’s equity markets currently are from the Covid-induced economic slowdown worldwide. Most economies have seen their output and growth take a beating. Yet equity markets around the world, after a deep though brief dip in early 2020, have since scaled new peaks and seem to be singing a totally different tune. Because of this unpredictability, financial data tends to have a turbulent structure in which reliable patterns are hard to find. Modelling such data requires algorithms that can uncover hidden structure in past values and project how that structure will evolve. Machine Learning (ML), Deep Learning and Time Series Analysis are among the most effective methodologies for this.
ML and time series models can be developed and deployed to predict equity indices under the assumption that the future follows past patterns (trend). However, index values are also heavily influenced by an unpredictable external environment and the associated market sentiment. Models built this way are therefore likely to be more relevant over the longer term than in the short term. This post is a modest attempt at predicting the NIFTY index over the coming years.
This should not be construed as financial advice, but rather as a pointer towards using data science in investment decisions. I would also advise engaging a qualified professional to assess the model and its predictions and, above all, doing your own research before investing.
History of the NIFTY Index. The NIFTY 50 is the flagship index of the National Stock Exchange of India Ltd. (NSE). The index tracks the movement of a portfolio of blue-chip companies, which are the largest and most liquid Indian securities. It includes 50 of the approximately 1600 companies listed on the NSE and captures approximately 65% of the exchange's market capitalization. It is considered a true reflection of the Indian equity market.
The NIFTY 50 has been trading since April 1996 and is owned and managed by India Index Services and Products Ltd (IISL).
Advisability of Including Equities in a Portfolio. Investing in equities has been a prudent way of creating wealth, although it comes with its own set of risks and rewards. Equity investments benefit directly from the progress of the economy, and equity investing has provided exceptional long-term returns in the past. Over 40 years, the Sensex has delivered a compounded annual growth rate (CAGR) of 16.09 per cent. Historically, therefore, equity investing has been one of the best ways to achieve strong capital growth and wealth creation. Investing is a long-term necessity, and an optimised asset allocation and investment plan can help accelerate wealth creation.
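A CAGR figure is the constant yearly rate that turns a starting value into the ending value over the given number of years: CAGR = (end / start)^(1/years) - 1. The Python sketch below shows that arithmetic. The helper names are hypothetical, and the demo values are illustrative only; they are not actual Sensex data.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compounded annual growth rate as a fraction (0.1609 means 16.09%)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def compound(start_value: float, annual_rate: float, years: float) -> float:
    """Value reached by compounding start_value at a constant annual rate."""
    return start_value * (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Illustrative only: a value that doubles in 5 years grows about 14.87% a year.
    print(f"{cagr(100.0, 200.0, 5):.2%}")
    # 16.09% a year sustained for 40 years multiplies the starting value roughly 390x.
    print(f"{compound(1.0, 0.1609, 40):.0f}x")
```

CAGR says nothing about the path taken. Two series with very different year-to-year volatility can share the same CAGR.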
Methodology Adopted. Equity index values are not randomly generated. Instead, they can be treated as a discrete-time series: a set of well-defined numerical observations collected at successive points in time, at regular intervals. We develop and compare several models to predict the NIFTY. Forecasts are made under the assumption that market and other conditions will remain broadly similar to the present. This does not mean there will be no changes, only that any change will be gradual rather than drastic.
Data. The dataset used here was obtained from the archives of the NSE (National Stock Exchange of India Ltd., nseindia.com). It contains daily NIFTY index values. The data used comprises the daily NIFTY adjusted close values from 25 July 2011 to 31 July 2021.
Dataset Pre-processing. This step mainly consists of cleaning the dataset. We look for and impute null values, and we find and remove duplicate data points. Next, we parse the date values and make the dataset ready for time series analysis. The cleaned dataset looks as follows:
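The cleaning steps above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the notebook's code. The assumptions are:

* The NSE export is a CSV with `Date` and `Close` columns, and dates are formatted like `25-Jul-2011`.
* When a date appears more than once, the first row is kept.
* A missing close is imputed by carrying the previous trading day's value forward. The original does not say which imputation method was used.

The function name `clean_series` is hypothetical.

```python
from __future__ import annotations

import csv
import io
from datetime import date, datetime

# Assumed layout of the NSE export; adjust to match the actual file.
DATE_COLUMN = "Date"
CLOSE_COLUMN = "Close"
DATE_FORMAT = "%d-%b-%Y"  # e.g. "25-Jul-2011"


def clean_series(csv_text: str, start: date, end: date) -> list[tuple[date, float]]:
    """Parse dates, drop duplicate dates, forward-fill gaps and keep [start, end]."""
    by_date: dict[date, float | None] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row[DATE_COLUMN].strip(), DATE_FORMAT).date()
        raw = (row[CLOSE_COLUMN] or "").strip()
        # setdefault keeps the first occurrence, so later duplicates are ignored.
        by_date.setdefault(day, float(raw) if raw else None)

    cleaned: list[tuple[date, float]] = []
    last_value: float | None = None
    for day in sorted(by_date):  # chronological order for the time series
        value = by_date[day]
        if value is None:
            value = last_value  # forward-fill from the previous trading day
        if value is None:
            continue  # a leading gap has nothing to fill from, so drop it
        last_value = value
        if start <= day <= end:
            cleaned.append((day, value))
    return cleaned


if __name__ == "__main__":
    # Made-up rows: one duplicate date and one missing close.
    sample = (
        "Date,Close\n"
        "26-Jul-2011,5600.0\n"
        "25-Jul-2011,5650.0\n"
        "26-Jul-2011,5600.0\n"
        "27-Jul-2011,\n"
        "28-Jul-2011,5550.0\n"
    )
    for day, close in clean_series(sample, date(2011, 7, 25), date(2021, 7, 31)):
        print(day.isoformat(), close)
```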
Past Price Variations. The NIFTY index has gone through a turbulent ten years, albeit with a clear upward trend. A plot of the variation is shown below:
Yearly Box Plot. A clear uptrend is visible across the years, with major volatility in 2020 due to the onset of the pandemic. 2014 and 2021 also show volatility, albeit less than 2020.
Monthly Box Plot. There is no apparent seasonality in the data. However, volatility in index values is comparatively lower from April to July across the years.
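Each box in the yearly and monthly box plots summarises one group of daily values by its minimum, quartiles, median and maximum. The box height (the interquartile range, IQR) is a rough gauge of how widely the index moved within that group.

The sketch below computes those summaries with the standard library, grouping by year or by calendar month. It is an illustration rather than the notebook's plotting code, and the name `box_summaries` is hypothetical. Quartiles use linear interpolation (`method="inclusive"`), which corresponds to NumPy's default percentile method.

```python
from __future__ import annotations

import statistics
from collections import defaultdict
from collections.abc import Callable, Hashable
from datetime import date


def box_summaries(
    series: list[tuple[date, float]], key: Callable[[date], Hashable]
) -> dict:
    """Group (date, value) pairs by key(date) and return box-plot statistics."""
    groups: dict = defaultdict(list)
    for day, value in series:
        groups[key(day)].append(value)

    summaries = {}
    for label in sorted(groups):
        values = groups[label]
        if len(values) < 2:
            continue  # quartiles need at least two observations
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summaries[label] = {
            "min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "iqr": q3 - q1,
        }
    return summaries


if __name__ == "__main__":
    # Made-up values, just to show the grouping.
    toy = [(date(2020, m, 1), 100.0 * m) for m in range(1, 5)] + [
        (date(2021, 1, 1), 50.0), (date(2021, 2, 1), 50.0)]
    print(box_summaries(toy, key=lambda d: d.year))   # one box per year
    print(box_summaries(toy, key=lambda d: d.month))  # one box per calendar month
```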
Monthly Index Variations Across the Years. A plot of the monthly index variations for each year under consideration is shown below.
Trend & Seasonality. A decomposition plot of the time series is shown below. The trend component rises steadily across the years. The seasonal curve shows no meaningful seasonal pattern, while the residual (error) component is large and resembles white noise.
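A decomposition plot like this one typically comes from a classical additive decomposition, which models each value as trend + seasonal + residual. The sketch below shows the mechanics with the standard library:

* The trend is a centred moving average over one seasonal period.
* The seasonal component is the average detrended value at each position in the period, re-centred to sum to zero.
* The residual is whatever remains.

This is an assumption about the method. The notebook may have used a library routine such as statsmodels' `seasonal_decompose`. The seasonal period it used is not stated, so `period` is left as a parameter here. The function names are hypothetical.

```python
from __future__ import annotations


def centred_moving_average(values: list[float], period: int) -> list[float | None]:
    """Trend estimate; None where the centred window runs off either end."""
    half = period // 2
    trend: list[float | None] = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        if period % 2:  # odd period: simple average of `period` points centred on i
            trend[i] = sum(window) / period
        else:  # even period: 2 x period average, the two end points weighted by 1/2
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def additive_decompose(values: list[float], period: int):
    """Split values into (trend, seasonal, residual) with value = T + S + R."""
    if period < 2 or len(values) < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods of data")
    trend = centred_moving_average(values, period)

    # Average the detrended values at each position within the period.
    buckets: list[list[float]] = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]  # seasonal effects now sum to zero

    seasonal = [pattern[i % period] for i in range(len(values))]
    residual = [None if t is None else y - t - s
                for y, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual
```

With daily trading data, a yearly cycle would correspond to a `period` of roughly 250 trading days. A near-flat seasonal component paired with large residuals matches what the plot above describes.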
Train-Test Split. We divide the data into training and test sets. The training set comprises all data points up to 31 Dec 2018. The test set comprises the remaining data points, from 1 Jan 2019 to 31 July 2021.
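Because the observations are ordered in time, the split is made at a date rather than by random shuffling. Shuffling would let the models learn from values that lie after the days they are asked to predict. Below is a minimal sketch with the 31 Dec 2018 cutoff. `split_by_date` is a hypothetical helper, and the series is assumed to be a list of (date, value) pairs.

```python
from __future__ import annotations

from datetime import date

TRAIN_END = date(2018, 12, 31)  # last day included in the training set


def split_by_date(series: list[tuple[date, float]], cutoff: date = TRAIN_END):
    """Chronological split: dates on or before cutoff train, later dates test."""
    ordered = sorted(series)  # ensure chronological order first
    train = [(d, v) for d, v in ordered if d <= cutoff]
    test = [(d, v) for d, v in ordered if d > cutoff]
    return train, test


if __name__ == "__main__":
    toy = [(date(2018, 12, 28), 10.0), (date(2018, 12, 31), 11.0), (date(2019, 1, 1), 12.0)]
    train, test = split_by_date(toy)
    print(len(train), len(test))  # 2 1
```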
LINEAR REGRESSION ON TEST DATA
NAIVE PREDICTION ON TEST DATA
SIMPLE AVERAGE PREDICTION ON TEST DATA
DOUBLE EXPONENTIAL SMOOTHING PREDICTION ON TEST DATA
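The four approaches captioned above can be sketched in a few lines each:

* Naive prediction repeats the last training value.
* Simple average prediction repeats the mean of the training set.
* Linear regression fits a straight line by ordinary least squares against each observation's position in the series and extrapolates it.
* Double exponential smoothing (Holt's linear method) tracks a smoothed level and a smoothed trend and projects them forward.

These are plain-Python illustrations, not the notebook's code, and the function names are hypothetical. Using the trading-day position as the regressor is an assumption. The smoothing parameters `alpha` and `beta` are left to the caller because the values used in the original are not stated; libraries such as statsmodels usually estimate them by optimisation.

```python
from __future__ import annotations


def naive_forecast(train: list[float], horizon: int) -> list[float]:
    """Every future step equals the last observed training value."""
    return [train[-1]] * horizon


def simple_average_forecast(train: list[float], horizon: int) -> list[float]:
    """Every future step equals the mean of the whole training set."""
    mean = sum(train) / len(train)
    return [mean] * horizon


def linear_regression_forecast(train: list[float], horizon: int) -> list[float]:
    """Fit y = intercept + slope * t by least squares (t = 0, 1, ...) and extend it."""
    n = len(train)
    if n < 2:
        raise ValueError("need at least two training points")
    t_mean = (n - 1) / 2
    y_mean = sum(train) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(train))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + horizon)]


def holt_forecast(train: list[float], horizon: int,
                  alpha: float, beta: float) -> list[float]:
    """Holt's linear (double exponential smoothing) method.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast_{t+h} = level_t + h * trend_t
    """
    if len(train) < 2:
        raise ValueError("need at least two training points")
    level, trend = train[0], train[1] - train[0]  # simple initialisation
    for y in train[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

Naive and simple-average forecasts are flat lines, so on a trending index they tend to lag. The regression and Holt forecasts both carry a slope forward, with Holt weighting recent observations more heavily.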
The root mean square error (RMSE) between each model's predictions and the actual test values is shown below.
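RMSE is the square root of the mean squared difference between actual and predicted values: RMSE = sqrt((1/n) Σ (y_i - ŷ_i)²). It is expressed in index points, and the model with the lowest RMSE on the test set fits that period best. A minimal sketch follows; the helper names are hypothetical.

```python
from __future__ import annotations

import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_models(actual: list[float],
                forecasts: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (model name, RMSE) pairs sorted from best (lowest) to worst."""
    scores = [(name, rmse(actual, pred)) for name, pred in forecasts.items()]
    return sorted(scores, key=lambda item: item[1])
```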
Predictions till July 2025. Next, we prepare the prediction dataset by generating dates from 01 Aug 2021 to 31 July 2025. This dataset is fed to two of the previously developed and trained models: the LR model and the Holt’s/DES model. Based on its best-fit line, the LR model indicates a modest increase over the next four years. The DES model indicates a more substantial increase. The plots of the predictions are included below.
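One way to build the prediction dates is to enumerate weekdays between 01 Aug 2021 and 31 July 2025. The forecast horizon is then counted in trading steps, like the training data. This sketch makes an explicit simplification: NSE trading holidays are not removed, so the count slightly overstates the real number of sessions. The function names are hypothetical, as is the `forecaster` callback, which stands for any function that maps a horizon to that many forecasts.

```python
from __future__ import annotations

from collections.abc import Callable
from datetime import date, timedelta


def weekdays_between(start: date, end: date) -> list[date]:
    """All Monday-to-Friday dates from start to end inclusive (holidays kept)."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(current)
        current += timedelta(days=1)
    return days


def dated_forecast(start: date, end: date,
                   forecaster: Callable[[int], list[float]]) -> list[tuple[date, float]]:
    """Pair each future weekday with the forecaster's value for that step."""
    dates = weekdays_between(start, end)
    values = forecaster(len(dates))
    if len(values) != len(dates):
        raise ValueError("forecaster returned the wrong number of values")
    return list(zip(dates, values))


if __name__ == "__main__":
    future = weekdays_between(date(2021, 8, 1), date(2025, 7, 31))
    print(len(future), future[0], future[-1])  # 1044 2021-08-02 2025-07-31
```

Removing exchange holidays would require NSE's published holiday calendar, which is not reproduced here.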
PREDICTION TILL JULY 2025 BY LINEAR REGRESSION MODEL
PREDICTION TILL JULY 2025 BY DES MODEL
The Google Colab notebook is available at: https://colab.research.google.com/drive/1hNI-bEn8717HZiLn7BI3RfkGmoZGpntU?usp=sharing
Conclusion. The DES model predicts a NIFTY value of approximately 19678 by the end of July 2025.