Forecasting the Long Term Economics Status of Bangladesh Using Machine Learning Approaches from 2016-2036

Received: 22-03-2019 Accepted: 04-05-2019

ABSTRACT: It is welcome news that Bangladesh has now been recognized as a developing country by the United Nations and the World Bank. This recognition is conditional: the country must sustain its economic progress until 2024 for the recognition to become permanent. The economic condition depends on many factors, such as Gross Domestic Product (GDP), personal saving, private sector investment, Gross National Income (GNI) per capita, the Human Assets Index (HAI) and the Economic Vulnerability Index (EVI). This paper forecasts the long-term economic condition of Bangladesh, taking the year as the independent variable and GDP, private sector investment and personal saving as the dependent variables. The living conditions of a country depend on GDP, and personal saving and private sector investment are also important parts of a country's economy. If these attributes are forecast properly, the economic condition of Bangladesh and the living status of its people can be determined more accurately, and we can therefore judge whether Bangladesh can fulfil the condition for permanent recognition as a developing country. To forecast these attributes, we propose a model that consists of Karl Pearson's coefficient and a modified linear regression technique. To improve performance, we modify linear regression using gradient boosting. Our experiment shows that the model gives more accurate forecasts of GDP, private sector investment and personal saving than plain linear regression.


Introduction
An economy comprises the production and consumption activities through which a nation's resources are allocated for public benefit. The development of a nation depends mainly on its economic status. The main aim of this paper is to forecast GDP, personal saving and private sector investment. GDP is the monetary value of all the finished goods and services produced within a country's borders in a specific time period [1], and it is commonly used as an indicator of the economic health of a country [1]. Bangladesh posted a GDP growth rate of 7.9% in FY-18 and is expected to achieve 7.5% in FY-19, according to a report by the Asian Development Bank [2]. Personal saving refers to a person setting aside money, property, etc. for future use, or investing it in order to earn a profit in the future. People keep their savings in banks and insurance companies or invest them in the stock market. Private sector investment is also a result of personal saving. Nowadays, banks and insurance companies are interested in investing in the private sector at low interest rates. As a result, unemployed people are trying to start their own businesses, which changes not only their own economic condition but also that of Bangladesh [2].
To forecast GDP, personal saving and private sector investment, we propose a model based on the well-known Karl Pearson coefficient and the linear regression method [4]. The Karl Pearson coefficient measures the relationship between two variables [3]. The relationship can be of two types: positive or negative. We examined three pairs of variables: year and GDP, year and personal saving, and year and private sector investment [1]. We use this coefficient together with linear regression for forecasting. Linear regression is a method for predicting a dependent variable from an independent variable. Gradient boosting is an ensemble technique that improves the performance of a weak learner [5]. After applying gradient boosting, our model forecasts the variables more accurately [4]. By forecasting these variables, we can tell whether personal saving increases. If saving increases, then investment increases; if investment increases, then more jobs are created, which increases people's earnings. In this situation, the total GDP of the country also increases [1].

Fig.1. The Block Diagram of the Proposed Model
The rest of this paper is organized as follows. Section II describes the statistical and machine learning algorithms used in our model. Section III presents the methodology of the proposed model. Section IV is the experimental evaluation, where we describe our data. Section V analyzes the results of our model, and Section VI concludes our work.

2.1.Karl Pearson's Coefficient
Karl Pearson's coefficient is a widely used mathematical method in which a numerical expression is used to calculate the degree and direction of the relationship between linearly related variables. Pearson's method, popularly known as the Pearsonian coefficient of correlation, is one of the most extensively used quantitative methods in practice [3]. The coefficient of correlation is denoted by 'r'. The value of 'r' always lies between -1 and +1. If 'r' is equal to +1, there is a perfect positive correlation; if 'r' is equal to -1, there is a perfect negative correlation; if 'r' is equal to 0, there is no linear correlation. The method applies when the relationship between the variables is linear, when a large number of independent causes affect the variables under study so that they form a normal distribution, and when the variables are independent of each other [3]. To find the coefficient 'r' we use formula (1):

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2} \, \sum (y - \bar{y})^{2}}}$ (1)

Here, x and y are two continuous variables, and x̄ and ȳ are their means.
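As an illustration of the coefficient described above, the following minimal sketch computes Pearson's r from deviations about the mean. The function name `pearson_r` and the sample values are hypothetical and chosen only for demonstration; they are not the paper's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Made-up, perfectly linear values (NOT the paper's data)
years = [2008, 2009, 2010, 2011, 2012]
values = [10.0, 12.0, 14.0, 16.0, 18.0]
r = pearson_r(years, values)
```

Because the made-up values lie exactly on a rising line, `r` comes out as a perfect positive correlation; real economic series would give a value strictly inside the range.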

2.2.Linear Regression
Linear regression is used to find a linear relationship between a target and one or more predictors. There are two types of linear regression: simple and multiple. Simple linear regression is useful for finding a relationship between two continuous variables: one is the predictor or independent variable and the other is the response or dependent variable. It looks for a statistical relationship, not a deterministic one. The relationship between two variables is said to be deterministic if one variable can be exactly expressed by the other. The core idea is to obtain the line that best fits the data. The best-fit line is the one for which the total prediction error over all data points is as small as possible [4]. The error is the distance between a point and the regression line. The general formula of linear regression is (2):

$y = a + bx$ (2)

Here, y is the dependent variable, x is the independent variable, and a and b are the coefficients.
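The best-fit line can be sketched with the standard closed-form least-squares solution. The sketch assumes that 'a' is the intercept and 'b' is the slope, and it uses the textbook formulas, which may be written differently from the paper's own expressions for 'a' and 'b'. The names `fit_line` and `predict` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, x):
    """Value of the fitted line at x."""
    return a + b * x


# Made-up values (NOT the paper's data): y rises by 2 per year
years = [2008, 2009, 2010, 2011, 2012]
values = [10.0, 12.0, 14.0, 16.0, 18.0]
a, b = fit_line(years, values)
next_value = predict(a, b, 2013)
```

Minimizing the sum of squared vertical distances is what makes this line "best fitting" in the sense described above.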

2.3.Gradient Boosting
Gradient boosting was introduced by Friedman in 2001. It is also known as MART (Multiple Additive Regression Trees) and GBRT (Gradient Boosted Regression Trees). Gradient boosting is widely used in industry and has won many Kaggle competitions. A gradient boosting machine (GBM) constructs a forward stage-wise additive model by performing gradient descent in function space [5]. The first step of the GBM is to calculate the error, for which we use (3):

$Error = y_{actual} - y_{predicted}$ (3)

Then we use (4) to obtain the new predicted value:

$y_{new} = y_{predicted} + Error$ (4)
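The two steps just described, computing the error and then adding it to the prediction, can be sketched as follows. This is a simplified single-stage illustration and not a full gradient boosting implementation; it assumes the error is defined as actual minus predicted, and the function names and numbers are hypothetical.

```python
def residual(actual, predicted):
    """Error of a prediction, assumed here to be actual minus predicted."""
    return actual - predicted


def corrected_prediction(predicted, error):
    """New predicted value obtained by adding the error to the old prediction."""
    return predicted + error


# Made-up numbers (NOT the paper's data)
predicted_value = 200.0
actual_value = 205.5
error = residual(actual_value, predicted_value)
new_value = corrected_prediction(predicted_value, error)
```

With the error taken from the same observation, the corrected value coincides with the actual value, which is why this step on its own is only meaningful where the actual value is known.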

Methodology
For the purpose of forecasting, we defined a forecasting model. The working principle of the proposed model is shown in Figure 1. The basic modules of our model are described below:

3.1.Finding Karl Pearson's Coefficient
According to Fig.1, the first step of our model is to find the coefficient 'r' using (1). Before using (1), we need to find the values of the components of the equation [3]. We first find the means of the variables, denoted by 'x̄' and 'ȳ', using (5) and (6), where 'x' and 'y' are the observed values and 'N' is the total number of observations. Then we find the value of "XY" using (7). Finally, we put all the values into (1) and find the value of 'r'.

3.2.Hypothesis Testing
The second step of our model is hypothesis testing, which we use to determine whether the relationship between the variables is significant. For this purpose, we need to choose an appropriate significance level 'α'; the most commonly used value is 0.05. We then find the degrees of freedom using (8) and compare the coefficient 'r' with the critical value of 'r' from Table 1. If 'r' is greater than or equal to the critical 'r', we reject the null hypothesis and the relationship between the variables is significant. If 'r' is less than the critical 'r', we fail to reject the null hypothesis and the relationship is not considered significant. If the relationship is significant, we proceed to the forecasting step; otherwise, the model stops.
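The decision rule above can be sketched as a simple table lookup. The sketch makes three assumptions that are not confirmed by the text: the degrees of freedom are N - 2, the lookup table holds standard two-tailed critical values of r at α = 0.05 as a stand-in for Table 1, and the absolute value of r is compared so that negative correlations are handled as well. The names `CRITICAL_R_05` and `is_significant` are hypothetical.

```python
# Standard two-tailed critical values of Pearson's r at alpha = 0.05,
# keyed by degrees of freedom. Stand-in for the paper's Table 1 (assumption).
CRITICAL_R_05 = {6: 0.707, 7: 0.666, 8: 0.632, 9: 0.602, 10: 0.576}


def is_significant(r, n_observations, table=CRITICAL_R_05):
    """Return True if |r| reaches the critical value for df = N - 2."""
    df = n_observations - 2  # assumed form of the degrees of freedom
    if df not in table:
        raise KeyError(f"no critical value stored for df={df}")
    return abs(r) >= table[df]


# Ten yearly observations give df = 8 and a critical value of 0.632
strong = is_significant(0.97, 10)
weak = is_significant(0.40, 10)
```

Keeping the critical values in a plain dictionary mirrors the table-based comparison in the text; a library routine that returns a p-value would be an alternative design.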

1. Linear Regression: At the forecasting step, we first use linear regression to forecast the variables, using (2). To find the values of 'a' and 'b' we use (9) and (10). After finding 'a' and 'b', we formulate an equation of the form (2). However, this equation alone does not give a sufficiently accurate result, so we need to modify it.
2. Gradient Boosting: With this method we can improve the performance of linear regression. First, we use (3) to find the error. This value is then added to the predicted value according to (4). This process reproduces the actual value exactly, but it can only be applied when the actual value is known; in other words, it is only applicable to training data. We therefore modify this method so that it can be applied to test data [4][5].

3. Modified Linear Regression: Using (2), we can predict the value of a dependent variable from an independent variable. With this method, we first predict the values for the training data. From these values, we choose one as the ideal case and calculate its error using (3). For example, to predict GDP for 2016-2036 we choose 2016 as the ideal case because its actual value is known. We calculate the error for this case and add it to the predicted value. However, we cannot apply the same step to 2018-2036, because the actual values for those years are unknown. For this case, we propose (11).

= + + (11)
Here, 'Error' is the error found for the ideal case using (3), and 'n' is a counter that takes the values 1, 2, 3, ... up to the number of test cases, increasing with 'x'. Starting from 2016, 'n' is 1 for 2016, 2 for 2017, and so on. We always take the first test case as the ideal case, chosen so that its actual value is known to us. This approach gives a better result than plain linear regression [4].

Experimental Evaluation
For forecasting purposes, we choose four attributes: the year is the independent variable, and the dependent variables are GDP, personal saving and private sector investment. The value of year is a positive integer; the values of the other variables are positive floating-point numbers. We collect the value of each variable against the year. The relationship between the year and the other variables is approximately linear: in almost all cases the values increase linearly with the year. This type of data is also known as time series data. A time series model is basically an econometric model that helps us to forecast data based on previous data, where the values change with time. Table 2 shows the GDP, personal saving and private sector investment data from 2008 to 2017. We use this data because it is available for non-commercial research purposes. GDP is given in USD billion, personal saving in BDT billion, and private sector investment as a percentage of GDP [7]. GDP indicates the total GDP of Bangladesh in a year. Personal saving indicates the total money saved by the people of Bangladesh from their income in a year. Private sector investment indicates what percentage of total GDP is invested in the private sector.
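To make the role of 'Error' and 'n' in the modified linear regression concrete, the following sketch forecasts a range of years from a fitted line. The exact form of the correction term in (11) is an assumption here: the sketch takes it to be n × Error, which is only one reading consistent with the description of 'Error' and 'n' and is not confirmed by the source. The function names, coefficients and values are hypothetical and are not the paper's data.

```python
def corrected_forecast(a, b, x, error, n):
    """Line value at x plus a correction term.

    ASSUMPTION: the correction is n * error; the source's exact
    expression for (11) may differ.
    """
    return a + b * x + n * error


def forecast_range(a, b, start_year, end_year, actual_at_start):
    """Forecast every year in [start_year, end_year].

    The first year is the ideal case: its actual value is known, so the
    error of the plain line at that year can be computed.
    """
    error = actual_at_start - (a + b * start_year)
    results = {}
    # n is 1 for the first test case, 2 for the second, and so on
    for n, year in enumerate(range(start_year, end_year + 1), start=1):
        results[year] = corrected_forecast(a, b, year, error, n)
    return results


# Made-up line and ideal-case value (NOT the paper's data)
a, b = -4006.0, 2.0
forecasts = forecast_range(a, b, 2016, 2018, actual_at_start=27.0)
```

Under the assumed form, the first forecast matches the known actual value and the correction grows with each later year; a different form of the correction term would change the later forecasts but not the first one.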

Result Analysis
Following Fig.1, we first find the coefficient between the variables. The Karl Pearson's coefficient between year and GDP is 0.97, between year and personal saving is 0.971, and between year and private sector investment is 0.92. All the values pass the hypothesis test: the critical value of 'r' is 0.632, which is less than each coefficient. In the forecasting step, we first find the linear regression between the variables, which gives three equations (12), (13) and (14). We then set the year to 2016 and 2017 to compute predicted values and compare them with the actual values. Table 3 shows that the error rate in this case is very high and that it also increases with the year [7]. We therefore need to modify linear regression as in (11). We take the year 2016 as the ideal case and find the error for GDP, personal saving and private sector investment at that year [4]. The error for GDP is '5.93

Table 4 shows that the error rate in this case is much lower than in the previous case. Fig.2 shows that GDP increases with the year. If GDP increases, then personal saving and private sector investment should also increase with the year. Fig.3 and Fig.4 show that personal saving and private sector investment do increase with the year. Since all three attributes increase with the year, the economic condition of Bangladesh is progressing. If the economic condition improves, the living status of the people improves. Using this forecast, we obtain the values of GDP, personal saving and private sector investment for 2016-2036, which can support decisions aimed at improving these values.
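The critical value of 0.632 is consistent with the standard conversion from Student's t to a critical r. The worked step below assumes N = 10 observations (the years 2008 to 2017), N - 2 degrees of freedom, and a two-tailed test at α = 0.05; these assumptions are not stated in this form in the text.

```latex
\begin{align*}
df &= N - 2 = 10 - 2 = 8, \qquad t_{0.025,\,8} \approx 2.306 \\
r_{\text{crit}} &= \frac{t}{\sqrt{t^{2} + df}}
  = \frac{2.306}{\sqrt{2.306^{2} + 8}}
  \approx \frac{2.306}{3.649} \approx 0.632
\end{align*}
```

All three reported coefficients (0.97, 0.971 and 0.92) lie well above this threshold.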

Conclusion
Long-term forecasting of anything is a very difficult task, especially in economics, because the economic condition of a country depends on many factors, and those factors are in turn influenced by their surrounding conditions. We forecast GDP, private sector investment and personal saving from their past values, assuming that they increase linearly. However, these values are also influenced by the political condition of the country, its environmental condition, etc. Using our model, we try to provide a more accurate forecast of these variables. The forecast shows that the values of these variables increase year by year, which is a positive sign for our country. If the GDP of our country increases in this way, we can say that the future living conditions of the people will improve. As personal saving increases, people will invest more money, so private sector investment also increases. As a result, the unemployment rate will decrease. It is a good sign for our country that GDP, personal saving and private sector investment are increasing, but we also need to ensure that they are used properly for the development of the country. In the future, we will try to use this model to forecast other variables that influence the economic condition of our nation.