A new approach to forecast Malaysian mortality rates

In this paper a new approach to project the time varying index 𝑘 𝑡 for the Lee-Carter (LC) model by using a machine learning technique known as Neural Network (NN) is proposed for forecasting Malaysian male and female mortality rates. To evaluate the forecasting performance of the proposed model, the conventional LC model which uses ARIMA to forecast 𝑘 𝑡 is used as a benchmark. However unlike previous studies were done in Malaysian, we employed 9 different ARIMA models and evaluated the AIC and BIC to obtain the best fit model to forecast 𝑘 𝑡 . The forecasting performance of the two methodologies were then compared using 3 performance indicators Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). In this study, findings showed that the proposed NN model outperformed the conventional ARIMA model for forecasting both Malaysian male and female mortality rates.


Introduction
*Over the last few decades as a result of an unprecedented increase in mortality rates around the world and the potential risk this increase pose towards population size and structure, social security systems, life insurance, and pensions industry worldwide. Modelling and forecasting mortality rates accurately has become an essential research area for practitioners of actuarial science and demography in many countries.
Several mortality forecasting models have been proposed in the past. However, the most successful model was introduced by Lee and Carter (1992) which incorporated stochastic forecasting.
The Lee-Carter (LC) model has been said to be the leading statistical model of mortality forecasting (Deaton and Paxson, 2004). It has become one of the most favorable models to be used for modelling and forecasting mortality in many counties, such as US (Lee and Carter, 1992), Chile (Lee and Rofman, 1994), China (Jiang, 1995), Japan (Wilmoth, 1996), the seven most economically advanced nations (G7) (Tuljapurkar et al., 2000), the Nordic countries , India (Yadav et al., 2012), Sri Lanka (Aberathna et al., 2014), Thai (Yasungnoen and Sattayatham, 2016) and Malaysia (Ngataman et al., 2016). Furthermore, it has also become a model that is utilized by policy analysts around the world for forecasting all-cause and cause-specific mortality (Girosi and King, 2007).
Although the LC model has its advantages such as, simplicity of parameter estimation, ease of parameter interpretation, ability to acquire all values of a countries life table and also having the capability to produce probabilistic intervals. The LC model also has some disadvantages which has caused it to become a basis model for which many modifications and extension have been proposed. Wilmoth (1993) introduced two methods to improve the fitting of the LC model by employing weighted Singular Value Decomposition (SVD) and by using Maximum Likelihood Estimation (MLE). Lee and Miller (2001) proposed a variant of the LC model which restricted the fitting period, adjusted the time varying index of the LC model by matching life expectancy and eliminated the jump off errors by forecasting from the observed rates. Booth et al. (2002) introduced a method to ascertain the optimum fitting period based on the assumption of recent trends are bound to be linear, and also fitted the time varying index to the age distribution of death after finding systematic departures from linearity in the Australian mortality decline. Brouhns et al. (2002) proposed a variant where the LC model is embedded in a Poisson regression model, suited for fitting and forecasting age sex specific mortality rates. Li and Lee (2005) proposed an extension for modelling group population. While an extension for modelling log death rates to forecast age specific mortality and fertility rates was proposed by Hyndman and Ullah (2007), using the functional data paradigm framework of Ramsay and Silverman (2005). De Jong and Tickle (2006) introduced a more flexible model using the state space framework by Harvey (1990) for modelling the log death rates.  implemented a fuzzy formulation of Lee-Carters model to overcome the violations of the Lee-Carter assumption of constant error variance across age. Yue et al. (2008) applied a similar approach by Bell (1997) and Hyndman and Ullah (2007), proposing an age-shift model using principal component analysis, which included two secondorder interaction terms to overcome an unsupported assumption of LC in various countries (i.e., parameters are dimed to be constant). Recently another extension was introduced by Neves et al. (2017), which adopted the general framework of the LC model. But, considered several competing conditional distributions for different outcome variables and a new class of time series models, known as, Generalized Autoregressive Score model (GAS) by Creal et al. (2008;, for estimating, forecasting and simulating mortality rates. Furthermore, recent advances in fast computation and numerical methods have enabled a more widespread use of the Bayesian approach in many fields of application, including population forecasting (Wiśniowski et al., 2015). Wiśniowski et al. (2015) presented a fully integrated and dynamic Bayesian approach which is embedded to LC type models to forecast populations by age and sex, it also could be used to handle different data types and sources of information. For more on Bayesian LC extensions, see Wong et al. (2018), Antonio et al. (2015) and Raftery et al. (2013).
Many of the models developed by extending or modifying the LC model had one thing in common, which is the need to forecast the time varying index. Typically, this is done by using the Box Jenkins strategy for autoregressive integrated moving average (ARIMA) models (Box et al., 2015). However due to the limitation of this approach, such as, not being able to forecast over long term interval (Girosi and King, 2007). Carter (1996) proposed a state space approach to model and forecast the time varying index, however it was pointed out by Lee and Miller (2001) that the proposed model was barely distinguishable from the original model.
This study aims to propose the use of an alternative method to model and forecast time varying index for LC type models by using a machine learning model known as Neural Network (NN). Neural network models have been prevailing as a forecasting method in both theoretical and empirical works (Kourentzes and Crone, 2010a). Over the years, NN models have attracted vast amount of attention in the Time Series and Forecasting (TSF) community due to its superior performance in classification and regression problems in machine learning (Yan, 2012). In contrast to statistics-based forecasting techniques the neural network approaches have several unique characteristics such as: 1) being non-parametric, hence, not having the need to rely on any underlying model, 2) being both nonlinear and data driven and 3) being more applicable to complicated models due to its flexibility (Zhang et al., 1998). As a result, NN models have been acknowledged by many experts to be a rising technology in TSF (Yan, 2012).
Prior to this study, as per our knowledge only one paper has been published concerning the forecasting of the time varying index using NN, which was by Safitri et al. (2018), and it was found in their study that the NN model performed well in long term forecasting with low errors.
The rest of this paper will be structured as followed. Section 2 will provide information on the dataset used. Section 3 will give a brief overview on their LC model. Section 4 will discuss the fitting and forecasting methods for the time varying index. Section 5 will present information on the performance measures. In Section 6 numerical results will be presented and in section 7 the conclusion.
The acquired single age dataset will be divided into two, where one portion will be the "training set" spanning from 1991 to 2012 and the other portion will be the "testing set", spanning from 2013 to 2016.

Lee-Carter model
The basic assertion of the LC model is that, there is linear relationship between the explanatory variables age at time and the logarithm of the age-specific death rate for age at time , ln(m x,t ). Lee and Carter (1992) proposed model is as follows: where, , is the central death rates relative to age at time , represents the general shape of the of the mortality curve, is the time varying index at time , is the deviation from the age profile as varies for each age and , is the error term which is assumed to be homoscedastic. Due to the LC model being over parameterized, as a result of being invariant to the following transformations: The model is not able to be fitted using ordinary regression methods. However, it is able to be identified using a two-step procedure proposed by Lee and Carter (1992). In the first step the parameters and are evaluated by imposing the following constraint, which is normalizing to sum to unity, ∑ = 1 and to sum to nought, ∑ = 0, then can be acquired by averaging ( , ) over time, . Following with the use of SVD to estimate and . To do so, SVD is applied to the following matrix Z: In the second step, is refitted to the total observed death rates to give more weight to ages which have higher deaths. As a result, it compensates the effects of using the logarithm of rates in the LC model. Once is refitted we would have acquired the appropriate values to be forecasted.

Forecasting
Once the model has been fitted and estimates of the parameters are acquired. To forecast using the LC model, all that has to be done is to forecast . In this section a brief overview of the methods used is provided.

ARIMA
The time varying index is generally forecasted by employing a random walk with drift ARIMA model, which was originally used by Lee and Carter (1992). However, it is explicitly mentioned in Lee and Carter (1992) that different population may have to use different ARIMA models to fit the mortality index. On the contrary, studies done in Malaysia did not consider other ARIMA models. For this reason, in this study we adopt the methodology suggested by Chavhan and Shinde, (2016), where 9 different possible choices of ARIMA models are fitted to the mortality index and compared to acquire the best fitted model using two most commonly used model selection criterions, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) (Burnham and Anderson, 2004).

Neural network
To model and forecast the time varying index we will be using a feed forward multilayer perceptron (MLP) neural network. The MLP network is made up of three portions interconnected by connection weights as shown in Fig. 1. The first portion consist of a set of input nodes, which represents the input layer, this is where the inputs are supplied to the network, in general for a time series forecasting neural network the inputs used are either lagged observations or other predictor variable. The second portion is one or more layers of computational nodes known as hidden layers, this layers help the network learn complex task by extracting progressively more meaningful features from the inputs as they are passed on through each layer and the last portion consist of the output layer.
To compute a one step ahead forecast ̂+ 1| , equation 5 is used. The biases of each neuron which act as an intercept in a regression are denoted as 0 and 0 . While the network weights are denoted as = ( , ), where and are the output of the hidden layers. I denotes the number of input of the network and the activation function which can be in the form of a sigmoid, bipolar sigmoid or a hyperbolic tangent, is denoted by (. ). J represents the number of hidden nodes in the network (Kourentzes et al., 2014). To model and forecast the time varying index, we will be using an automated fitting and forecasting neural network algorithm by Kourentzes and Crone (2010b). This will be done by employing R package "nnfor" by Kourentzes (2017). For detailed methodology see Kourentzes and Crone (2010a;2010b) and Kourentzes et al. (2014).

Performance measures
After acquiring the best fitted forecasting models for , of both ARIMA and Neural Network. The time varying index can be now forecasted on to the testing set.
e T+h = y T+h − ŷ T+h|T To evaluate the forecast performance for both models, the "errors" will be acquired using equation 6. Which is the difference between the forecasted values of the training set ̂+ ℎ| and the testing set +ℎ . Once the errors are obtained, several different performance measures can be used to assess the forecast. However only three measures will be used in this study, which are the Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) as they provide sufficient information to evaluate the forecast accuracy.

Root mean square error (RMSE)
Root Mean Square Error measures how spread forecasted data is from the actual values.
√mean(e T+h ) 2 RMSE close to 0 shows that the forecasted data is closer and concentrated near the actual line indicating high performance forecast. Therefore, low RMSE values are more desired.

Mean absolute error (MAE)
MAE measures the average magnitude of the errors in a set of predictions, without considering their direction.

mean( |e T+h | )
It's the average over the test sample of the absolute differences between prediction and actual observation where all individual differences have equal weight. As a result, a smaller MAE is preferred.

Mean absolute percentage error (MAPE)
Mean absolute percentage error is unit free measure of forecast error. Which measures the relative overall fit.
It measures the accuracy of forecast error in percentage. Hence, making it easier to interpret. A more desired MAPE when comparing forecast is the one closer to 0%.

Components of Lee-Carter model
After unabridging, 26 male and female abridged lifetable spanning from 1991 to 2016. LC model was fitted to both male and female data. Once the model was fitted, the SVD analysis showed that 86.7% and 92.6% variation was explained by fitted LC models for Malaysian male and female mortality data respectively. SVD analysis showed that 86.7% and 92.6% variation was explained by fitted LC models for Malaysian male and female mortality data respectively. Fig. 2a shows the plot for the average values of the central death rate over time . It can be observed that for both male and female has the similar general pattern, mortality tends to decrease at lower ages, following with an exponential increase as the age increases. Fig. 2b shows the plot for age-specific constant describing the relative speed of mortality changes across ages. It can be seen that male mortality is much more sensitive compared to female mortality, especially for ages less than 16, ages between 20-34, and higher ages more than 77. Furthermore it can be observed, for male are almost zero for age 36 to 45, this shows that during this ages mortality rate for male is not responsive to change in time. Fig. 2c shows the plot for time varying index , describing the general mortality for different times and captures the most important trends in death rates across all ages. In general, is supposed to be decreasing as mortality is a decreasing factor and this is seen for both male and female.

Evaluating goodness of fit
To evaluate how well the LC model fitted to both male and female mortality data. The actual values and the fitted values were plotted for the following age groups 0, 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, and 81-84+. Analyzing Fig. 3 and Fig. 4, both male and female log mortality rate fitted values follow closely to the actual values across all age groups and years. There are slight over estimation for age groups 1-10 and 11-20. However, it can be inferred that the LC model fits well to both Malaysian male and female mortality data.

Fitting ARIMA model
We now refit the entire dataset to the training set and obtain the time varying index for both male and female, and further use this estimates to acquire the best forecasting models using ARIMA and Neural Network. Table 1 shows all the nine considered models, together with their Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC) to assess and obtain the best fit model for forecasting the .

ARIMA
According to the values in Table 1, it was found that, for male ARIMA(0,1,0) with drift and for female ARIMA(0,2,0) without drift are the best fitted models for the . This results, together with the results of studies done on the Peruvian population and Indian population by Cerda-Hernández and Sikov (2018) and Chavhan and Shinde (2016) respectively, confirm the importance of finding an appropriate model for modelling the motility index before forecasting.

Neural network
After employing the "nnfor" package to acquire the best fitted MLP network. Fig. 5 and Fig. 6 were produced, which show the best fitted NN structures for the training set data of both male and female respectively.
It was found that a 1 input, 20 hidden node and 1 univariate lag combined forecast using a median operator was the best neural network fitted for male . While for female , the best fit model had 3 input, 20 hidden node and 1, 3, 4 univariate lags combined forecast using a median operator was the best neural network fitted for male . Both fitted models were acquired at a MSE's of 2.7556 and 2.8088 for male and female respectively.

Evaluating forecasting model accuracy
Now that the best fit forecasting models have been acquired. The time varying index can now be forecasted onto the testing set and errors between the forecast mortality rates and actual mortality values may be evaluated. Table 2 show the three error measures used to evaluate the forecast between the two models. Looking across the years for both male and female error tables it can be observed, as the time horizon increases the size of errors increases for both models. However, for the Neural Network model the size of the errors increase at a lower rate, showing better performance over a larger time horizon. Similar results were reported by Safitri et al. (2018) for the Indonesian population.
Furthermore, it can also be noted that the Neural Network model out performs the ARIMA method by consistently showing lower RMSE, MAE and MAPE values for both male and female over the years. As a result this confirms that the NN model is a better forecasting method for the time varying index compared to the conventional ARIMA method.

Conclusion
In this paper, LC model was fitted to Malaysian male and female mortality data, following with the use of two models to forecast the time varying index ( ). The two models used were the conventional ARIMA model, and a new approach which has been prevailing as a forecasting method known as Neural Network. Unlike previous studies done in Malaysian, to obtain the best fit ARIMA model to forecast , 9 different ARIMA models were compared and the best fitted model was obtained through evaluating the AIC and BIC of the 9 models. The best fit ARIMA model obtained was then used as a benchmark model, and its forecast performance was compared with the Neural Network models forecast using 3 performance measures; Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). The finding in this paper firstly showed that the LC model fitted well for both Malaysian male and female data with 86.7% and 92.6% variation being explained by both male and female fitted models respectively. This results were in line with Ngataman et al. (2016) who fitted Malaysian male and female mortality data to LC model from year 1981 to 2010.
Secondly, moving on to fitting the time varying index, for the ARIMA model, it was found that for Malaysian male data the time varying index fitted best to ARIMA (0,1,0) with drift, while for Malaysian female data the time varying index fitted best with ARIMA (0,2,0) without drift. Therefore, justifying the importance of testing ARIMA models before forecasting the time varying index as mentioned by Lee and Carter (1992). On the other hand, for the neural network model it was found that 1 input, 20 hidden nodes and 1 univariate lag combined forecast using a median operator was the best neural network fitted for male . While for female , the best fit model had 3 input, 20 hidden nodes and 1, 3, 4 univariate lags combined forecast using a median operator was the best neural network fitted for male .
Lastly, coming to the evaluation of forecast for both models, our finding showed that overall the Neural Network model outperforms the conventional ARIMA model in forecasting mortality rates. The Neural Network model consistently showed lower errors against actual values and the size of errors increased at a lower rate as time horizon increases, compared to the forecast of the ARIMA model.