The performance comparison of two-step robust weighted least squares (TSRWLS) with different robust weight functions

Article history: Received 20 January 2017; received in revised form 9 March 2017; accepted 8 April 2017

The purpose of this paper is to compare the performance of Two-Step Robust Weighted Least Squares (TSRWLS) using three different robust weight functions, namely Huber, Bisquare and Hampel. Previously, the TSRWLS procedure used only Huber's weight function as the second weight; this study compares the performance of TSRWLS when each of the three weight functions is used. Performance was evaluated on real data and in a Monte-Carlo simulation study. The findings suggest that TSRWLS with Huber, Bisquare or Hampel as the second weight performs similarly, with fairly close standard errors and almost identical bias and root mean square error. Based on the numerical example and the simulation study, this study concludes that TSRWLS performs equally well with all three weight functions. It is therefore suggested that any of the three robust weight functions can be used as the second weight in TSRWLS. However, Huber's weight function is recommended as the second weight because it is simpler than the other two.

Keywords:
Heteroscedasticity; Outlier; Two-step robust weighted least squares; Robust weight function

Introduction
In regression analysis, the model assumptions and the presence of outliers must be considered to ensure that the estimated regression model is correct. Violated assumptions and outliers lead to an imprecise estimated model. One of the assumption violations most commonly faced by researchers conducting linear regression analysis is heteroscedastic error. Heteroscedastic errors and outliers are two problems that affect the performance of Ordinary Least Squares (OLS) in estimating the linear regression model. As an alternative, the Two-Step Robust Weighted Least Squares (TSRWLS) method was proposed to remedy both problems, and it has been shown to be resistant to heteroscedastic errors and outliers occurring simultaneously (Habshah et al., 2013). However, the TSRWLS procedure previously used only Huber's weight function as the second weight (Bellio and Ventura, 2005). This study therefore investigates the performance of TSRWLS using three different robust weight functions (Huber, Bisquare and Hampel). Performance is evaluated using error measures such as the standard error, bias and root mean square error.
Heteroscedasticity refers to the situation in which the variance of the error terms is not constant. When the homoscedasticity assumption is violated, OLS is no longer optimal: the OLS estimator remains unbiased but becomes inefficient, and the usual estimates of its standard errors are inconsistent. Statistical hypothesis tests such as the t-test, F-test and Wald test are then rendered invalid (Schmidheiny, 2012). Therefore, weighted least squares (WLS) based on the variance function was proposed as an alternative (Kutner et al., 2008). With this method, the estimated parameters of the linear regression model are unbiased and efficient (Sosa-Escudero, 2009). However, when the data contain both heteroscedasticity and outliers, WLS is no longer appropriate because the WLS estimators are affected by outliers (Habshah et al., 2009).
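As a minimal sketch of WLS based on a known variance function, the Python code below (using NumPy) weights each observation by the inverse of its error variance. The data-generating model, with error standard deviation proportional to $x$, and the function name `wls` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: solve (X^T W X) beta = X^T W y with W = diag(w)."""
    Xw = X * w[:, None]  # W X: each row of X scaled by its weight
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


# Hypothetical heteroscedastic data: the error SD grows in proportion to x,
# so the variance function is sigma_i^2 proportional to x_i^2.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_wls = wls(X, y, 1.0 / x**2)  # weights = inverse of the variance function
print("OLS:", beta_ols)
print("WLS:", beta_wls)
```

With unit weights `wls` reduces to OLS; with inverse-variance weights both estimators are unbiased, but WLS has the smaller sampling variance.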
An outlier is a value that is extremely large or small compared with the other observations. Outliers can cause serious difficulties: in the least squares method, for example, the fitted line may be pulled disproportionately toward an outlying observation because the sum of the squared deviations is minimized (Kutner et al., 2008). This can produce a misleading regression model. Robust Weighted Least Squares (RWLS) was therefore put forward to remedy the effects of outliers and heteroscedastic errors simultaneously (Habshah et al., 2009). However, RWLS can only be used for simple linear regression.
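The pull of a single outlier on least squares can be seen in a small, hypothetical example: ten points lying exactly on $y = 1 + 2x$, with the last response replaced by a gross outlier. The data and the helper `ols` are illustrative only.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients (intercept, slope)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Hypothetical data lying exactly on y = 1 + 2x.
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
beta_clean = ols(X, y)

# Replace the last response (true value 21) with a gross outlier.
y_out = y.copy()
y_out[-1] = 100.0
beta_out = ols(X, y_out)
print("clean fit:   ", beta_clean)
print("with outlier:", beta_out)
```

With these numbers the fitted slope moves from 2 to about 6.3, because the single outlying residual dominates the sum of squared deviations.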
Because of this limitation, the Two-Step Robust Weighted Least Squares (TSRWLS) method was proposed (Habshah et al., 2013). This method can be used to estimate multiple linear regression models and, unlike the OLS and WLS methods, is resistant to both heteroscedasticity and outliers.

Methodology
The Two-Step Robust Weighted Least Squares (TSRWLS) procedure begins by fitting the regression function with the least trimmed squares (LTS) estimator and obtaining the fitted values $\hat{y}_i$. Next, the residuals $e_i = y_i - \hat{y}_i$ are computed and the absolute residuals $|e_i|$ are regressed on the fitted values. The fitted values of this standard deviation function, $\hat{s}_i$, are then obtained. The estimated first weight ($w_1$) is the inverse of the squared standard deviation function, as in Eq. 1:

$$w_{1i} = \frac{1}{\hat{s}_i^{2}} \qquad (1)$$
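The first stage can be sketched in Python as follows. The paper does not specify its LTS algorithm, so `lts_fit` is a simplified random-start, concentration-step (C-step) approximation in the spirit of FAST-LTS, with the usual coverage $h = \lfloor (n + p + 1)/2 \rfloor$; the OLS regression of $|e_i|$ on $\hat{y}_i$ and the floor on $\hat{s}_i$ are also assumptions. All function names and the simulated data are hypothetical.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=200, n_csteps=20, seed=0):
    """Approximate least trimmed squares (LTS) fit.

    Simplified FAST-LTS-style search: random elemental starts followed by
    concentration steps. An illustrative sketch, not the paper's code.
    """
    n, p = X.shape
    h = (n + p + 1) // 2 if h is None else h  # usual LTS coverage
    rng = np.random.default_rng(seed)
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):
            # C-step: refit OLS on the h observations with smallest squared residuals
            keep = np.argsort((y - X @ beta) ** 2)[:h]
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()  # trimmed sum of squares
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta


def first_weight(X, y, s_floor=1e-6):
    """First TSRWLS weight: w1_i = 1 / s_hat_i^2 (Eq. 1)."""
    y_hat = X @ lts_fit(X, y)  # fitted values from the LTS fit
    e = y - y_hat  # residuals e_i = y_i - y_hat_i
    Z = np.column_stack([np.ones_like(y_hat), y_hat])
    # Standard deviation function: regress |e_i| on the fitted values (OLS assumed).
    gamma = np.linalg.lstsq(Z, np.abs(e), rcond=None)[0]
    # Floor guards against non-positive fitted SDs (an assumption, not from the text).
    s_hat = np.maximum(Z @ gamma, s_floor)
    return 1.0 / s_hat**2


# Hypothetical usage: heteroscedastic errors plus two gross outliers.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 60)
X = np.column_stack([np.ones(60), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3 * x)
y[:2] += 50.0
beta_lts = lts_fit(X, y)
w1 = first_weight(X, y)
```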
The second weight ($w_2$) is then obtained from a robust weight function. The three robust weight functions considered, Huber, Bisquare and Hampel, are given in Table 1. The final weight $W$ is then computed as in Eq. 2.
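Since Table 1 is not reproduced here, the sketch below uses the textbook forms of the three weight functions with commonly used tuning constants (Huber $c = 1.345$, Bisquare $c = 4.685$, Hampel $a = 2$, $b = 4$, $c = 8$). These constants, and the choice to apply the weights to residuals scaled by the normalized median absolute deviation (MAD), are assumptions and may differ from the paper's settings.

```python
import numpy as np


def huber_w(u, c=1.345):
    """Huber weight: 1 for |u| <= c, c/|u| otherwise."""
    a = np.abs(u)
    return c / np.maximum(a, c)


def bisquare_w(u, c=4.685):
    """Tukey bisquare weight: (1 - (u/c)^2)^2 for |u| <= c, 0 otherwise."""
    a = np.abs(u)
    return np.where(a <= c, (1.0 - (a / c) ** 2) ** 2, 0.0)


def hampel_w(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending weight."""
    t = np.maximum(np.abs(u), 1e-12)  # avoid division by zero at u = 0
    return np.select(
        [t <= a, t <= b, t <= c],
        [np.ones_like(t), a / t, a * (c - t) / ((c - b) * t)],
        default=0.0,
    )


WEIGHT_FUNCTIONS = {"huber": huber_w, "bisquare": bisquare_w, "hampel": hampel_w}


def second_weight(e, kind="huber"):
    """Second TSRWLS weight w2 from residuals scaled by the normalized MAD (assumed)."""
    e = np.asarray(e, dtype=float)
    scale = np.median(np.abs(e - np.median(e))) / 0.6745
    return WEIGHT_FUNCTIONS[kind](e / scale)


# Hypothetical residuals: five small ones and one gross outlier.
residuals = np.array([0.1, -0.2, 0.15, -0.1, 0.05, 3.0])
w2 = {kind: second_weight(residuals, kind) for kind in WEIGHT_FUNCTIONS}
```

Huber only downweights the outlier (its weight never reaches zero), whereas Bisquare and Hampel are redescending and assign it a weight of exactly zero.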
Next, the parameters are estimated by weighted least squares using the final weight $W$, as depicted in Eq. 3.
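Weighted least squares with a diagonal weight matrix $\mathbf{W}$ gives $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{W}\mathbf{y}$; the sketch below assumes Eq. 3 takes this standard form. Because Eq. 2 is not reproduced here, it also assumes that the final weight is the elementwise product $w_1 w_2$. Function names and data are hypothetical.

```python
import numpy as np


def weighted_estimate(X, y, w):
    """Weighted least squares estimate (X^T W X)^{-1} X^T W y with W = diag(w)."""
    XtW = X.T * w  # X^T W: column i of X^T multiplied by w_i
    return np.linalg.solve(XtW @ X, XtW @ y)


def final_weight(w1, w2):
    # ASSUMPTION: final weight = elementwise product of the two stage weights.
    # Eq. 2 in the paper defines the actual combination.
    return np.asarray(w1, dtype=float) * np.asarray(w2, dtype=float)


# Hypothetical data: three points on y = 1 + 2x and one gross outlier.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 100.0])
w1 = np.array([1.0, 1.0, 1.0, 1.0])  # hypothetical first-stage weights
w2 = np.array([1.0, 1.0, 1.0, 0.0])  # e.g. a bisquare weight of 0 for the outlier
beta = weighted_estimate(X, y, final_weight(w1, w2))
print(beta)
```

Because the second weight is zero for the outlying fourth point, that point drops out of the fit and the estimate is exactly $(1, 2)$.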
To evaluate the performance of TSRWLS with the three robust weight functions, the analysis is divided into two parts: a numerical example and a simulation study. Performance is tested under three data conditions: heteroscedastic errors, heteroscedastic errors with a single outlier, and heteroscedastic errors with several outliers.
In the numerical example, the data are taken from Chatterjee and Price (1977). The dataset has 50 observations, with education expenditure as the response variable and three independent variables: income, residents under 18 and residents in urban areas. Performance in this example is evaluated using the standard error.
In this simulation study, = 0.4 is employed. To generate a given percentage of outliers, ~(0,1) + ℎ (0,10) is included, and the percentage of outliers is varied. Based on this regression model, data are generated for two sample sizes, 30 and 100 observations, and each setting is replicated 1000 times to obtain summary statistics such as the bias, the mean square error and the root mean square error, which are used to evaluate overall performance. The summary statistics are given in Table 2.
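The summary statistics can be computed as below, using the common definitions bias $= \bar{\hat{\beta}} - \beta$, MSE $= \frac{1}{R}\sum_{r} (\hat{\beta}_r - \beta)^2$ and RMSE $= \sqrt{\text{MSE}}$ over $R$ replications. Table 2 holds the paper's exact formulas, so these definitions and the toy estimates are assumptions.

```python
import numpy as np


def summary_stats(estimates, true_value=1.0):
    """Bias, MSE and RMSE of Monte-Carlo estimates of a single coefficient."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    mse = np.mean((est - true_value) ** 2)
    return {"bias": bias, "mse": mse, "rmse": np.sqrt(mse)}


# Hypothetical estimates of a coefficient whose true value is 1
# (a real study would collect one estimate per replication, e.g. 1000 of them).
estimates = [0.9, 1.1, 1.0, 1.2]
stats = summary_stats(estimates)
print(stats)
```

With these definitions the MSE equals the squared bias plus the (population) variance of the estimates, which gives a quick consistency check.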
This simulation study is carried out using the R programming language.

Results and discussion
The performance of TSRWLS using the three robust weight functions is discussed based on the numerical example and the simulation study.

Numerical example
The data from Chatterjee and Price (1977) have heteroscedastic errors and a single outlier. The heteroscedasticity was examined using a residual plot, while the outlier was identified using the LTS method. In this part, the performance of TSRWLS with the three robust weight functions is examined under two data conditions: heteroscedastic errors only, and heteroscedastic errors with a single outlier.
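One common way to flag outliers from a robust fit is to standardize its residuals by a robust scale and compare them with a cutoff. The sketch below uses the normalized MAD and the conventional cutoff of 2.5; the paper does not state its detection rule, so both choices, the function name `flag_outliers` and the toy residuals are assumptions.

```python
import numpy as np


def flag_outliers(residuals, cutoff=2.5):
    """Indices whose residual, divided by a robust scale, exceeds the cutoff in absolute value.

    Assumes the residuals come from a robust (e.g. LTS) fit, so they are centred
    near zero; the scale is the normalized median absolute deviation (MAD).
    """
    e = np.asarray(residuals, dtype=float)
    scale = np.median(np.abs(e - np.median(e))) / 0.6745
    return np.flatnonzero(np.abs(e) / scale > cutoff)


# Hypothetical residuals: five small ones and one large one.
print(flag_outliers([0.1, -0.2, 0.15, -0.1, 0.05, 3.0]))  # -> [5]
```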

Heteroscedastic error
To test the performance of TSRWLS with the three robust weight functions under heteroscedastic errors alone, observation 49, which the LTS method detected as an outlier, was excluded from the data. Table 3 shows the estimated coefficients and standard errors for the data with heteroscedastic errors. The performance of TSRWLS differs little across the three weight functions, since the respective standard errors are fairly close to one another. The estimated coefficients $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$ are also close to one another across the weight functions.

Heteroscedastic error with a single outlier
This section examines the performance of TSRWLS with the three robust weight functions when the data have heteroscedastic errors and a single outlier. The results in Table 4 suggest that TSRWLS performs equally well with all three weight functions, since the estimated coefficients and standard errors are relatively close. A Monte-Carlo simulation study was then performed to support this finding.

Simulation study
A Monte-Carlo simulation was employed to illustrate the performance of TSRWLS with the three robust weight functions. Performance was measured by the bias and the root mean square error for two sample sizes, 30 and 100 observations. Table 5 shows that the estimated coefficients for all three weight functions are fairly close to the true value of one, and they remain so even when the percentage of outliers in the data rises to 40%. This suggests that TSRWLS performs similarly with all three weight functions. Tables 6 and 7 show the bias and root mean square error for the two sample sizes. These values differ little between the Huber, Bisquare and Hampel weight functions for both sample sizes, indicating that TSRWLS performs equally well with each of them.

Conclusion
In the numerical example, TSRWLS performs similarly with the three robust weight functions, since the standard errors of each estimated coefficient differ little. This result is supported by the Monte-Carlo simulation study, in which the coefficients estimated using Huber, Bisquare or Hampel as the second weight are all fairly close to the true value of one, and the bias and root mean square error are also relatively close to one another. In conclusion, TSRWLS performs equally well with the Huber, Bisquare and Hampel weight functions, since the error measures (standard error, bias and root mean square error) in both the numerical example and the simulation study are relatively close to one another. It is therefore suggested that any of the three robust weight functions, Huber, Hampel or Bisquare, can be used as the second weight in the TSRWLS procedure. However, Huber's weight function is recommended as the second weight because it is simpler than the other two weight functions.