Stochastic generation of hourly rainfall series in the Western Region of Peninsular Malaysia

Article history: Received 17 August 2017; received in revised form 20 October 2017; accepted 18 November 2017
Comprehensive analysis and modeling of rainfall distribution is essential for capturing the characteristics of high-intensity rainfall. The western region of Peninsular Malaysia, which is more urbanized and densely populated, is prone to flash floods caused by the high-intensity convective rainfall of the inter-monsoon season. Convective rain is usually short-lived and intense. Therefore, knowledge of the distribution of rainfall intensity at short time scales is crucial in planning and decision making prior to, during and after a flood event, thereby minimizing the potentially catastrophic impact of flooding. Selecting an appropriate probability distribution to represent rainfall intensity is critical for obtaining a better indication of the seasonal contribution to annual rainfall. This study aimed to determine the better distribution of rainfall intensity for representing extreme rainfall events in the western region using the Advanced Weather Generator (AWE-GEN). The model was developed using hourly rainfall data and other meteorological data from three stations located within the studied region. Two probability distributions incorporated in the AWE-GEN model, namely Weibull and Gamma, were fitted to the historical data. The Root Mean Square Error (RMSE) was used to compare the performance of the distributions. Results showed that the AWE-GEN model is capable of simulating the monthly rainfall series in the west coast region, with Weibull being the better distribution for representing intensity. It was found that high values of the model parameters α, θ and η contribute to the higher-intensity rainfall within the studied region. The AWE-GEN model also performs quite well in reproducing the hourly and 24-hour extreme rainfall as well as in generating the extreme wet spell; however, the model slightly underestimates the extreme dry spell. The results can be beneficial, particularly for better rainfall forecasting in watersheds and urban areas.


Introduction
Precipitation is one of the most important meteorological variables for hydrological modeling. Where long series of observed precipitation are not available, they can be stochastically generated by weather generators. Weather generators are traditionally used to generate long synthetic series of data, fill in missing data, and produce different realizations of the same data (Wilby et al., 1998). A weather generator employs random numbers and takes the observed time series of a station as input. Besides being able to simulate many realizations, which provide a wider range of feasible situations (Ababaei et al., 2010), it can also extend the simulation of weather to locations where observed weather data are not available. This can be achieved by interpolating the parameters of a weather generator between sites using an interpolation technique such as kriging or thin-plate smoothing splines (Semenov et al., 1998). Weather generators are typically based on a first-order Markov chain with transition probabilities for simulating precipitation occurrence and a gamma distribution for the precipitation amounts (Fowler et al., 2007). They are also known as stochastic rainfall models, whose parameters have their own probability distributions. The parameters of the stochastic models are estimated from statistical analysis of time series and can be changed in accordance with climate model simulation results. These models are capable of capturing the storm structure at the hourly time scale and can be downscaled to finer scales. The models are based on delta-change methods, which identify properties or variables that are assumed to be scale invariant from the regional climate model scale to the urban catchment scale (Willems et al., 2012).
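The occurrence and amount structure described above (a first-order Markov chain for wet or dry state and a gamma distribution for wet-period amounts) can be sketched as follows. This is a minimal illustration and not the AWE-GEN implementation; the function names, the daily time step and all parameter values are assumptions made for the example.

```python
import random


def generate_daily_rainfall(n_days, p_wet_after_dry, p_wet_after_wet,
                            shape, scale, seed=None):
    """Simulate rainfall (mm) with a first-order Markov chain for occurrence
    and a gamma distribution for wet-day amounts."""
    rng = random.Random(seed)
    series = []
    wet = False  # assumption: the series starts after a dry day
    for _ in range(n_days):
        p_wet = p_wet_after_wet if wet else p_wet_after_dry
        wet = rng.random() < p_wet
        # random.gammavariate(alpha, beta): alpha is the shape, beta the scale
        series.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return series


def transition_probabilities(series):
    """Estimate P(wet | previous dry) and P(wet | previous wet)."""
    # previous state -> [number of transitions, number that end in a wet day]
    counts = {False: [0, 0], True: [0, 0]}
    for prev, cur in zip(series, series[1:]):
        counts[prev > 0][0] += 1
        counts[prev > 0][1] += cur > 0
    nan = float("nan")
    p_wd = counts[False][1] / counts[False][0] if counts[False][0] else nan
    p_ww = counts[True][1] / counts[True][0] if counts[True][0] else nan
    return p_wd, p_ww
```

Estimating the transition probabilities back from a long synthetic series is a simple check that the generator reproduces the occurrence statistics it was given.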
Many types of weather generators have been used since the 19th century, with most studies using daily meteorological data as inputs (Dubrovský, 1997; Schnur and Lettenmaier, 1998; Wilby et al., 1998; Wilks and Wilby, 1999; Brissette et al., 2007; Kim et al., 2007; Mareuil et al., 2007; Manning et al., 2009; Wilks, 2010; Kim et al., 2011; Min et al., 2011; Tseng et al., 2012; Khazaei et al., 2012; Chun et al., 2013; Kuchar et al., 2014; Parey et al., 2014).
The Long Ashton Research Station Weather Generator (LARS-WG) has been shown to simulate daily precipitation: means of yearly maxima and return values of daily synthetic precipitation were within the 95% confidence intervals of observed data for the study region (Semenov, 2008). LARS-WG has also been used to simulate the site-specific daily weather data required by crop growth simulation models, an approach that allows changes to a wider set of climate parameters in the scenario (Barrow and Semenov, 1995). Other studies have shown that LARS-WG is able to capture the weather statistics, including the extremes, in most temperate regions such as Europe, USA, Canada, New Zealand and Australia (Semenov and Barrow, 1997; Hashmi et al., 2009; Hashmi et al., 2011). In dealing with extreme events, weather generators that incorporate the Neyman-Scott Rectangular Pulses (NSRP) model are the most comprehensive for hourly (finer) data (Sunyer et al., 2012). Besides LARS-WG, the multisite statistical downscaling model (MSDM) has been proposed for downscaling daily precipitation series at multiple sites in a regional study area, using General Circulation Model (GCM) precipitation outputs as inputs. The MSDM was shown to reproduce the observed lag-1 autocorrelation of precipitation occurrence, the standard deviation of the wet-day precipitation amounts, the maximum 3-day precipitation total, and the 90th percentile of the rain-day amount. It also accurately reproduced the cross-site correlations of precipitation occurrence and of precipitation amount among multiple observation series in Quebec, Canada (Jeong et al., 2013). A recent study by Mehan et al. (2017) compared the CLImate GENerator (CLIGEN), LARS-WG, and Weather Generators (WeaGETS) models regarding their ability to capture the statistical properties of observed data. The CLIGEN model was likely to overestimate values at the extremes, but both CLIGEN and LARS-WG performed well in capturing the statistical properties of observed precipitation and temperatures. The WeaGETS model, on the other hand, needs improvement in order to obtain better simulations of its parameters.
In Malaysia, the performances of the Statistical Downscaling Model (SDSM) and LARS-WG have been compared in terms of generating possible future values of local meteorological variables. SDSM was found to perform better than LARS-WG, although SDSM slightly underestimates the wet spell lengths (Hassan et al., 2014). The Advanced Weather Generator (AWE-GEN) model developed by Fatichi et al. (2011) has shown great skill in the simulation and projection of extreme rainfall events for Peninsular Malaysia (Syafrina et al., 2015).
In the AWE-GEN model, a Gamma distribution is fitted to rainfall intensity. However, past studies conducted in Peninsular Malaysia have tested several types of distribution for rainfall intensity, and the results varied according to the model used. For instance, the Generalized Pareto distribution has been found to be the best distribution of rainfall intensity in Peninsular Malaysia (Dan'azumi et al., 2010). Another study found that the Mixed Lognormal distribution was the best model for most of the rain gauge stations in Peninsular Malaysia (Suhaila et al., 2011). However, studies by Abas et al. (2014) and Daud et al. (2016) using the Neyman-Scott methodology showed that the Mixed Exponential distribution best described the intensity of rainfall in Peninsular Malaysia.
The surface climate of Peninsular Malaysia is influenced by the northeast monsoon season between November and February and by the southwest monsoon season between May and August. The northeast monsoon season is usually associated with heavier rainfall, with the eastern and southern regions being the most affected areas. Between these two monsoons are the inter-monsoon seasons, occurring in March-April (MA) and September-October (SO), which bring intense convective rainfall to the western part of Peninsular Malaysia. Rainfall intensity is defined as the ratio of the total amount of rain (rainfall depth) falling during a given period to the duration of the period. It is expressed in depth units per unit time, usually mm per hour (mm/h). The statistical characteristics of high-intensity, short-duration convective rainfall are essentially independent of location within a region. The western coast recorded higher values of extreme intensities and extreme cumulative indices during the inter-monsoon season (March and April), which resulted in an increase in flash flood occurrence during this period (Syafrina et al., 2015). Therefore, knowledge of the distribution of rainfall intensity is crucial in planning and decision making prior to, during and after a flood event, thereby minimizing the potentially catastrophic impact of flooding.
Accordingly, this study aims to determine the better probability distribution of rainfall intensity for representing extreme rainfall events at stations located in the western coastal region of Peninsular Malaysia. A stochastic rainfall model is presented for the generation of hourly rainfall data at three selected rainfall stations. The AWE-GEN model, which integrates the Neyman-Scott process, employs a reasonable number of parameters to represent the physical attributes of rainfall. With respect to rainfall intensity, this study proposes the use of a Weibull distribution, and the performance of the proposed model is compared with that of a model that employs the Gamma distribution. Historical hourly rainfall data spanning 31 years are used as input to construct the models, and simulations of hourly series by both models are performed at an independent site. The performance of the models is assessed based on how closely the statistical characteristics of the simulated series resemble those of the observed series. The Root Mean Square Error (RMSE) is then estimated for both sets of simulations at each rainfall station and compared. The lowest RMSE value indicates the better distribution at a particular station.
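To make the model structure concrete, the following is a rough sketch of a Neyman-Scott rectangular pulses simulation in which cell intensities are drawn from either a Gamma or a Weibull distribution. It is not the AWE-GEN code. The function and argument names are hypothetical, and the exponential waiting times and durations and the geometric cell count are assumptions made for this illustration.

```python
import math
import random


def sample_intensity(rng, dist, shape, scale):
    """Draw one cell intensity (mm/h) from a Gamma or Weibull distribution."""
    if dist == "gamma":
        return rng.gammavariate(shape, scale)    # arguments: (shape, scale)
    if dist == "weibull":
        return rng.weibullvariate(scale, shape)  # arguments: (scale, shape)
    raise ValueError("dist must be 'gamma' or 'weibull'")


def sample_cell_count(rng, mean_cells):
    """Geometric number of cells (at least 1) with the given mean (assumed)."""
    if mean_cells <= 1.0:
        return 1
    p = 1.0 / mean_cells
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return 1 + int(math.log(u) / math.log(1.0 - p))


def simulate_nsrp(n_hours, storm_rate, mean_wait, mean_duration, mean_cells,
                  dist, shape, scale, seed=None):
    """Return hourly rainfall depths (mm) from overlapping rectangular cells."""
    rng = random.Random(seed)
    depth = [0.0] * n_hours
    t = rng.expovariate(storm_rate)  # storm origins follow a Poisson process
    while t < n_hours:
        for _ in range(sample_cell_count(rng, mean_cells)):
            start = t + rng.expovariate(1.0 / mean_wait)
            end = start + rng.expovariate(1.0 / mean_duration)
            intensity = sample_intensity(rng, dist, shape, scale)
            # Spread the rectangular pulse over the hourly bins it overlaps.
            h = int(start)
            while h < n_hours and h < end:
                overlap = min(end, h + 1) - max(start, h)
                depth[h] += intensity * overlap
                h += 1
        t += rng.expovariate(storm_rate)
    return depth
```

Running the same storm and cell parameters with `dist="gamma"` and `dist="weibull"` gives two hourly series whose statistics can then be compared against observations.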

Data
The studied region, located in the western part of the peninsula, is the most developed and densely populated region in Malaysia. The region is subject to many flash flood incidents, which are partly due to land use changes and development. In this study, the AWE-GEN model is constructed based on 30 years of historical data. The input data required by AWE-GEN are hourly rainfall, hourly temperature, hourly relative humidity and hourly wind speed. Hourly rainfall data were sourced from the Malaysia Drainage and Irrigation Department (DID), while the other meteorological data were sourced from the Malaysian Meteorological Department (MMD). Three rainfall stations representing the west coast were selected. Fig. 1 shows the location of the rainfall stations, and Table 1 lists the selected stations used in this study.

Model development
In the AWE-GEN model, the Gamma distribution is fitted to rainfall intensity, and the intra-annual variability of rainfall is captured by the Neyman-Scott Rectangular Pulses (NSRP) model. Work by Abas et al. (2014) and Norzaida et al. (2016) indicated that the NSRP model is suitable for use in Malaysia. The Gamma distribution associated with the NSRP model is
$$f(x) = \frac{x^{\alpha-1}\, e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}, \quad x > 0,$$
where $\theta$ is the scale parameter ($\theta > 0$), $\alpha$ is the shape parameter ($\alpha > 0$) and $x$ is the hourly rainfall amount. The Gamma distribution is then replaced by the Weibull distribution, which is fitted to rainfall intensity in its place. The Weibull distribution is
$$f(x) = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1} e^{-(x/a)^{b}}, \quad x \ge 0,$$
where $a$ and $b$ are the scale and shape parameters, respectively. Table 2 gives the definition of each rainfall parameter with Gamma representing rainfall intensity.
For model validation, the simulated hourly rainfall was divided into two non-overlapping periods: (i) 1975 to 1989 and (ii) 1990 to 2005. The period 1975 to 1989 was used as the reference period, for which the multiplicative factor was calculated from the simulation output and the high-resolution observational data. The change factors were then used to correct the biases of the simulation output from 1990 to 2005. The corrected hourly rainfall was then compared with the observations from the identical period of 1985-1999. To compare the performance of the two distributions, the Root Mean Square Error (RMSE) is estimated for both sets of simulations:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}},$$
where $n$ is the total number of data, $y_i$ is the $i$th actual rainfall amount and $\hat{y}_i$ is the $i$th simulated rainfall amount. The lowest RMSE value indicates the better distribution at a particular station. Each rainfall station then uses the better distribution to simulate extreme rainfall as well as dry and wet spell lengths.
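The validation steps described above (a multiplicative factor from the reference period, bias correction of the later period, then RMSE against observations) might be sketched as below. The factor is assumed here to be the ratio of the observed mean to the simulated mean over the reference period; the text does not spell out its exact form, and the function names are hypothetical.

```python
import math


def multiplicative_factor(observed_ref, simulated_ref):
    """Ratio of observed to simulated mean over the reference period
    (an assumed definition of the change factor)."""
    sim_mean = sum(simulated_ref) / len(simulated_ref)
    if sim_mean == 0:
        raise ValueError("simulated reference mean is zero")
    obs_mean = sum(observed_ref) / len(observed_ref)
    return obs_mean / sim_mean


def apply_factor(simulated, factor):
    """Scale a simulated series by the change factor."""
    return [factor * value for value in simulated]


def rmse(observed, simulated):
    """Root Mean Square Error between two series of equal length."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))
```

In use, the factor would be computed once per station from the reference period, applied to the simulation for the evaluation period, and the RMSE computed separately for the Gamma-based and Weibull-based simulations; the distribution with the lower value is retained for that station.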
Table 2: Definition of the rainfall parameters, with Gamma representing rainfall intensity
Mean duration of the cell (h)
Mean number of cells per storm [-]
Shape parameter of the Gamma distribution of rainfall intensity [-]
Scale parameter of the Gamma distribution of rainfall intensity (mm h^-1)

Table 3 shows the RMSE values for both distributions at each rainfall station. There is not much difference between the Gamma and Weibull values, but overall Weibull is the better fit for rainfall intensity. The simulated statistical properties of rainfall are compared with observations at the monthly scale in Fig. 2. Overall, the statistical properties are well preserved at the aggregation period of 1 hour. The mean and variance are well simulated. Despite an underestimation of the lag-1 autocorrelation and skewness, both statistics show a consistent pattern between the observed and simulated time series at all stations. As the figure also shows, the remaining statistics are more challenging for the weather generator to simulate: the frequency of non-precipitation is slightly overestimated, whereas the wet-wet transition probability is slightly underestimated at all stations.
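The six statistics compared in Fig. 2 can be computed from an hourly series along the following lines. This is an illustrative sketch only: population (biased) moment estimators and a zero threshold for a dry hour are assumptions, and the function name is hypothetical.

```python
def rainfall_statistics(series):
    """Mean, variance, lag-1 autocorrelation, skewness, frequency of
    non-precipitation and wet-wet transition probability of a series."""
    n = len(series)
    nan = float("nan")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    lag1 = sum(a * b for a, b in zip(dev, dev[1:])) / (n * var) if var > 0 else nan
    skew = (sum(d ** 3 for d in dev) / n) / var ** 1.5 if var > 0 else nan
    dry_fraction = sum(1 for x in series if x == 0) / n
    # Wet-wet transition: a wet step followed by another wet step.
    wet_prev = sum(1 for a in series[:-1] if a > 0)
    wet_wet = sum(1 for a, b in zip(series, series[1:]) if a > 0 and b > 0)
    p_wet_wet = wet_wet / wet_prev if wet_prev else nan
    return {
        "mean": mean,
        "variance": var,
        "lag1_autocorrelation": lag1,
        "skewness": skew,
        "dry_fraction": dry_fraction,
        "p_wet_wet": p_wet_wet,
    }
```

Applying the function month by month to both the observed and the simulated series would give the pairs of values that a plot such as Fig. 2 compares.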

Fig. 2: A comparison between observed (red) and simulated (green) monthly statistics of rainfall (mean, variance, lag-1 autocorrelation, skewness, frequency of non-precipitation, transition probability wet-wet), for the aggregation period of 1 hour
Table 4 shows the estimated rainfall parameters of the AWE-GEN model for every station. According to Arritt and Daniel (2014), rainfall intensity is characterized by two parameters, α and θ, and the mean rainfall intensity can be written as αθ. From the table, the highest mean rainfall intensity at all stations occurs in April, which corresponds to the inter-monsoon period. There is a high chance of convective rainfall on the west coast during this season, which may lead to high-intensity rainfall over a short interval of time. Two further parameter estimates give the storm origin arrival rate and the waiting time for cell origins after the storm origin, respectively.
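As a worked illustration of the mean intensity αθ, assume that α is the Gamma shape parameter and θ the Gamma scale parameter in mm/h. The numerical values below are hypothetical and are not taken from Table 4.

```latex
\begin{align*}
  \mathbb{E}[X] &= \alpha\theta, &
  \operatorname{Var}[X] &= \alpha\theta^{2} \\
  % Hypothetical values: \alpha = 0.5, \theta = 10 mm/h
  \mathbb{E}[X] &= 0.5 \times 10 = 5\ \text{mm/h}, &
  \operatorname{Var}[X] &= 0.5 \times 10^{2} = 50\ \text{mm}^{2}\,\text{h}^{-2}
\end{align*}
```

Because the mean is the product of the two parameters, a month can show a high mean intensity through a large shape, a large scale, or both, so the two estimates are best read together.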
There are no significant differences in the storm origin arrival rate or the cell-origin waiting time at all stations. Although the estimated mean number of cells per storm for station 3117070 recorded its lowest value in April (≈1), the mean cell duration has its highest value in April compared with other months. Similarly, for station 3516022, the mean number of cells per storm is highest in December, followed by April, with means of ≈4 and ≈3 rain cells per storm, respectively. However, the mean cell duration recorded its highest value in April. Meanwhile, for station 3118102, the mean cell duration recorded its highest value in April in spite of the mean number of cells per storm having its lowest value in April. This can also be seen in Fig. 3, where the mean number of cells per storm is higher in November while the mean cell duration is higher in April.
A comparison between observed and simulated monthly rainfall for every station is shown in Fig. 4. The simulated process closely preserves the monthly mean and variance of observed rainfall. As shown in the figure, the highest mean rainfall at all stations is received in November. The mean rainfall amount then declines from December until February. This corresponds to the northeast monsoon season, which usually begins in early November and ends in February. In contrast, the mean rainfall amount increases in March and April. March and April correspond to the inter-monsoon season, when the western region is at high risk of flash floods, which is consistent with Syafrina et al. (2015). Another significant finding is that the mean rainfall received from May to August is lower than in the other months. This is also in line with the findings of Syafrina et al. (2015), in which the west coast region is quite dry during the southwest monsoon season (i.e., May to August).
The simulated and observed hourly and 24-hour extreme rainfall are shown in Fig. 5. Both extremes are well simulated up to the 40-year return period. Similarly, the extreme wet spell is well simulated for all stations. On the other hand, the extreme dry spell is slightly underestimated for all stations.
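The spell and extreme quantities discussed above can be extracted from a rainfall series roughly as follows. This is a sketch under stated assumptions: a step counts as wet when its depth exceeds a threshold (zero by default), and the 24-hour extreme is taken as the largest moving 24-step total. The function names are hypothetical.

```python
def spell_lengths(series, wet=True, threshold=0.0):
    """Lengths of consecutive wet (or dry) runs, in time steps."""
    lengths, run = [], 0
    for value in series:
        if (value > threshold) == wet:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def max_rolling_sum(series, window):
    """Largest total over any `window` consecutive steps (24 for 24-hour)."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    best = total
    for i in range(window, len(series)):
        total += series[i] - series[i - window]  # slide the window by one step
        best = max(best, total)
    return best
```

Taking `max(spell_lengths(year, wet=False))` and `max_rolling_sum(year, 24)` for each year of the observed and simulated series gives annual extremes that can be ranked and compared across return periods.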

Conclusion
Overall, the AWE-GEN model is capable of simulating the monthly rainfall in the west coast region of Peninsular Malaysia. The results revealed that Weibull is a better distribution than Gamma for representing rainfall intensity over the west coast region. There are no significant differences in the storm origin arrival rate or the cell-origin waiting time at all stations. Higher values of α, θ and η contribute to higher rainfall intensity, while the mean number of cells per storm contributes less to rainfall intensity in the west coast region. In addition, the AWE-GEN model is able to capture the extreme properties. The model performs quite well in reproducing the hourly and 24-hour extreme rainfall for all stations as well as in generating the extreme wet spell. However, the AWE-GEN model marginally underestimates the extreme dry spell. The results can be beneficial, particularly for better rainfall forecasting in watersheds and urban areas and for managing stormwater systems.
Abas N, Daud ZM, and Yusof F (2014). A comparative study of mixed exponential and Weibull distributions in a stochastic model replicating a tropical rainfall process. Theoretical and Applied Climatology, 118 (3): 597-607.