River water quality assessment using APCS-MLR and statistical process control in Johor River Basin, Malaysia

Article history: Received 8 May 2017; received in revised form 9 July 2017; accepted 15 July 2017.

The objectives of this study are to determine the most significant parameters of the Johor River Basin that contribute to river pollution loading, to identify potential pollutant contamination and to assess the process capability of water quality. Environmetric techniques and statistical process control (SPC) were utilized in this study. PCA extracted eight principal components, which explain 77% of the total variance. The APCS-MLR model revealed NH3-N and PO4 as the main pollutants, giving the highest contribution to the river. Control charts were established for NH3-N and PO4 using SPC to monitor the concentration levels in a timely manner. Continuous monitoring in the area is therefore recommended to improve river quality in the Johor River Basin.


Introduction
Rivers are a vital natural resource for living things, especially human beings. They play a substantial economic, social, cultural and religious role, providing water for domestic consumption, irrigation for agriculture, transport, industrial use and a livelihood for the people. Economic growth and industrialization in Malaysia have led to environmental problems, with ever-increasing land, air and water pollution (Ho, 1996).
Anthropogenic influences as well as natural processes degrade surface waters and impair their use for drinking, industry, agriculture, recreation and other purposes (Carpenter et al., 1998). The process of industrialization in Malaysia has led to economic growth, but at the expense of the environment. Water pollution is a byproduct of industrialization: toxic and hazardous wastes generated by various industries are discharged into the aquatic environment (Abdullah, 1995). Changes in species composition and a decline in the overall health of aquatic life within the river basin have occurred due to the deterioration of water quality (Durell et al., 2004).
Environmetrics is an advanced multivariate analysis approach for the assessment of environmental databases (Juahir et al., 2010a; Nasir et al., 2011; Zali et al., 2011; Dominick et al., 2012). It is also regarded as a division of environmental analytical chemistry that relies on multivariate statistical modeling and data treatment, known as chemometric analysis (Simeonov et al., 2002; Brodnjak-Vončina et al., 2002; Simeonov et al., 2004; Felipe-Sotelo et al., 2007; Kowalkowski et al., 2006; Peré-Trepat et al., 2006; Osman et al., 2012; Saim et al., 2009; Gazzaz et al., 2012; Retnam et al., 2013). This quantitative technique is suitable for all aspects of the social and natural environment, including forecasting, mathematical modeling, data analysis and statistics (Juahir et al., 2010b; Nasir et al., 2011). Principal Component Analysis (PCA) is one of the most widely used tools in environmetrics (Shrestha and Kazama, 2007; Krishna et al., 2009; Juahir et al., 2011; Nasir et al., 2011; Dominick et al., 2012). According to Shrestha and Kazama (2007), the data obtained are subjected to different multivariate statistical approaches (i) to define geogenic and anthropogenic origins, (ii) to identify possible nonpoint sources of contamination and (iii) to estimate the contributions of possible sources to the concentrations of the determined parameters (Krishna et al., 2009). PCA can identify several pollution factors reasonably well, but the interpretation of these factors in terms of actual controlling sources and processes is highly subjective (Liu et al., 2003).
Multiple Linear Regression (MLR) is a multivariate statistical technique often applied to predict relationships between input and output variables without detailing the causes of these relationships (Dominick et al., 2012). MLR has also been used to measure the relationship between independent and dependent variables (Guillén-Casla et al., 2011; Dominick et al., 2012). Applying MLR to the PCA scores is therefore recommended for defining the percent contribution of different sources (Wu et al., 2009). A study by Wang et al. (2009) showed that analysing a data set with PCA followed by MLR improves the precision and quantification of source apportionment. The combination of PCA and MLR is known as the Absolute Principal Component Score-Multiple Linear Regression (APCS-MLR) model. This technique is applied to determine the contribution of each possible source defined by PCA (Zhou et al., 2007; Nasir et al., 2011; Su et al., 2011a; 2011b). APCS-MLR has thus been shown to be capable of identifying the possible source contributions to each physicochemical parameter (Simeonova et al., 2003). Furthermore, APCS-MLR is applied to calculate the source contributions after the number and characteristics of the possible sources have been determined by PCA (Zhou et al., 2007).
Statistical Process Control (SPC) has been used widely in monitoring manufacturing processes and service operations (Woodall et al., 2000). Madu (1996) explained the feasibility of control charts in environmental monitoring. The empirical analysis by Maurer et al. (1999), who applied control chart rules to sediment pollutant analysis, supported the statement in Madu (1996). Maurer et al. (1999) stated that although SPC was primarily developed for industrial purposes, the methodology can also be used in the environmental discipline. A limitation of SPC in environmental monitoring is that it depends on a higher sampling frequency, both to identify developing trends and to reflect the system or scale being monitored (Maurer et al., 1999). Corbett and Pan (2002) encouraged practitioners to apply SPC theory to environmental data in order to evaluate environmental performance. Few SPC studies have been carried out in the environmental discipline, because modern statistical process control charts and related methods remain unknown to most environmental personnel (Corbett and Pan, 2002). According to Corbett and Pan (2002), control charts and process capability analysis are promising tools in the environmental realm. SPC techniques are therefore important for identifying future environmental risks. Researchers applying SPC techniques to environmental data may find that the limitations of SPC stem from insufficient data for the control charts to reach a decision (Maurer et al., 1999). This study attempts to provide a thorough understanding of the capabilities of each SPC tool.
This study also aims to provide insight into control charts and process capability in order to improve the analysis and interpretation of outcomes in the environmental realm. As stated by Besterfield (2009), SPC comprises many tools, such as the Pareto diagram, cause-and-effect diagram, check sheet, process flow diagram, scatter diagram, histogram and control chart. Control charts are used to assess process stability: the process is stable if there are no out-of-control points and unstable if there are out-of-control points (Woodall et al., 2000). In Phase 1 of a control chart application, any out-of-control points on the chart are usually examined, and in Phase 2 the probability of a signal on any one sample is sometimes used if the successive plotted statistics are independent (Woodall et al., 2000). SPC balances the errors of false positives and false negatives, while data from a base period are used to construct the control limits. Consistent with Madu (1996), every process has variation, and the two causes of variation are known as chance (natural) causes and assignable causes. Control charts aim to identify and eliminate special causes so that only random causes of variation remain in the system, which ensures a stable process (Madu, 1996).
Malaysian rivers can be classified as Class IIB/III rivers (Abdullah, 1995; Juahir et al., 2010a; Al-Mamun and Zainuddin, 2013). Secondary data for the Johor River Basin were acquired from the Department of Environment Malaysia (DOE). The data selected for evaluation cover the years 2003 to 2007. A total of 30 water quality parameters were evaluated, notably dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), suspended solids (SS), pH, ammoniacal nitrogen (NH3-N), dissolved solids (DS), total solids (TS), nitrate (NO3), chloride (Cl), phosphate (PO4), Escherichia coli (E. coli), coliform and various heavy metals. The objectives of this study are to determine the most significant parameters of the river basin that contribute to river pollution loading, to identify potential pollutant contamination and to assess the process capability of water quality. All data were analysed using XLSTAT 2012 and SPC XL software.

Study area
The Johor River has one of the largest catchments in the southern part of Peninsular Malaysia, with a total catchment area of 2751.72 km² (Dorofki et al., 2012). The river flows towards the north of the basin from Bukit Gemuruh (altitude 109 m) and Gunung Belumut (altitude 1,010 m) (Kia et al., 2012). The Johor River originates from the Layang-Layang River and the Sayong River in the upstream area, which merge into the Johor River before it flows southeast to the estuary of the Johor Straits. The major tributaries located downstream on the Johor River are the Tiram and Lebam Rivers (DID, 2000). The main tributaries of the Johor River are the Sayong, Linggiu, Tiram and Lebam Rivers, as shown in Fig. 1 and Table 1. The water temperature of the river ranges from 21°C to 32°C. According to Kia et al. (2012), natural forest and swamps cover the major proportion of the land use in the basin, whereas the southern part of the basin is mostly covered by oil palm and rubber plantations. The main city adjacent to the Johor River basin is Kota Tinggi, with a total population of 220,000 people. The catchment area of the Johor River at Kota Tinggi is approximately 1620 km², and the major land uses in the Johor River basin are oil palm plantations, other crop cultivation, urban areas, water bodies and swamps (Kia et al., 2012).

Pre-processing data
Preliminary work on the data matrix included assembly and data transformation. Data below the detection limit were substituted with values equal to half the detection limit. The agreement of the distributions of the physico-chemical water parameters with the normal distribution was tested with the W (Shapiro-Wilk) test (Sojka et al., 2008; Samsudin et al., 2011). Standardization was applied to increase the influence of variables whose variance is small and to reduce the influence of variables whose variance is large. Log scaling is very common for environmental data, since some variables may show very low or very high values.
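The two pre-processing steps described above, half-detection-limit substitution and standardization, can be sketched as follows. This is a minimal illustration only: the readings, the 0.01 mg/L detection limit and the function names are hypothetical, and the study itself used XLSTAT rather than this code.

```python
from statistics import mean, stdev


def substitute_below_detection(values, detection_limit):
    """Replace readings below the detection limit with half the detection limit."""
    return [detection_limit / 2.0 if v < detection_limit else v for v in values]


def standardize(values):
    """Z-scale one variable to zero mean and unit sample standard deviation."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


# Hypothetical NH3-N readings in mg/L; the 0.01 mg/L detection limit is assumed.
raw = [0.004, 0.20, 0.08, 0.002, 0.15]
cleaned = substitute_below_detection(raw, detection_limit=0.01)
z_scores = standardize(cleaned)
print(cleaned)
print([round(z, 3) for z in z_scores])
```

After z-scaling, every variable has the same unit variance, which is why low-variance parameters gain influence relative to high-variance ones in the subsequent PCA.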

Principal component analysis (PCA)
Principal Component Analysis (PCA) was applied to the normalized data set to compare the compositional patterns among the analysed water quality parameters (variables) and to identify the factors that influence each parameter (Dominick et al., 2012). The new variables, known as Principal Components (PCs), are linear combinations of the original set of variables (Sousa et al., 2007; Dominick et al., 2012). The PCs can be expressed by Eq. 1 (Dominick et al., 2012):

$$y_{ij} = b_{i1}x_{1j} + b_{i2}x_{2j} + \cdots + b_{im}x_{mj} \qquad (1)$$

where y is a component score, b is the component loading, x is the measured value of the variable, i is the component number, j is the sample number, and m is the total number of variables. The covariance matrix is diagonalized to produce the eigenvalues, also known as characteristic roots (Vega et al., 1998). Components are selected with the eigenvalue criterion, by which a value >1 is considered significant, and a new group of variables is produced based on the similarity of the entire data set (Osman et al., 2012). Factor loadings give the correlation between the original variables and the varifactors (VFs), while the individual transformed observations are called factor scores (Vega et al., 1998). VF coefficients with a correlation of 0.30-0.49 are considered 'weak' significant factor loadings, those in the range 0.50-0.74 are considered 'moderate' and those >0.75 are considered 'strong' (Liu et al., 2003; Retnam et al., 2013).
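As a rough illustration of the eigenvalue criterion and the loading classes described above, the following sketch runs PCA on the correlation matrix of z-scaled data with NumPy. It is an assumption-laden sketch, not the XLSTAT procedure used in the study: the data are synthetic, the function names are hypothetical, and no factor rotation is applied, so the loadings are unrotated PC loadings rather than varifactors.

```python
import numpy as np


def pca_eigenvalue_criterion(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]             # reorder to descending
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    keep = eigenvalues > 1.0
    # Loadings are the correlations between variables and retained components.
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    scores = z @ eigenvectors[:, keep]
    explained = eigenvalues[keep] / eigenvalues.sum()
    return eigenvalues, loadings, scores, explained


def loading_strength(value):
    """Classify a loading with the thresholds quoted in the text."""
    size = abs(value)
    if size > 0.75:
        return "strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "weak"
    return "not significant"


# Synthetic demonstration data: two correlated pairs of variables.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
demo = np.column_stack([
    base[:, 0], base[:, 0] + 0.1 * rng.normal(size=50),
    base[:, 1], base[:, 1] + 0.1 * rng.normal(size=50),
])
eigenvalues, loadings, scores, explained = pca_eigenvalue_criterion(demo)
print(eigenvalues.round(3), explained.round(3))
print([[loading_strength(v) for v in row] for row in loadings])
```

Because the analysis is run on the correlation matrix, the eigenvalues sum to the number of variables, so an eigenvalue above one means a component explains more variance than any single standardized variable.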

Statistical process control
The x̄ and R chart is a type of control chart in SPC that pairs two charts for data collected in subgroups (Maurer et al., 1999; Douglas, 2009; Besterfield, 2009). Both charts are computed to determine whether the process is stable and predictable. According to Douglas (2009), the x̄ chart displays how the subgroup average changes over time, while the R chart presents how the subgroup range changes over time. The x̄ and R charts are used for processes with a subgroup size greater than one, typically between two and ten.
Suppose a quality characteristic is normally distributed with mean µ and standard deviation σ, where both µ and σ are known. For a sample of size n, the sample average x̄ is normally distributed with mean µ and standard deviation σ_x̄ = σ/√n.
Moreover, the probability is 1 - α that any sample mean falls between the limits in Eq. 2 (Douglas, 2009):

$$\mu + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \mu - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad (2)$$

These expressions can therefore be used as upper and lower limits. Here x̄ is the average of each sample, and the grand average x̿ serves as the best estimator of µ and is the process average in Eq. 3 (Douglas, 2009):

$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_m}{m} \qquad (3)$$

where x̄ is the average of a sample and m is the number of subgroups. x̿ is the centre line on the x̄ chart. A control chart is complete only when it has an upper control limit, a central line and a lower control limit, which help determine whether the process is stable (Besterfield, 2009). The formulas for constructing the control limits on the x̄ chart are given in Eq. 4 (Douglas, 2009):

$$\mathrm{UCL} = \bar{\bar{x}} + A_2\bar{R}, \qquad \mathrm{CL} = \bar{\bar{x}}, \qquad \mathrm{LCL} = \bar{\bar{x}} - A_2\bar{R} \qquad (4)$$

where A2 is a tabulated constant that depends on the sample size and R̄ is the average range defined in Eq. 6. The range method is often used in constructing the control limits, with the process variability monitored by plotting the values of the sample range R on a control chart.
The range of a sample is the difference between the largest and smallest observations, as in Eq. 5 (Douglas, 2009):

$$R = x_{\max} - x_{\min} \qquad (5)$$

Let R1, R2, ..., Rm be the ranges of the m samples. The average range is computed using Eq. 6 (Douglas, 2009):

$$\bar{R} = \frac{R_1 + R_2 + \cdots + R_m}{m} \qquad (6)$$

The centre line and control limits of the R chart are given in Eq. 7 (Douglas, 2009):

$$\mathrm{UCL} = D_4\bar{R}, \qquad \mathrm{CL} = \bar{R}, \qquad \mathrm{LCL} = D_3\bar{R} \qquad (7)$$

The constants D3 and D4 are tabulated for various sample sizes in the table of factors for constructing variables control charts. The application of x̄ and R charts consists of three phases: trial control limits, revised trial control limits and new control limits (generally tighter than the first trial control limits). The trial central lines for the x̄ and R charts are established using Eq. 8 (Douglas, 2009):

$$\bar{\bar{x}} = \frac{\sum_{k=1}^{m}\bar{x}_k}{m}, \qquad \bar{R} = \frac{\sum_{k=1}^{m}R_k}{m} \qquad (8)$$

where x̿ is the average of the subgroup averages, x̄k is the average of the kth subgroup and m is the number of subgroups. R̄ is the average of the subgroup ranges, while Rk is the range of the kth subgroup. The trial control limits for the charts are set at ±3σ (standard deviations) from the central value, as in Eq. 9 and Eq. 10 (Douglas, 2009):

$$\mathrm{UCL}_{\bar{x}} = \bar{\bar{x}} + 3\sigma_{\bar{x}}, \qquad \mathrm{LCL}_{\bar{x}} = \bar{\bar{x}} - 3\sigma_{\bar{x}} \qquad (9)$$

$$\mathrm{UCL}_{R} = \bar{R} + 3\sigma_{R}, \qquad \mathrm{LCL}_{R} = \bar{R} - 3\sigma_{R} \qquad (10)$$

where UCL = upper control limit, LCL = lower control limit, σ_x̄ is the population standard deviation of the subgroup averages and σ_R is the population standard deviation of the subgroup ranges (Douglas, 2009; Besterfield, 2009). According to Besterfield (2009), the most crucial phase in SPC is the establishment of the revised central lines and control limits. Standard values for the central lines are adopted as the best estimates from the available data. x̿ and R̄ are considered representative of the process and become the standard values x̄₀ and R₀ once the analysis of the preliminary data indicates 'good control' (Douglas, 2009; Besterfield, 2009). A process in good control shows no out-of-control points on either side of the central line and no unusual patterns of variation.
At this stage, only the out-of-control points are analysed to determine the process stability (Douglas, 2009; Besterfield, 2009). If an out-of-control point has an assignable cause, it can be discarded. If no assignable cause is found, the out-of-control state is attributed to chance causes that occur as part of natural variation, and the data remain in the system.
As stated by Douglas (2009), when data are discarded, the new x̿ and R̄ of the third phase are calculated with the simplified expressions in Eq. 11 and Eq. 12:

$$\bar{\bar{x}}_{new} = \frac{\sum_{k=1}^{m}\bar{x}_k - \bar{x}_b}{m - s_b} \qquad (11)$$

$$\bar{R}_{new} = \frac{\sum_{k=1}^{m}R_k - R_b}{m - s_b} \qquad (12)$$

where x̄b represents the discarded subgroup averages, sb is the number of discarded subgroups, Rb represents the discarded subgroup ranges and m is the number of subgroups. The initial 25 subgroups are not plotted against the revised control limits, because these limits are used to report the results for future subgroups (Douglas, 2009; Besterfield, 2009).
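The trial and revised limits described in this section can be sketched in a few lines of Python. The sketch assumes the conventional range-based limits (x̿ ± A2·R̄ for the x̄ chart, D3·R̄ and D4·R̄ for the R chart) with standard tabulated factors, uses hypothetical concentration subgroups, and simply drops every out-of-control subgroup; in practice a subgroup would be discarded only after an assignable cause has been identified.

```python
from statistics import mean

# Standard tabulated factors (A2, D3, D4) for subgroup sizes 2 to 6.
FACTORS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
    6: (0.483, 0.0, 2.004),
}


def xbar_r_limits(subgroups):
    """Return (LCL, centre, UCL) for the x-bar chart and the R chart."""
    a2, d3, d4 = FACTORS[len(subgroups[0])]
    xbars = [mean(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_mean, r_bar = mean(xbars), mean(ranges)
    return {
        "xbar": (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar),
        "R": (d3 * r_bar, r_bar, d4 * r_bar),
        "xbars": xbars,
        "ranges": ranges,
    }


def revise_limits(subgroups):
    """Drop subgroups outside either chart's trial limits, then recompute."""
    trial = xbar_r_limits(subgroups)
    lcl_x, _, ucl_x = trial["xbar"]
    lcl_r, _, ucl_r = trial["R"]
    kept = [
        s for s, xb, r in zip(subgroups, trial["xbars"], trial["ranges"])
        if lcl_x <= xb <= ucl_x and lcl_r <= r <= ucl_r
    ]
    return kept, xbar_r_limits(kept)


# Hypothetical subgroups of five readings (mg/L); the last one is unusually high.
subgroups = [[0.08, 0.09, 0.10, 0.11, 0.12]] * 9 + [[0.28, 0.29, 0.30, 0.31, 0.32]]
trial = xbar_r_limits(subgroups)
kept, revised = revise_limits(subgroups)
print(trial["xbar"], trial["R"])
print(len(kept), revised["xbar"], revised["R"])
```

Removing the high subgroup pulls the centre line down and tightens the x̄ chart, which mirrors the statement that the revised limits are generally tighter than the trial limits.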

Capability index
Process capability analysis is an approach that helps decision makers judge whether a process is capable of complying with the existing environmental legislation or benchmark for a sufficiently large proportion of the time (Corbett and Pan, 2002). The capability index is a measure for a process that the control charts have shown to be stable and predictable. The process capability is expressed as a capability ratio, symbolized by Cp, which is a necessary complement to a variables control chart. The Process Capability Index (Cpk) additionally measures how well the process is centred on the target or nominal value; the minimum value normally recommended for Cpk is 1.00 (Douglas, 2009; Besterfield, 2009). When the Cp value is 1.33 or greater, the operating personnel are responsible for keeping the process centred, stable and predictable (Douglas, 2009). The process capability and the tolerance are combined to form the capability index defined in Eq. 13:

$$C_p = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma_0} \qquad (13)$$

where Cp is the capability index, USL - LSL is the upper specification limit minus the lower specification limit, or tolerance, and 6σ0 is the process capability. When the capability index is 1.00, the process is categorized as a Case (II) situation; if the ratio is greater than 1.00, it is a Case (I) situation, which is desirable; and if the ratio is less than 1.00, it is a Case (III) situation, which is undesirable (Douglas, 2009; Besterfield, 2009). According to Besterfield (2009), these cases describe how the output of the controlled process compares with the specification limits.
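The capability ratio and the three cases described above reduce to a short calculation, sketched below. The specification limits and the process standard deviation in the example are hypothetical values chosen for illustration, and the function names are not from the source.

```python
def capability_index(usl, lsl, sigma0):
    """Cp = tolerance (USL - LSL) divided by the process capability 6 * sigma0."""
    return (usl - lsl) / (6.0 * sigma0)


def capability_case(cp, tolerance=1e-9):
    """Map Cp to the Case I / II / III situations described in the text."""
    if cp > 1.0 + tolerance:
        return "Case I"    # process capability smaller than the tolerance: desirable
    if cp < 1.0 - tolerance:
        return "Case III"  # process capability larger than the tolerance: undesirable
    return "Case II"       # process capability equal to the tolerance


# Hypothetical limits (mg/L) and process standard deviation.
cp = capability_index(usl=0.3, lsl=0.1, sigma0=0.05)
print(round(cp, 4), capability_case(cp))
```

A Cp below one means the natural spread of the process (6σ0) is wider than the specification window, so some readings are expected to fall outside the limits even when the process is in control.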
The process spread is the process capability and is equal to 6σ, while the tolerance is the difference between the specifications. Undesirable results may be obtained whenever the tolerance is established without regard to the process. Three situations are possible: Case (I), when the process capability is less than the tolerance; Case (II), when the process capability is equal to the tolerance; and Case (III), when the process capability is greater than the tolerance (Douglas, 2009).

PCA (Table 2) yielded eight PCs with eigenvalues greater than one, explaining 77% of the total variance in the water-quality data set. The first factor (VF1) accounts for 30.3% of the total variance, with strong positive loadings of COND, SAL, DS, TS, Cl, As, K, Mg and Na. These variables derive from the mineral components in the river. This finding is consistent with Vega et al. (1998), who stated that this group of variables shares a common mineral origin, most likely the dissolution of limestone and gypsum. Moreover, seasonal factors such as rain may affect the soil infiltration process and transport pollutants into the river through surface runoff. VF2 accounts for 11.9% of the total variance, with strong positive loadings of BOD, COD, SS and turbidity. This VF represents anthropogenic activity from industrial, domestic and commercial areas. The association among these variables (BOD, COD, SS and turbidity) corresponds to the discharge of organic sources and sewer pipe effluents, which contain pollutants with a high concentration of bacteria (Mohd et al., 2011). VF3 accounts for 9.91% of the total variance, with strong positive loadings of NH3-N and PO4. This factor is suggested to originate from agricultural runoff pollution sources.
The presence of NH3-N and PO4 in the water system may result from runoff of the soluble fertilizer used in the oil palm plantation industry. This is consistent with Singh et al. (2005), who stated that the use of nitrogenous fertilizer increases the levels of NH3-N and PO4. Nitrogenous fertilizer is widely used in crop cultivation to boost plant growth by providing enough nutrients for the plants to perform photosynthesis. However, if nitrogenous fertilizer is over-applied, the alkaline compounds and nitrates contained in the fertilizer may be transported into the overlying water. People who consume the water may consequently be prone to illnesses such as hypoxia. The fourth factor (VF4) accounts for 6.52% of the total variance, with strong positive loadings of E. coli and coliform. This factor represents the association of variables found in faecal waste, because the presence of E. coli and coliform is a strong indication of sewage discharge into surface water. This finding is also supported by Mohd et al. (2011), who suggested that E. coli and coliform originate from animal faeces, surface runoff and discharge from sewage and wastewater treatment plants. The fifth factor (VF5) explains 4.24% of the total variance, with a strong positive loading of Zn. Zn is commonly used in manufacturing industries, including transportation, construction, machinery and electricity. Zn is also used as an activator in the rubber and paint industries; the white pigment in watercolours and paints comes from zinc oxide in the colour mixtures. VF6 accounts for 3.67% of the total variance, with a strong positive loading of Cd.
In general, the release of Cd into the environment is largely influenced by the manufacturing industries (OSHA), which use Cd mainly for pigments, coating and plating and as a stabilizer for plastic. Cd also arises in the manufacturing industries as an inevitable byproduct of Zn, Pb and Cu extraction (Lenntech, 2014). VF7 accounts for 3.7% of the total variance, with a strong positive loading of Fe. Fe is one of the most abundant elements in the earth's crust and is also the source metal in steel and alloy production (Lenntech, 2014). Fe is usually associated with industrial effluents due to its capability to exist in four distinct crystalline forms (Juahir et al., 2010b; Nasir et al., 2011; Lenntech, 2014). Hence, these three factors (VF5, VF6 and VF7) form a coherent group of metal associations, which suggests that they are indicators of anthropogenic activities. The eighth factor (VF8) has a strong positive loading of NO3 and accounts for 4.6% of the total variance. Nitrogen in water comes from various pollution sources, including animal wastes (livestock, birds, mammals and fish), feedlot discharges, municipal and industrial wastewater, fertilized fields, lawn runoff, septic tanks and vehicle exhausts. In the environment, NO3 provides nutrients for plants in streams, rivers and reservoirs. Nitrate levels in river water often fluctuate with the season, and higher nitrate concentrations are usually observed during the rainy season (Ismail, 2011). This indicates that NO3 results from surface runoff that transports land surface particulates into the overlying water. All the possible pollution sources in the Johor River Basin are summarized in Table 3.

APCS-MLR
The APCS-MLR model for the Johor River gave an R² of 0.780, indicating a good fit between the measured and predicted concentrations (Table 4). The coefficient of determination (R²) indicates that the receptor modelling approach (APCS-MLR) is well suited to the source apportionment of the water variables. Moreover, APCS-MLR identified VF3 as the largest contributor, with 71.68% of the possible pollution (Table 5). This indicates that NH3-N and PO4, which are influenced by agricultural runoff, are the major pollutants in the Johor River Basin. Numerous agricultural and livestock activities are carried out in the developed areas adjacent to the Johor River Basin. The point and non-point sources of pollutants include water runoff from cropland, lawns, gardens and confined livestock, which may explain the high contribution of NH3-N and PO4 in the river.
In addition, the Federal Land Development authorities are situated close to the Johor River, which ensures that they are directly engaged with concerns related to the river.
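For readers unfamiliar with the receptor model, the following sketch shows one common formulation of APCS-MLR with NumPy. It is an assumption, not a reconstruction of the XLSTAT workflow behind Tables 4 and 5: component scores are shifted by the score of an artificial zero-concentration sample to obtain absolute scores, the target concentration is regressed on those scores, and each source's mean contribution is its regression coefficient times its mean absolute score. The data, source names and function name are hypothetical, and no factor rotation is applied.

```python
import numpy as np


def apcs_mlr(data, target, n_components):
    """Regress a target concentration on absolute principal component scores."""
    data = np.asarray(data, dtype=float)
    target = np.asarray(target, dtype=float)
    mu, sd = data.mean(axis=0), data.std(axis=0, ddof=1)
    z = (data - mu) / sd
    eigval, eigvec = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    top = np.argsort(eigval)[::-1][:n_components]
    weights = eigvec[:, top] / np.sqrt(eigval[top])  # gives unit-variance scores
    scores = z @ weights
    zero_sample = (0.0 - mu) / sd                    # artificial sample with all concentrations zero
    apcs = scores - zero_sample @ weights            # absolute principal component scores
    design = np.column_stack([np.ones(len(apcs)), apcs])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    predicted = design @ coef
    ss_res = ((target - predicted) ** 2).sum()
    ss_tot = ((target - target.mean()) ** 2).sum()
    r_squared = 1.0 - ss_res / ss_tot
    contributions = coef[1:] * apcs.mean(axis=0)     # mean contribution of each source
    return r_squared, coef[0], contributions


# Synthetic data: two hypothetical sources driving four measured parameters.
rng = np.random.default_rng(42)
farm = rng.uniform(0.5, 2.0, size=60)
sewage = rng.uniform(0.2, 1.0, size=60)
data = np.column_stack([
    0.10 * farm + rng.normal(0, 0.01, 60) + 0.05,
    0.02 * farm + rng.normal(0, 0.002, 60) + 0.01,
    3.0 * sewage + rng.normal(0, 0.1, 60) + 1.0,
    8.0 * sewage + rng.normal(0, 0.3, 60) + 2.0,
])
target = data[:, 0]
r_squared, intercept, contributions = apcs_mlr(data, target, n_components=2)
print(round(r_squared, 3), round(intercept, 4), contributions.round(4))
```

In this formulation the intercept plus the summed source contributions equals the mean measured concentration, so the intercept can be read as the share not attributed to any identified source.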

x̄ and R chart for NH3-N and PO4
The trial control limits for the x̄ and R charts of the Johor River Basin are shown in Fig. 2. The mean concentration of NH3-N exceeded the UCL: four points among the NH3-N mean observations exceed the control limits of the x̄ chart. These are points 1, 2, 7 and 8, representing 16% of all the observations. Meanwhile, the R chart shows points 1, 2, 8, 10 and 11 outside the upper control limit, representing 20% of the observations. The process is therefore not stable for the base period. Based on Fig. 2, the revised control chart is constructed to determine the stability of the process. All out-of-control points are discarded in each chart to ensure that the process is stable. This is supported by Corbett and Pan (2002), who stated that out-of-control points are discarded to improve normality so that the data follow the normal distribution. The revised x̄ chart has UCL = 0.23055, LCL = -0.05025 and centre line (CEN) = 0.09015 (Fig. 3), whereas the R chart has UCL = 0.51441, LCL = 0.0 and CEN = 0.24333. The remaining plotted points in Fig. 3 indicate a stable process, as all of the points (observations) are within the control limits. The chart can thus be used as representative of the whole process for future prediction and for measuring the risk of contamination. Subsequently, two mean observations from another NH3-N data subgroup are used to determine whether the process remains stable. When the two observations are added to the process, all the points fall within the control limits (Fig. 4). Although there is variation within the control limits, it is still considered natural variation of the process.
Hence, the mean concentration of NH3-N in the Johor River is in control. Control charts were then applied to PO4 to verify whether the process is stable under the trial control limits. In the x̄ chart (Fig. 5a), points 1, 2 and 3 (mean observations) exceed the control limit, suggesting a possible assignable cause; 12% of the mean concentrations lie outside the upper control limit. The R chart (Fig. 5b) shows the same result: points 1, 2 and 3 exceed the UCL, representing 12% of the observations. The process is therefore not stable for the base period. The out-of-control points are attributed to an assignable cause because they lie above the UCL. An assignable cause occurs when there is undesirable variation, here an unexpected increase in the PO4 concentration, which may be due to non-point source pollution. The out-of-control points are then removed to ensure that the process is stable, as shown in Fig. 5, and new control limits are computed. The UCL and LCL are 0.09286 and -0.01437 respectively for the x̄ chart, whereas the UCL and LCL are 0.19644 and 0.0 respectively for the R chart. The remaining plotted points in Fig. 6 indicate that the process is stable, as all of the points (observations) are within the control limits. The chart can therefore be used as representative of the whole process for future prediction and for measuring the risk of pollution. Two mean observations from another PO4 subgroup are then added for the monitoring period, using the control limits constructed in Fig. 5. Within this period (Fig. 7), the PO4 mean concentrations of the other subgroup show no points exceeding the control limits.
This verifies that the PO4 concentration in the Johor River Basin is in control. The NH3-N concentration data from the previous assessment in Fig. 7 were found to be within the UCL and LCL, with only natural variation occurring. The process is thus considered to be in statistical control, or stable, and its performance can be predicted by process capability analysis. In process capability analysis, the inherent variability of the process is compared with the specification limits so that the environmental performance potential can be determined under normal, in-control conditions (Corbett and Pan, 2002).
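As a consistency check on the revised limits reported above, the following sketch back-calculates the centre line and average range from the x̄ chart limits and compares the implied R chart UCL with the reported one. It assumes the conventional range-based limits (x̿ ± A2·R̄ and D4·R̄) and a subgroup size of five; the subgroup size is not stated in the text and is inferred here only because the standard factors for n = 5 reproduce the reported numbers.

```python
# Standard tabulated factors for subgroup size n = 5 (the size is an inference).
A2, D4 = 0.577, 2.114


def implied_chart_values(ucl_xbar, lcl_xbar):
    """Back-calculate centre line, average range and R-chart UCL from x-bar limits."""
    centre = (ucl_xbar + lcl_xbar) / 2.0
    r_bar = (ucl_xbar - lcl_xbar) / (2.0 * A2)
    return centre, r_bar, D4 * r_bar


# Revised x-bar chart limits quoted in the text for NH3-N and PO4.
nh3n = implied_chart_values(0.23055, -0.05025)
po4 = implied_chart_values(0.09286, -0.01437)
print([round(v, 5) for v in nh3n])
print([round(v, 5) for v in po4])
```

Under these assumptions the implied NH3-N values agree with the reported CEN of 0.09015 and 0.24333 and the R chart UCL of 0.51441 to within about 0.0001, and the implied PO4 R chart UCL agrees with the reported 0.19644 to a similar precision.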

Capability index for NH3-N and PO4
Based on Fig. 8, the capability index was computed to measure the risk to the environment. The capability index Cp is used to measure the potential risk that NH3-N poses for water pollution. The Cp value is 0.2622, which is less than 1. This shows that the potential risk of the NH3-N concentration causing unacceptable water pollution is high. The process capability analysis therefore shows that the process is not capable over a subsequent long period of time. It is suggested that the DOE carry out continuous monitoring to ensure that the NH3-N concentration in the Johor River Basin complies with the specification limits that have been set, namely USL (0.3 mg/L) and LSL (0.1 mg/L). The capability index Cp for PO4 is 0.2637 (Fig. 9), which is also less than 1.00. This indicates that the potential risk of the PO4 concentration causing unacceptable water pollution is also high, and that the process is not capable over a subsequent long period of time. More inspection is needed to control the PO4 concentration based on the specification limits of USL (0.0075 mg/L) and LSL (0.005 mg/L). These specification limits refer to the NWQS set by the DOE.

Conclusion
From this study, the different types of pollution sources and the most significant possible pollution sources can be identified using environmetric techniques. NH3-N and PO4 are found to be the main pollutants, giving a higher contribution to the Johor River Basin. According to the results, the Johor River Basin experienced land conversion towards oil palm plantation and agriculture during the period 2003-2007. Based on SPC and the process capability indices, the concentrations of both NH3-N and PO4 show a high risk of unacceptable water pollution. Continuous monitoring in the area is therefore recommended to improve river quality in the Johor River Basin. The analysis and modelling of water quality need to be taken seriously by the authorities. From water quality data analysis and modelling, the authorities can focus on the most significant parameters contributing to river pollution, which saves the time and cost of water quality sampling and laboratory analysis of redundant parameters. The parameters shown by the data analysis to be significant for water quality in the river basin can be used as a reference by the authorities in deciding which parameters to monitor at the monitoring stations.