Efficient effort estimation of web based projects using neuro-web

Effort estimation must be done in the early stages of a project for successful software delivery. Numerous models have been developed over the last decades to estimate software effort, but it remains a challenging task, and in the case of web based projects it is even harder. The choice of programming language and the use of different types of objects, i.e. hyperlinks, graphics, and scripts, make the web effort estimation process particularly complex. An estimation model, "WebMo", inspired by COCOMO, was proposed to estimate the effort of web based projects. This research presents a non-algorithmic model named "Neuro-Web" based on Artificial Neural Networks (ANN). The proposed model uses the WebMo parameters as input: web application size, productivity coefficients, and 9 different cost drivers. The model is calibrated using a dataset of 164 real-life web applications developed by different freelancers and software houses. "Neuro-Web" is compared with the existing "WebMo" model and the results reveal that it performs better: the MMRE of the proposed method is just 9.92% as compared to 26.27% for WebMo.


Introduction
Effort estimation of a project is a complex activity that needs to be done at an early stage of project development. It is the process of forecasting the expected development cost and time. Correct estimation is critically important, as underestimation can result in low project quality and eventually lead to project failure. On the other hand, over-estimation of a project can be a source of business loss (Hill, 2010).
Software effort estimation can be performed by algorithmic (Sharma, 2013) and non-algorithmic methods. The algorithmic methods include the COCOMO model (Boehm, 1984), the SLIM model (Putnam, 1978) and Function Point based models (Sheta et al., 2008). These methods feed different types of parameter values into mathematical formulas to forecast software effort. Their limitations include the source of estimation (the SRS document), inappropriate measurement of project size and difficulty in modeling complex inherent relationships (Clark et al., 1998). Due to these deficiencies, researchers turned to non-algorithmic methods based on Computational Intelligence, i.e. Genetic Algorithms (Singh and Misra, 2012), Fuzzy Logic (Martin et al., 2005) and Artificial Neural Networks (ANN) (Santani et al., 2014). An ANN has the capability to learn from training data and produce output like a human brain. It can also model the complex relationships between dependent and independent variables effectively (Briand and Wieczorek, 2002).
Along with traditional software, effort estimation of web based projects is also critically important. Continuously increasing online retail sales ($2.197 trillion in 2017) (Saleh, 2017) reflect the importance of successful web development, and accurate effort estimation is the foundation of this success. The demand to be quick-to-market makes web development different from traditional development. Other differences include its complex nature, small team sizes, use of multiple techniques (scripts, APIs, graphics etc.) (Reifer, 2000) and ad-hoc processes. Despite these differences, very few models have been proposed to deal exclusively with web based projects. One of them is WebMo, proposed by Reifer (2000). This model uses the Web Objects metric for effort estimation. Web Objects include building blocks, web components, graphics and multimedia files etc. (Reifer, 2000).
The above mentioned limitations of algorithmic methods led us to look for non-algorithmic methods to estimate the effort of web based projects. Very few researchers have attempted ANN-based estimation of web based projects. This motivated us to propose a novel non-algorithmic model called Neuro-Web. The model has been verified with the help of 164 real-life web based projects.
The remaining paper is structured as follows: Section 2 discusses the Background, Section 3 describes the Literature Review, Section 4 discusses the Proposed Methodology, Section 5 describes the Experimental Setup, Section 6 presents the Results and Analysis and finally Section 7 concludes the paper with some future directions.


Background


WebMo model
The WebMo model (Reifer, 2000) for effort estimation of web based projects is derived from COCOMO (Boehm, 1984), a widely used model for effort estimation of traditional projects. The core differences between WebMo and COCOMO are the number of cost drivers and the sizing metric. COCOMO has 15 cost drivers, whereas WebMo deals with 9. Project size is calculated in Source Lines of Code (SLOC) for COCOMO, whereas WebMo uses the Web Objects metric. Web Objects include APIs, JavaScript applets, graphics, hyperlinks, application points, components and multimedia files etc. The size of a web project is calculated as the volume given by Halstead (1977)'s equation below (see Eq. 1):

V* = N* log2(n*) = (N1* + N2*) log2(n1* + n2*)   (1)

where N* is the number of occurrences of web objects and operations on those web objects, n* is the number of unique web objects and operations, N1* is the total occurrences of web objects, N2* is the total occurrences of operations on web objects, n1* is the number of distinct web objects, n2* is the number of distinct operations on web objects, and V* is the volume/size of the project. Eq. 2 is used to calculate the effort (in person-months) of a web based project:

Effort = A × (Size)^P1 × Π Cdi   (2)

where Size is the size as calculated from Halstead's equation, Cdi are the 9 cost drivers, A is a constant, and P1 is a power law. Table 1 contains the values of A and P1 for different categories of web based projects. Table 2 describes the details of the 9 cost drivers. The value of each cost driver is measured on an ordinal scale (Very Low, Low, Nominal, High and Very High).
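The size and effort calculations can be sketched in a few lines of code. The snippet below is an illustrative sketch only: the constant A, the power P1 and the numeric cost-driver multipliers come from Tables 1 and 2, which are not reproduced here, so the values in the usage example are hypothetical placeholders.

```python
import math

def halstead_volume(N1, N2, n1, n2):
    """Eq. 1: V* = (N1* + N2*) * log2(n1* + n2*).

    N1, N2: total occurrences of web objects / operations on them.
    n1, n2: distinct web objects / distinct operations on them.
    """
    return (N1 + N2) * math.log2(n1 + n2)

def webmo_effort(size, cost_drivers, A, P1):
    """Eq. 2: Effort = A * Size^P1 * product of the 9 cost-driver multipliers."""
    product = 1.0
    for cd in cost_drivers:
        product *= cd
    return A * (size ** P1) * product

# Hypothetical usage: 50 + 30 occurrences, 10 + 6 distinct items,
# all cost drivers Nominal (multiplier 1.0), placeholder A and P1.
size = halstead_volume(50, 30, 10, 6)          # 80 * log2(16) = 320.0
effort = webmo_effort(size, [1.0] * 9, A=0.5, P1=1.0)
```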

Artificial neural networks
An Artificial Neural Network (ANN) is a network consisting of artificial neurons (Richard, 1987). These neurons are connected to each other through connection links, as shown in Fig. 1. A weight is associated with each connection; these weights carry information about the input signals that is used to solve a specific problem. In an ANN, the nodes are organized into layers: an input layer, one or more hidden layers and an output layer. Inputs are provided to each neuron, and every neuron has an internal state called its activation level. The weighted sum of the inputs, measured against some threshold, is passed to an activation function to produce the neuron's output. A large number of activation functions are used depending on the nature of the problem, such as linear, sigmoid, Gaussian and hyperbolic tangent functions.

Fig. 1: Artificial neural network basic model
An ANN needs to be trained, like a human brain. A large number of algorithms exist for training neural networks, but which algorithm works best for a given network depends on its architecture. The most widely used topology or architecture is the feed-forward neural network (FFNN), in which information flows from the input neurons to the output neurons and never in the reverse direction. The intermediate layers, called hidden layers, can be used to increase the dimensionality of the neural network (Richard, 1987).
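A single forward pass through such a one-hidden-layer feed-forward network can be sketched as follows. This is an illustrative fragment under the usual conventions (sigmoid hidden neurons, linear output for a regression target), not the paper's implementation.

```python
import math

def sigmoid(x):
    # Logistic activation: squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """One forward pass: hidden_weights[j][i] connects input i to hidden
    neuron j; the single output neuron is a linear combination of the
    hidden activations."""
    hidden = [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    return out_bias + sum(w * h for w, h in zip(out_weights, hidden))
```

With zero inputs and zero biases each hidden neuron outputs sigmoid(0) = 0.5, so the network output is just the sum of half the output weights plus the output bias.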

Literature review
One of the important needs of software project management is precise, consistent and accurate prediction of resources. Researchers have been working in this direction for the last twenty years, but many challenges are still associated with cost and effort prediction (Bhatnagar et al., 2010).
Researchers have focused more on minimizing the subjectivity of traditional software estimation methods; very few have taken web based project estimation into consideration. Different algorithmic, regression-based and parametric models exist for traditional software, such as the COCOMO model (Boehm, 1984), the SLIM model (Putnam, 1978), Function Point Analysis (Sheta et al., 2008) and the Ordinal Regression Model (Sentas et al., 2005). Non-algorithmic techniques also exist for software effort estimation, including case based reasoning (Mukhopadhyay et al., 1992), clustering (Zhong et al., 2004), artificial neural networks (ANN) (Richard, 1987) and genetic algorithms (GA) (Martin et al., 2005). The successful application of genetic algorithms (Qamar et al., 2018) and artificial neural networks (Richard, 1987) in different domains (i.e. medicine, geology, engineering, image processing, physics, classification and control problems) grabbed the attention of more researchers, and many have since used them in different areas of software project management. Tronto et al. (2008) and Bhuyan et al. (2014) evaluated the use of artificial neural networks for predicting cost and effort in software project management. Furthermore, Finnie et al. (1997) applied the back propagation learning algorithm on a multilayer perceptron for software effort prediction. Srinivasan and Fisher (1995) also used a multilayer perceptron for effort prediction on the COCOMO dataset. Ruhe et al. (2003) used hybrid techniques, multivariable regression and expert judgment, to estimate the effort of web based projects on a small dataset from industry. Later, Costagliola et al. (2006) compared two types of web based measures for size estimation. Mendes (2007) used a Bayesian Network for effort estimation and found it better than regression-based models.
Mendes (2007) further used Classification and Regression Trees (CART) and case-based reasoning (CBR) techniques for web based project estimation. The WebMo model (Reifer, 2000) was also used to estimate the effort of web projects. Reddy et al. (2007) proposed an approach for web effort estimation using ANN in 2007. Later on, Panda (2015)

Proposed methodology
WebMo is an algorithmic model developed for web based projects. We recast this model as a multilayer Artificial Neural Network by providing the parameters of the WebMo model to the ANN as input; the estimated effort was obtained by training the ANN. A detailed comparison between the actual effort, WebMo's estimate and the proposed model's (Neuro-Web's) estimate was conducted.
A Feed Forward Neural Network was designed (as shown in Fig. 2) which takes the 9 cost drivers and the calculated size in its input layer, has five neurons in its hidden layer and one output neuron. The number of neurons in the hidden layer was selected after an iterative testing process, keeping in view that more neurons can cause over-fitting.

Fig. 2: Architecture of ANN used for neuro-web model
The steps of the Neuro-Web model are given below:

Step 1: Get the values of all cost drivers
Step 2: Initialize the weights, biases and number of nodes in the hidden layer: wi = whi = 1; biasi = 1
Step 3: Set learning rate α = 0.003
Step 4: While the stopping condition is false, repeat steps 5 to 12
Step 5: For each training record, repeat steps 6 to 12
Step 6: Compute the hidden layer: Hiddenj = b1 + Σ Xi*wij for i = 1 to 16; j = 1 to n
Step 7: Activate the hidden layer: Hiddenj = 1/(1 + e^-Hiddenj) for j = 1 to n (number of hidden nodes)
Step 8: Compute the output layer: effort = bias2 + Hidden1*wh1 + ... + Hiddenn*whn
Step 9: Compute the error: error = ln(Actual Effort) - effort
Step 10: Compute Δw
Step 11: Update the weights using Δw
Step 12: Test the stopping condition
Repeat steps 13 and 14 for all projects in the test data:
Step 13: Compute the effort of the testing data
Step 14: Pick the weights from the weight associative memory of the training project which give the effort closest to the actual effort
The learning rate is α = 0.003 and the stopping condition is that the error should be less than some threshold.
Initially, all weights (input and hidden layer) and biases are set to 1. For each training record, steps 6 to 12 are performed; in these steps the weights are updated and saved in a weight associative memory. In steps 6 and 7 the weighted sum and activation of every hidden neuron are computed. In step 8, the effort is computed with the help of the input and hidden layer weights. From steps 9 to 12, the weights keep being updated while the error (actual minus estimated effort) is greater than the defined threshold. In step 13, the estimated effort is computed for the testing data; for this purpose the weight associative memory is used. The accuracy of the estimated effort is evaluated using the most popular measures, the Magnitude of Relative Error (MRE) and the Mean Magnitude of Relative Error (MMRE) (Briand et al., 1999), which are described in Eq. 3 and Eq. 4:

MRE = |Actual Effort - Estimated Effort| / Actual Effort   (3)

MMRE = (1/N) Σ MREi   (4)
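The training steps above can be sketched as a small gradient-style loop. This is a sketch under stated assumptions: the paper does not spell out the exact form of Δw, so a standard delta-rule/backpropagation update is assumed here, with weights and biases initialized to 1, a sigmoid hidden layer, a linear output neuron and error = ln(actual effort) - network output, as in steps 2-12.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuro_web(projects, n_hidden=5, lr=0.003, threshold=0.01, max_epochs=20000):
    """Sketch of steps 2-12. `projects` is a list of (inputs, actual_effort)
    pairs. The Δw update below is an assumed delta rule, not the paper's
    published formula."""
    n_in = len(projects[0][0])
    w = [[1.0] * n_in for _ in range(n_hidden)]   # Step 2: input->hidden weights = 1
    wh = [1.0] * n_hidden                         # hidden->output weights = 1
    b1, b2 = [1.0] * n_hidden, 1.0                # biases = 1
    for _ in range(max_epochs):                   # Step 4: loop until stopping condition
        max_err = 0.0
        for x, actual in projects:                # Step 5: each training record
            # Steps 6-7: weighted sums, then sigmoid activation.
            hidden = [sigmoid(b1[j] + sum(w[j][i] * x[i] for i in range(n_in)))
                      for j in range(n_hidden)]
            # Step 8: linear output neuron.
            effort = b2 + sum(wh[j] * hidden[j] for j in range(n_hidden))
            err = math.log(actual) - effort       # Step 9
            max_err = max(max_err, abs(err))
            # Steps 10-11: assumed delta-rule updates.
            b2 += lr * err
            for j in range(n_hidden):
                g = err * wh[j] * hidden[j] * (1.0 - hidden[j])
                wh[j] += lr * err * hidden[j]
                b1[j] += lr * g
                for i in range(n_in):
                    w[j][i] += lr * g * x[i]
        if max_err < threshold:                   # Step 12: stopping condition
            break
    return w, wh, b1, b2
```

Because the target in step 9 is ln(Actual Effort), the trained network predicts the logarithm of the effort, which must be exponentiated to recover person-months.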

Experimental setup
The dataset used to implement and analyze the effort estimation model was collected from different software houses and freelancers. The companies provided us the information on the condition that their identity be hidden and that the information be used for research purposes only. Fig. 3 depicts the details of the dataset collected from different sources. Around 40% of the data was collected from different software houses of Pakistan, more particularly from Lahore. Another 17% came from freelancers working in virtual teams for clients. Around 43% of the project data was taken anonymously (the developers did not disclose their affiliation with any company).
The dataset of 164 projects was divided into training (61%) and testing (39%) parts, as a Neural Network works in two modes: training and testing. During training mode, the training data is used to adjust the weights, while the testing mode validates whether the network has been trained properly on the provided testing dataset. A single instance of the dataset (a project) contains the project name (a pseudonym), its launch year, the values of its 9 cost drivers in terms of very low, low, nominal, high and very high, the occurrences of operands and operators in the project, the distinct operands and operators, and the actual effort in person-months. The attributes are comma separated. Table 3 lists the values of the parameters used to implement the feed forward neural network.
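Reading one such comma-separated record and splitting the dataset 61/39 can be sketched as below. The exact field order and the numeric multipliers behind the ordinal ratings are assumptions for illustration; the paper's actual rating values are in Table 2, which is not reproduced here.

```python
# Hypothetical ordinal-to-numeric mapping; the real multiplier values
# come from the paper's Table 2 and will differ per cost driver.
RATING = {"very low": 0.8, "low": 0.9, "nominal": 1.0,
          "high": 1.1, "very high": 1.2}

def parse_record(line):
    """Parse one comma-separated project record, assuming the field order:
    name, launch year, 9 cost-driver ratings, N1, N2, n1, n2, actual effort."""
    fields = [f.strip() for f in line.split(",")]
    name, year = fields[0], int(fields[1])
    drivers = [RATING[r.lower()] for r in fields[2:11]]   # 9 ordinal ratings
    N1, N2, n1, n2 = map(int, fields[11:15])              # operand/operator counts
    actual = float(fields[15])                            # person-months
    return name, year, drivers, (N1, N2, n1, n2), actual

def split_dataset(records, train_frac=0.61):
    """61% training / 39% testing split, as used in the paper."""
    cut = round(len(records) * train_frac)
    return records[:cut], records[cut:]
```

With 164 records this split yields 100 training and 64 testing projects, matching the 64 testing projects reported in the results.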

Results and analysis
The overall results for the 64 projects of the testing dataset are presented in Table 4. The second column shows the actual effort as reported by the software houses, anonymous sources and freelancers in the collected dataset. The 3rd and 4th columns compare the effort estimated by Neuro-Web and WebMo respectively. The results show that Neuro-Web performed much better than WebMo, as its estimated effort was much closer to the actual effort. The last two columns give the MRE of Neuro-Web and WebMo for the estimated effort. It can be seen in Table 4 that the MRE of Neuro-Web is much lower than that of WebMo, and ultimately so is the MMRE. Fig. 4 depicts the comparative analysis between the actual effort, Neuro-Web's estimated effort and WebMo's estimated effort. The figure shows that Neuro-Web's estimates are much closer to the actual effort than WebMo's.
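The MRE and MMRE columns of Table 4 follow the standard definitions referenced earlier as Eq. 3 and Eq. 4, which can be computed as:

```python
def mre(actual, estimated):
    """Eq. 3: Magnitude of Relative Error for one project."""
    return abs(actual - estimated) / actual

def mmre(actuals, estimates):
    """Eq. 4: mean of the per-project MREs over the test set."""
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)
```

For example, estimates of 90 and 110 person-months against actuals of 100 each give an MRE of 0.1 (10%) for both projects, and hence an MMRE of 10%.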

Conclusion
In this paper, we have proposed a novel non-algorithmic model, Neuro-Web, for effort estimation of web based projects. The proposed model is calibrated with the help of a dataset of 164 real-life projects. The model is based on an artificial neural network that needs to be trained like a human brain, and it uses the WebMo model parameters as input. The effort estimated by the proposed model was compared with the actual effort and with the effort measured using the WebMo model. The effort estimated using Neuro-Web was close to the actual effort: its MMRE was just 9.92%, while the MMRE of WebMo was 26.27%. As of now, we have taken the dataset from anonymous sources, Pakistani software houses and freelancers only. In future, this model can be evaluated using datasets taken from different international software houses. Similarly, instead of comparing this model only with WebMo, it could be compared with other web effort estimation tools/models. The model can also be calibrated using different company sizes, different areas of web based applications and different numbers of datasets. Last but not least, the research can be extended to other non-algorithmic techniques like fuzzy techniques, Swarm Intelligence and Genetic Algorithms.