Development of a prediction model based on linear regression to estimate the success rates of seafood caught from different catching centers

Food quality assurance is extremely important for businesses and organizations that aim to be efficient and competitive worldwide. Global markets demand high food hygiene and safety standards to maintain consistent quality. Intelligent software for assuring fish quality is uncommon in the fishing industry. Most seafood processing companies use Total Quality Management (TQM) systems to ensure product safety and quality. These systems keep significant quality risks within acceptable tolerance limits. However, no methods exist for calculating the success rates of seafood obtained from different catching centers. The purpose of this study is to develop algorithms for predicting these success rates. The algorithms use the least-square curve fitting approach to determine the model that best matches the data, and the resulting best-fit model is then used to predict the success rates. The bestFitModelFinder algorithm finds the best model for the input data, while the predictionOfQuality algorithm predicts the success rate. The algorithms were tested using data obtained from a seafood company between January 2000 and December 2019. Statistical metrics such as mean absolute deviation (MAD), mean square error (MSE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are used to evaluate the prediction accuracy of the presented algorithms. The performance analysis showed low error levels. The proposed algorithms can assist seafood enterprises in determining the quality of seafood items sourced from various fishing areas.


Introduction
Food quality assurance is particularly important to efficient and internationally competitive businesses and organizations. To ensure consistent quality, international markets demand high food hygiene and safety standards. According to the World Health Organization (WHO), food-borne infections are on the rise in developed countries due to a lack of effective procedures to check the quality and safety of food (WHO, 2015). As a result, assuring food quality has become a critical concern in recent years, and it is the first step toward drawing worldwide attention to a country's food products. In many parts of the world, seafood has long been a favored element of the diet, and in some places it has even served as the primary source of animal protein (Huss, 1994). For an increasing number of individuals, fish is becoming a healthier alternative to red meat (Huss, 1994). The consumption of fish, an excellent source of omega-3 fatty acids, proteins, and vitamins, has increased, and fish has become an essential part of a balanced human diet (Roos et al., 2003). However, if handled improperly, these products can cause a variety of infectious diseases in humans, including food poisoning. Seafood products in general face numerous quality control issues across the product spectrum, particularly in export markets.
Intelligent software to assure the quality of fish is not very common in the seafood industry. To assure product safety and quality, most seafood processing businesses use Total Quality Management (TQM) systems. Good Manufacturing Practices (GMP), Sanitation Standard Operating Procedures (SSOP), and Hazard Analysis Critical Control Point (HACCP) processes are all included in such systems (Oliveira et al., 2016). All these measures ensure that significant hazards that compromise quality are kept within acceptable tolerance levels. However, no methods are available for analyzing the results of the numerous tests or predicting the success rates of seafood acquired from specific or distinct catching centers. The purpose of this research paper is to propose algorithms to predict the success rates of seafood caught from different catching centers. This is accomplished by analyzing the past success rates of seafood from the catching center and forecasting future success rates. The proposed algorithm examines previous data and determines the best model to fit the data. The fitted model is then used for future predictions.
Prediction approaches are of two types: qualitative and quantitative. Qualitative prediction methods are sometimes known as "subjective" procedures since they rely on people's ability to extrapolate and generalize. Quantitative prediction methods rely on historical data to construct mathematical extrapolations of future data. Such techniques are applied when historical data are available, when the information can be expressed in numerical terms, and when those characteristics of the past pattern that have been observed can be assumed to continue in the future (assumption of continuity).
While reviewing the literature, we found no studies on linear regression-based prediction of seafood quality based on the catching centers. We found a 2019 study in which numerous fish spoilage indicators were assessed throughout 12 days of storage at 4 ± 2 °C using a basic multispectral imaging (430-1010 nm) system, as well as linear and non-linear regressions (Khoshnoudi-Nia and Moosavi-Nasab, 2019). Total Volatile Basic Nitrogen (TVB-N), Psychrotrophic Plate Count (PPC), and sensory score in fish fillets were used as markers (Khoshnoudi-Nia and Moosavi-Nasab, 2019). Different chemometric models, such as partial least-squares regression (PLSR), multiple linear regression (MLR), least-squares support vector machine (LS-SVM), and backpropagation artificial neural network (BP-ANN), were examined in terms of prediction performance. For simultaneous prediction of PPC, TVB-N, and sensory score, all models performed well (R²P ≥ 0.853 and RPD ≥ 2.603) (Khoshnoudi-Nia and Moosavi-Nasab, 2019). The study of García et al. (2017) included the creation of a smart quality sensor that can be used to assess and predict quality over time. The sensor uses biochemical and microbiological deterioration markers, as well as dynamic models, to predict quality according to QIM and EU grading criteria.
The rest of the paper is organized as follows: Section 2 is concerned with the materials and methods used in this research. It covers the general literature on the different stages involved in the development of an effective prediction model and the theory of curve fitting, with a focus on least-square curve fits. The proposed algorithms for predicting quality and for finding the best model to fit the data are also included in this section. The next section presents the results and their interpretation. Performance analysis of the proposed algorithms is given in Section 4. The conclusion is given in Section 5, followed by references.

Development of an effective prediction model
Model selection, model fitting, and model validation are the three essential phases in constructing an effective prediction model (Pham, 2006). In the model selection step, the available data points are plotted to establish the shape of the model to fit the data (Pham, 2006). Most models use linear, polynomial, or simple nonlinear functions. As a result, the ideal technique for choosing an initial model is to plot the data, examine the shape, and then choose the model that best matches the data (Pham, 2006). Following the selection of a basic functional model, the next stage in the model-building process is to determine the unknown parameters in the function using an acceptable model-fitting method (Pham, 2006). Maximum likelihood and least-squares are the two most common methods for parameter estimation. Both methods generate parameter estimators with a variety of useful characteristics. The most crucial phase in the process is model validation. In this step, several statistical indicators are used to determine the accuracy of the chosen model. The analysis of residuals is the primary tool. The residuals from a fitted model are the differences between the observed values and the corresponding predicted values produced by the regression function (Picard and Cook, 1984). If the chosen model appears to be appropriate, it is used for prediction. If the model validation reveals any flaws in the chosen model, the modeling process is repeated to choose a better model (Kuhn and Johnson, 2013).

Curve fitting
Curve fitting is the method of determining the equation of the best-fit curve that can be used to forecast unknown values (Guest and Guest, 2012). It is also referred to as regression analysis, and it is used to identify the "best fit" line or curve for a set of data points (Freund et al., 2006). During curve fitting, the data points are plotted and the basic form of the model is observed. Any point along the curve can be found using the resulting equation. To verify shape similarity, the plotted data can be compared to a series of candidate curves. Once a curve that matches the general shape of the data has been identified, the data must be transformed to resemble that curve (Freund et al., 2006). The equation is derived from this form and is used to interpolate or extrapolate the results. Regression as a statistical process differs from curve fitting, although curve fitting is often the most sensible way of carrying out a regression (Freund et al., 2006). Curve fitting places a higher emphasis on the shape of the curve that will be used to match the data, whereas regression is frequently used without much consideration for curve selection.
Curve fitting has the advantage of allowing us to reliably estimate parameter values if we know the mathematical model of the process that created the given data. This technique is known as parametric regression (Hardle and Mammen, 1993). However, obtaining a satisfactory fit requires a representative model of the system and good initial parameter values, which is the method's drawback. Least-square curve fits, non-linear curve fits, and smoothing curve fits are the three types of curve fits (Motulsky and Ransnas, 1987). Least-square curve fits are the most common of these.

Least-square curve fit
The least-square curve fit minimizes the sum of the squared vertical deviations between the original data and the predicted values (Howell, 1971). This method is relatively straightforward and easy to compute and understand, but it is not the most statistically robust method of fitting a function, because it is sensitive to outliers in the data (Howell, 1971). If a given data point is widely different from the rest of the points, the regression results can be misleading. The commonly available least-square curve fits are linear, quadratic, exponential, logarithmic, and power (Howell, 1971).
Linear regression is a straightforward method of supervised learning. Among the different approaches for determining the parameter values of a linear regression model, the least-squares method (LSM) is regarded as the most exact and the most reliable (Steel and Torrie, 1960). With this approach, the parameter values are selected so that the sum of the squared vertical deviations between the data points and the curve is as small as possible. LSM is thus a method for fitting a unique curve through a set of data points (Fig. 1).
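For the straight-line case, the minimization described above can be written out explicitly. The following is a standard textbook derivation rather than a formula taken from this study; the symbols $a$ (slope), $b$ (intercept), $S$ (sum of squared deviations), and $\bar{x}$, $\bar{y}$ (sample means) are notation assumed here for illustration.

```latex
% Sum of squared vertical deviations for the line y = a x + b
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - a x_i - b \bigr) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \bigl( y_i - a x_i - b \bigr) = 0

% Solving the normal equations yields the least-squares estimates
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a \bar{x}
```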
Because there is a linear relationship between the dependent variable and the parameters, LSM is easily used to identify the parameters of linear equations. If the model is not linear, it is either transformed into a linear equation or alternative nonlinear procedures are used. Transformations such as logarithms, inversions, and exponentials are employed to convert from nonlinear to linear forms. Here, we consider linear, quadratic, and exponential curves to find the line of best fit. A linear equation is $y = ax + b$, a quadratic takes the form $y = ax^2 + bx + c$, and an exponential is represented as $y = ae^{bx}$.
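The conversion from a nonlinear to a linear form can be illustrated with the exponential curve. The sketch below is a minimal illustration, not the study's implementation: it assumes the exponential is written as y = a*exp(b*x), takes logarithms to obtain ln y = ln a + b*x, and fits a straight line to (x, ln y). The function names `fit_line` and `fit_exponential` and the sample data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by fitting a line to (x, ln y); requires every y > 0."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope

# Hypothetical data: x is a year index, y is generated from y = 2*exp(0.1*x)
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.1 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Note that the transformed fit minimizes squared deviations of ln y rather than of y itself, so on noisy data it can differ slightly from a direct nonlinear least-squares fit.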

Proposed algorithms
The proposed prediction algorithm, predictionOfQuality, estimates the success rate of fish. The algorithm accepts as input the category of the fish, details of the landing center, the years (start and end years) for which data are available, and the year of prediction. This algorithm uses two procedures, getCountOfHighQuality and getTotalPurchase, to retrieve the success count and the total purchase of the fish from the data store. These procedures use suitable queries to retrieve the relevant data. Subsequently, the success rate of each year is calculated. Then, the bestFitModelFinder algorithm is used to find the best-fit model for the given input data.
The bestFitModelFinder algorithm accepts the years (x) and the corresponding success rates (y). Initially, half of the input data is passed to the bestFitModelFinder algorithm to find the best model. The algorithm uses the least-square method: it fits a line, a parabola, and an exponential curve to the data and computes the deviations of the observed y values from each fitted equation. The squares of these deviations are summed for each curve, and the curve with the lowest of the three values is chosen. For example, the fitStraightLine algorithm is called when the sum of the squared deviations from a line is smaller than that from the other curves. The predicted success rate values are returned to the bestFitModelFinder algorithm, which in turn returns the values to the predictionOfQuality algorithm.
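The yearly success rate calculation can be sketched as follows. This is an illustration only: it assumes the success rate is the success count divided by the total purchase, expressed in percent, and it replaces the data-store queries behind getCountOfHighQuality and getTotalPurchase with hypothetical in-memory dictionaries keyed by year. The function name `success_rates` and all figures are invented for the example.

```python
def success_rates(high_quality_counts, total_purchases):
    """Return {year: success rate in percent} for years with non-zero purchases."""
    rates = {}
    for year, total in sorted(total_purchases.items()):
        if total > 0:  # skip years without purchases to avoid division by zero
            rates[year] = 100.0 * high_quality_counts.get(year, 0) / total
    return rates

# Hypothetical figures standing in for the results of the two retrieval procedures
high_quality = {2000: 180, 2001: 190, 2002: 150}
purchases = {2000: 200, 2001: 200, 2002: 160}

rates = success_rates(high_quality, purchases)
for year, rate in rates.items():
    print(year, f"{rate:.2f}%")
```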
The standard error is determined, and if it is below acceptable limits (less than 5%), the entire dataset is sent into the bestFitModelFinder algorithm, which finds the best-fit curve. Otherwise, the initial data point is removed, and the remaining values are sent to the bestFitModelFinder algorithm, which compares the input values to a line, parabola, or exponential curve.
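The selection and trimming steps described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' code: the names `best_fit_model` and `predict_quality` are hypothetical, the initial pass over half of the data is omitted, the "standard error" is approximated by the root mean square of the residuals in the same units as the success rates, and `max_error`, `min_points`, and the tie-break tolerance `tol` are invented parameters.

```python
import math

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial via the normal equations; returns a callable model."""
    size = degree + 1
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    coeffs = solve(matrix, rhs)  # coefficients, lowest power first
    return lambda x: sum(c * x ** k for k, c in enumerate(coeffs))

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) through a line fitted to (x, ln y); requires every y > 0."""
    line = fit_polynomial(xs, [math.log(y) for y in ys], 1)
    return lambda x: math.exp(line(x))

def best_fit_model(xs, ys, tol=1e-9):
    """Return (name, model, sse) for the curve with the smallest sum of squared deviations."""
    x0 = xs[0]  # shift the years so the normal equations stay well conditioned
    shifted = [x - x0 for x in xs]
    candidates = [("linear", fit_polynomial(shifted, ys, 1)),
                  ("quadratic", fit_polynomial(shifted, ys, 2))]
    if all(y > 0 for y in ys):
        candidates.append(("exponential", fit_exponential(shifted, ys)))
    best = None
    for name, f in candidates:
        sse = sum((y - f(x)) ** 2 for x, y in zip(shifted, ys))
        # Assumed tie-break: a later curve must beat the current best by more than tol
        if best is None or sse < best[2] - tol:
            best = (name, lambda x, f=f: f(x - x0), sse)
    return best

def predict_quality(years, rates, target_year, max_error=5.0, min_points=4):
    """Drop the oldest points until the best fit is within max_error, then predict."""
    xs, ys = list(years), list(rates)
    while True:
        name, model, sse = best_fit_model(xs, ys)
        rmse = math.sqrt(sse / len(xs))
        if rmse <= max_error or len(xs) <= min_points:
            return name, model(target_year)
        xs, ys = xs[1:], ys[1:]

# Hypothetical success rates (percent) following a parabola, for illustration only
years = list(range(2000, 2010))
rates = [50 + 3 * t - 0.2 * t * t for t in range(10)]
name, predicted = predict_quality(years, rates, 2010)
print(name, round(predicted, 2))
```

One design point is worth noting: on the same data points, a least-squares parabola can never have a larger sum of squared deviations than a least-squares line, because the line is a special case of the parabola. The sketch therefore tries the simpler curve first and switches only when the improvement exceeds `tol`; this tie-break is an assumption of the example, not something stated in the text.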

Results and discussion
The algorithms were tested using data collected from a seafood company, covering the period from January 2000 to December 2019. The results for the Cephalopods and Crustaceans fish families are given below. Fig. 2 depicts the success rate prediction for the Cephalopods fish family from a catching center (C1). The bestFitModelFinder algorithm examines the input data and determines that a parabolic curve is the best model for fitting the data. The success rates of Cephalopods are plotted from 2000 to 2019, and the generated quadratic equation is used to estimate the success rate of a future year. Fig. 3 depicts the success rate prediction for the Crustaceans fish family from a catching center (C1). The best model for the input data is found to be linear. The bestFitModelFinder algorithm analyzes the data and determines that the sum of the squared deviations of the success rates from a parabola is the smallest and that the pattern of the data is uneven. The standard error, however, is found to be outside the permitted range. As a result, the algorithm discards the first five data points before determining the best-fit curve (a parabolic curve) within the allowable standard error. The updated model is given in Fig. 5.

Performance analysis
The statistical measurements of mean absolute deviation (MAD), mean square error (MSE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are used to assess the prediction accuracy of the presented algorithms (Myttenaere et al., 2016).
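The mean absolute deviation is a common measure of total prediction error. It is calculated by dividing the sum of the absolute values of the individual prediction errors by the sample size. The mean square error is the mean of the squared differences between the predicted and observed values, and the root mean square error is its square root. The mean absolute percentage error is the average of the absolute prediction errors, each expressed as a percentage of the corresponding actual value. With $A_t$ the actual value, $F_t$ the predicted value, and $n$ the sample size, the formulas of the measurements are given below:
$$\mathrm{MAD} = \frac{1}{n} \sum_{t=1}^{n} \lvert A_t - F_t \rvert$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (A_t - F_t)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (A_t - F_t)^2}$$
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left\lvert \frac{A_t - F_t}{A_t} \right\rvert$$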
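The four measures can be computed directly from paired actual and predicted values. The sketch below uses their standard definitions; the function name `prediction_errors` and the sample values are hypothetical and are not taken from the study's data.

```python
import math

def prediction_errors(actual, predicted):
    """Return MAD, MSE, RMSE and MAPE (in percent) for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is zero; this sketch assumes none are
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAD": mad, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Hypothetical success rates in percent, for illustration only
actual = [90.0, 80.0, 100.0, 50.0]
predicted = [81.0, 84.0, 95.0, 55.0]

metrics = prediction_errors(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MAD and RMSE are expressed in the same units as the success rate, whereas MAPE is scale-free, which makes it the more convenient measure when comparing catching centers with different typical rates.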

Conclusion
Food safety and quality concerns are becoming increasingly important in today's world. For people across the world, seafood is a key source of protein. Nonetheless, bacteria and other substances found in seafood represent a significant risk to people. The aim of this research is to propose prediction algorithms based on the least-square method to predict the success rates of seafood from different catching centers. The bestFitModelFinder algorithm selects the most appropriate model for the input data, and the predictionOfQuality algorithm then predicts the success rate. Least-squares-based methods are used to identify the best-fit curve. The prediction accuracy of the proposed algorithms was measured using various statistical measurements, which showed low error values. The proposed algorithms can help seafood businesses determine the quality of seafood items sourced from diverse fishing areas.
Further research can be undertaken to investigate different methods to ensure the quality of export seafood, aquaculture, etc.

Funding
The research leading to these results has received funding from The Research Council (TRC) of the Sultanate of Oman under the Block Funding Program BFP/RGP/ICT/18/113.

Conflict of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.