Statistical distribution for initial crack and number of loading in fatigue crack growth process

Article history: Received 29 April 2017; received in revised form 25 August 2017; accepted 28 August 2017

Selecting a statistical distribution for crack growth variables is one of the important issues emerging from the fatigue crack propagation process. This study compares three statistical distributions, the normal, the lognormal and the Weibull, to determine which provides the best model of the fatigue data. The Kolmogorov-Smirnov test was chosen as the criterion for selecting the best distribution of the variables. Ten replicate specimens of aluminium alloy A7075-T6 were tested under constant amplitude loading. The number of cycles to form the initial crack and the initial crack length were taken as random variables. Because the data set was too small to represent the true population, a bootstrap approach was applied to confirm that the chosen distribution was the best representation of these variables. The results showed that the lognormal distribution was the best distribution to represent both the number of cycles and the length of the initial crack. Although both the normal and the lognormal distributions were suitable for these variables, the lognormal was the more conservative. These two variables play the main role in life prediction, so an analysis of their statistical distribution is highly important. These results are expected to support more reliable prediction of fatigue lifetime.


Introduction
One of the important criteria for determining mechanical effectiveness is the occurrence of fatigue. For example, estimating the life of mechanical components, such as those in aircraft and train axles, is important for minimising the cost of maintenance and inspection (Huynh et al., 2012). Two phases contribute to fatigue life: the first is the number of loading cycles required to initiate a crack, and the second is the number of cycles it takes for that crack to propagate and result in failure. As described by Schijve (2014) and Li et al. (2015a), the process starts with a crack initiation period, which is controlled by the local stress cycles at the material surface, followed by a crack growth period, in which a small crack grows under the cyclic stress intensity around the crack front.
Crack initiation is a term that is used differently by scientists and engineers. For scientists studying fatigue in the laboratory, initiation is defined as the number of cycles required to form, generate or nucleate the smallest crack that can be detected by any means. For engineers designing and maintaining structures, initiation is likely to mean the smallest crack, or an engineering-sized crack, that can be detected by a reliable Non-Destructive Evaluation (NDE) technique (Chan, 2010).
The occurrence of crack initiation is one of the factors that determine the durability of a material, and its progression provides a warning that the condition of the material has been threatened (Goszczyńska, 2014). Sangid (2013) studied the importance of crack initiation in predicting the fatigue life of components. Crack initiation dominates the process of fatigue failure because the majority of the lifetime is spent in this stage, while fatigue crack growth exerts a small influence on the global predictions, mainly in the high-cycle fatigue regime (Correia et al., 2013; Scharnweber et al., 2013).
Identifying a suitable statistical distribution for the initial crack length and the number of loading cycles is required to make a better prediction of the material's lifetime. An appropriate statistical approach should be used when studying the influence of the material microstructure on the fatigue behaviour of structural elements (Glodež et al., 2013). The analysis of experimental surface crack growth data aims to determine the best statistical distribution and parameters of the relevant variables, allowing a better fit to the experimental observations (Khelif et al., 2008). Min et al. (1996) analysed the distributions of crack initiation life and crack growth life and found that the results could provide a solid test foundation for the study of probabilistic fatigue, probabilistic fracture mechanics, fatigue reliability and their engineering applications.

Related works
A statistical distribution of the relevant variables is required in fatigue problems because it is difficult to derive a quantitative physical description of the fatigue phenomenon. Normally, a statistical distribution is assumed, and the most well-known function is the Gaussian (normal) distribution. Khelif et al. (2008) noted that the Weibull and lognormal distributions have also been proposed in the literature, although choosing an appropriate distribution is a very difficult task. Thus, statistical analysis is important for determining the best distribution of a variable rather than assuming that it is normally distributed.
Previously, Schijve (2005) analysed three distribution functions to describe the distribution of fatigue data: (i) the normal distribution function, (ii) the 3-parameter Weibull distribution function, and (iii) the lognormal distribution function. Wu and Ni (2007) also considered three statistical distributions, namely the normal, lognormal and Weibull distributions, on aluminium alloy 2025-T6. By comparing the datasets, they found that, for most cases studied, the random loading cycles and random crack sizes were best fitted by the lognormal and Weibull distributions, respectively. However, Khelif et al. (2008) argued that, although the three-parameter Weibull and lognormal distributions are suitable for lifetime prediction, the two-parameter Weibull distribution is more suitable for probabilistic fatigue design. Makkonen (2009) considered only the Weibull distribution, since it provided a slightly better fit to the experimental data, and therefore assumed the parent population of crack initiation to be Weibull-distributed.
Verification is required after a distribution has been identified. A goodness of fit test is applied to verify the choice of a suitable distribution. Goodness of fit tests such as the Kolmogorov-Smirnov or Chi-Square test are applied to select the best distribution of the variables among the candidate distributions. Previously, the Kolmogorov-Smirnov goodness of fit test was used to select the best distribution among five distributions (normal, 2-parameter lognormal, 3-parameter lognormal, 2-parameter Weibull and 3-parameter Weibull) and, based on the test, only the classical lognormal was found to be suitable as the best-fitting distribution for HDPE structures (Khelif et al., 2008). Bao et al. (2009) found that the crack growth rate could reasonably be considered to follow the lognormal distribution at all three stress levels (260 MPa, 280 MPa and 320 MPa), as verified by the Chi-Square goodness of fit test. The purpose of this study is to determine the statistical distribution of two variables: the initial crack length and the random number of loading cycles. In real applications, the number of loading cycles to reach a certain crack length is estimated as a continuous variable. For experimental purposes, the number of loading cycles is counted as a discrete variable to determine the trend of the fatigue crack growth.

Small sample size and bootstrap approach
In engineering, and specifically in fatigue crack growth problems, it is rare to obtain a large sample size because of limitations in experimental work. Statisticians consider that a small data set causes problems in producing parameter estimates. In other words, any further analysis based on these parameter estimates could be doubtful because of the limited accuracy of the preceding analysis. Analyses such as prediction often produce misleading results when they involve a small data set (Šeruga and Nagode, 2015; Bello et al., 2015). Thus, estimates from a small sample, for instance the sample mean and sample standard deviation, are not suitable for generalising to the true population.
Bootstrap resampling is a well-established and popular method for resampling the original data from a small sample. The bootstrap method was introduced to overcome the problem of generalising parameter estimates to the true population. Estimates from the sample are crucial for ensuring the robustness of an analysis such as the prediction of the lifetime of a material. The bootstrap method resamples from the initial sample, and at least 1000 bootstrap resamples are required to obtain accurate confidence interval estimates (Chen et al., 2015). Based on the bootstrap method, the standard error and confidence interval can be calculated to describe the uncertainty in probabilistic models built on limited data. The confidence intervals and standard errors from the bootstrap technique have been shown to be better than those from other methods such as Monte Carlo and the classical method (Khelif et al., 2008). A small sample size can lead to a misleading selection of the best-fit probability distribution. The bootstrap method has been shown to handle the problem of large variation when modelling with a small set of samples, so that an objective and reliable analysis can represent the larger population (Li et al., 2015a; Bello et al., 2015; Chen et al., 2015; Suo et al., 2015).
The bootstrap method is applicable for estimating the sampling properties of statistics from data because it is simple and straightforward (Li et al., 2015b). Its main characteristics are that no assumptions about the overall distribution are needed and that inference can be made from the sample data by computer (Suo et al., 2015). Therefore, the technique is useful for small samples with an unknown distribution. As a data-driven method, the bootstrap can evaluate measurement uncertainty in real time from limited information, without prior knowledge of the probability distribution of the measured data (Chen et al., 2015).
In this paper, the statistical distribution is determined for the initial crack length and the number of cycles to form the initial crack in aluminium alloy 7075-T6. Three probability distributions were considered in the search for a better fit: the normal, lognormal and Weibull distributions. The Kolmogorov-Smirnov test was then applied to determine the best distribution for these two variables. However, the experiment yielded only ten specimens, which is a very small dataset from which to generalise to the actual population. Therefore, the bootstrap approach was applied to confirm that the chosen distribution represented the variables.

Data measurement
In this study, the data were collected from experiments. The number of loading cycles to form the initial crack was determined and the initial crack lengths were measured. Aluminium alloy 7075-T6 was chosen as the material. This alloy is used in aircraft construction, especially in the wings of naval aircraft (Newman et al., 2013).
There were only ten samples in the experiment. The specimens were 160.0 mm long, 60.0 mm wide and 20.0 mm thick. They were tested at room temperature under constant amplitude loading of 45 kN with a fixed stress ratio of 0.1 in order to observe the fatigue crack growth on the surface of the material. The 45 kN load exceeded the endurance limit load, which ensured that fatigue would occur for specimens of these dimensions. The length of the initial crack of each of the ten specimens was measured using a digital calliper, and the average of five measurements was used. Fig. 1 shows the result of the experiment: the formation of the semi-elliptical initial crack and the fatigue crack growth on the material surface. The semi-elliptical shapes were formed due to the nature of the fatigue crack growth and the dimensions of the material. These particles or clusters are represented as semicircular shapes (Newman et al., 2013; Náhlík et al., 2017). This paper focuses only on the initial crack, which is the first semi-elliptical flaw. Fig. 2 presents the random loading cycles that nucleated the initial crack for the ten specimens of aluminium alloy A7075-T6. The results showed no relationship between the random loading cycles and the initial crack, since the initial crack formed does not depend on how many loading cycles the material has experienced. The majority of the specimens required relatively few loading cycles yet produced a longer initial crack than the others. In contrast, some specimens, such as specimens 4, 6 and 8, required more loading cycles to initiate the crack, even though the experiments were conducted on the same material, dimensions, machine and environment. Some factors may have influenced the results during the experiment, such as a machine shutdown or human factors.
The correlation coefficient between the two variables is only 3.96%. Meanwhile, Fig. 4 illustrates the fatigue life process for the ten specimens: initial crack, crack growth and failure. It indicates that the initial crack lengths vary from 3 mm to 6 mm across the specimens, and only one specimen required more than 100,000 random loading cycles to fracture.
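As a minimal sketch of how a correlation coefficient between the number of loading cycles and the initial crack length could be computed, the following Python snippet evaluates the Pearson coefficient. The choice of the Pearson measure, the function name `pearson_r` and the numerical values are assumptions made for illustration; the values are placeholders, not the measured specimen data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values for illustration only (NOT the measured specimens).
cycles = [31000, 35000, 29000, 62000, 33000, 58000, 30000, 60000, 34000, 36000]
lengths_mm = [4.1, 3.6, 4.4, 3.9, 3.5, 4.0, 4.2, 3.7, 3.8, 3.3]

r = pearson_r(cycles, lengths_mm)
print(f"Pearson r = {r:.4f} ({100 * abs(r):.2f}% correlation)")
```

A coefficient close to zero, as reported for the experimental data, would indicate little linear association between the two variables.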

Method
The distributions for the initial crack length and the number of cycles to nucleate the initial crack were selected. Based on the previous literature, the selected distributions are suitable for lifetime data, particularly fatigue crack growth data (Schijve, 2005; Wu and Ni, 2007). The probability density functions, parameter estimates and statistical properties are given as follows.

The probability density of the normal distribution is given by

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

where $\mu$ refers to the mean of the initial crack length (mm) or of the number of loading cycles, and $\sigma$ is the standard deviation parameter of the same variable. Parameter estimates can be determined using the maximum likelihood approach, which is derived from the likelihood function. The likelihood function for the normal distribution is given as

$$L(x; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]$$

Then, the estimators can be derived by differentiating the logarithm of $L(x; \mu, \sigma)$ with respect to $\mu$ and $\sigma$, which gives

$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu), \qquad \frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2$$

Setting these two partial derivatives to zero and solving gives the parameter estimates for $\mu$ and $\sigma$:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$$

The parameters $\mu$ and $\sigma^2$ give the mean and variance of the distribution. In addition, the coefficient of variation is $\sigma/\mu$, the coefficient of skewness is equal to zero and the coefficient of kurtosis is equal to three.
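The maximum likelihood estimation for the normal distribution can be illustrated with a short Python sketch. It computes the closed-form estimates and evaluates the log-likelihood, so that one can check numerically that the estimates maximise it. The function names and the sample values are assumptions for illustration; the values are placeholders, not the experimental measurements.

```python
import math

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. normal data for given mu and sigma."""
    n = len(data)
    squared_dev = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi)
            - squared_dev / (2.0 * sigma ** 2))

def normal_mle(data):
    """Closed-form maximum likelihood estimates (divisor n, not n - 1)."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat

# Placeholder crack lengths in mm (illustrative, NOT the measured data).
sample = [3.2, 3.6, 3.9, 4.1, 4.4]
mu_hat, sigma_hat = normal_mle(sample)
cov = sigma_hat / mu_hat  # coefficient of variation, sigma / mu

print(f"mu_hat = {mu_hat:.4f}, sigma_hat = {sigma_hat:.4f}, COV = {cov:.4f}")
# Perturbing either estimate should not increase the log-likelihood.
print(normal_log_likelihood(sample, mu_hat, sigma_hat)
      >= normal_log_likelihood(sample, mu_hat + 0.05, sigma_hat))
```

Note that the maximum likelihood estimate of the standard deviation uses the divisor n, which differs slightly from the usual sample standard deviation with divisor n - 1.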
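The probability density of the lognormal distribution is given by

$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right], \quad x > 0$$

where $\sigma$ is a shape parameter and $\mu$ is a scale parameter of the initial crack length or the number of loading cycles. The likelihood function of the lognormal distribution is given as

$$L(x; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{x_i\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x_i-\mu)^2}{2\sigma^2}\right]$$

Then, by using the same approach, the maximum likelihood estimators for the lognormal parameters are determined as

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(\ln x_i-\hat{\mu})^2$$

The statistical properties of the lognormal distribution are the mean, $\exp[\mu + \sigma^2/2]$, the variance, $\exp[2\mu + \sigma^2]\,[\exp(\sigma^2) - 1]$, and the coefficient of kurtosis, $\exp(4\sigma^2) + 2\exp(3\sigma^2) + 3\exp(2\sigma^2) - 3$ (Krishnamoorthy, 2006).

The probability density of the Weibull distribution is given by

$$f(x; \beta, \eta) = \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{x}{\eta}\right)^{\beta}\right], \quad x \ge 0$$

where $\beta$ is a shape parameter and $\eta$ is a scale parameter for the initial crack length and the number of loading cycles. Then, using the same approach, the maximum likelihood estimators for the Weibull distribution can be determined from

$$\hat{\eta} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\hat{\beta}}\right)^{1/\hat{\beta}}, \qquad \frac{\sum_{i=1}^{n} x_i^{\hat{\beta}} \ln x_i}{\sum_{i=1}^{n} x_i^{\hat{\beta}}} - \frac{1}{\hat{\beta}} - \frac{1}{n}\sum_{i=1}^{n} \ln x_i = 0$$

The maximum likelihood estimators for the parameters of the Weibull distribution can be determined numerically using methods such as Newton-Raphson, scoring, the EM algorithm, quasi-Newton and the Nelder-Mead method (Masseran et al., 2013). The statistical properties of the Weibull distribution are the mean, $\eta\,\Gamma(1 + 1/\beta)$, and the variance, $\eta^2\left[\Gamma(1 + 2/\beta) - \Gamma^2(1 + 1/\beta)\right]$ (Krishnamoorthy, 2006).

Since the data set is a small sample, bootstrapping analysis is used to generalise to the population. Bootstrap samples are constructed by random sampling with replacement from the original dataset. As a result, each observation may appear in a bootstrap sample set once, more than once or not at all. From each constructed bootstrap sample set, the statistics of concern (e.g., sample mean, sample SD and K-S scores) are obtained.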
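Returning to the maximum likelihood estimation of the lognormal and Weibull parameters described above, the following Python sketch shows one possible implementation. The lognormal estimates are closed-form, while the Weibull shape parameter is found here by simple bisection on the likelihood equation. Bisection is an assumption chosen for brevity; the study itself lists Newton-Raphson and related methods without stating which one was used. Function names and sample values are illustrative placeholders.

```python
import math

def lognormal_mle(data):
    """Closed-form MLE: mean and SD (divisor n) of the log-data."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def weibull_mle(data, lo=0.01, hi=200.0, tol=1e-10):
    """Return (shape, scale); the shape is assumed to lie in [lo, hi]."""
    n = len(data)
    mean_log = sum(math.log(x) for x in data) / n
    x_max = max(data)

    def score(beta):
        # Likelihood equation for the shape; data are divided by x_max
        # so that large exponents cannot overflow (the ratio is unchanged).
        w = [(x / x_max) ** beta for x in data]
        weighted_log = sum(wi * math.log(x) for wi, x in zip(w, data))
        return weighted_log / sum(w) - 1.0 / beta - mean_log

    # score() increases with beta, so bisection brackets its single root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    mean_power = sum((x / x_max) ** beta for x in data) / n
    eta = x_max * mean_power ** (1.0 / beta)
    return beta, eta

# Placeholder crack lengths in mm (illustrative, NOT the measured data).
sample = [3.2, 3.6, 3.9, 4.1, 4.4]
print("lognormal (mu, sigma):", lognormal_mle(sample))
print("Weibull (shape, scale):", weibull_mle(sample))
```

The bracketing interval for the shape parameter is a hypothetical default; data with a very small coefficient of variation may need a larger upper bound.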
Following Li et al. (2015b), the procedure is as follows. Let the original data set be $X = \{x_i, i = 1, 2, \ldots, n\}$. A bootstrap sample set $X^* = \{x_1^*, x_2^*, \ldots, x_n^*\}$ is then constructed by random sampling with replacement from $X$. From this bootstrap sample set, the sample mean and SD are given as

$$\bar{x}^* = \frac{1}{n}\sum_{i=1}^{n} x_i^*, \qquad s^* = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i^* - \bar{x}^*\right)^2}$$

The above procedure is repeated $N$ times and $N$ bootstrap sample sets are obtained. For each set of bootstrap samples, the sample mean, standard deviation and KS values are computed.
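Then, the bootstrap mean and SD estimates of the sample mean value, $\bar{x}^*$, can be calculated by

$$\mu_{\bar{x}} = \frac{1}{N}\sum_{j=1}^{N} \bar{x}_j^*, \qquad \sigma_{\bar{x}} = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N}\left(\bar{x}_j^* - \mu_{\bar{x}}\right)^2}$$

Similarly, the bootstrap mean and SD estimates of the sample SD, $s^*$, are derived as below:

$$\mu_{s} = \frac{1}{N}\sum_{j=1}^{N} s_j^*, \qquad \sigma_{s} = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N}\left(s_j^* - \mu_{s}\right)^2}$$

In this paper, $N = 5000$ resamples were adopted to ensure that the results were reliable.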
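The bootstrap procedure above can be sketched in Python using only the standard library. The function name `bootstrap_statistics`, the fixed random seed and the sample values are assumptions for illustration; the values are placeholders, not the experimental measurements.

```python
import random
import statistics

def bootstrap_statistics(data, n_resamples=5000, seed=0):
    """Bootstrap mean and SD estimates of the sample mean and sample SD."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    n = len(data)
    means, sds = [], []
    for _ in range(n_resamples):
        # Sampling with replacement: an observation may appear once,
        # more than once, or not at all in a given resample.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
        sds.append(statistics.stdev(resample))  # divisor n - 1
    return {
        "mean_of_means": statistics.fmean(means),
        "sd_of_means": statistics.stdev(means),
        "mean_of_sds": statistics.fmean(sds),
        "sd_of_sds": statistics.stdev(sds),
    }

# Placeholder crack lengths in mm (illustrative, NOT the measured data).
sample = [3.2, 3.6, 3.9, 4.1, 4.4, 3.5, 3.8, 4.0, 3.7, 4.2]
for name, value in bootstrap_statistics(sample).items():
    print(f"{name}: {value:.4f}")
```

The same loop could also record the K-S score of each candidate distribution for every resample, which is how the variation of the K-S values would be obtained.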

Results and discussion
Descriptive statistics were used to describe the experimental data obtained. The Kolmogorov-Smirnov goodness of fit test (K-S test) was applied to investigate the statistical distribution of the number of loading cycles and the length of the initial crack. Tables 1 and 2 show the parameter estimates, the goodness of fit test results and the Coefficient of Variation (COV) values for each distribution of the initial crack length and the number of loading cycles, respectively, for the original data set.
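As a rough sketch of how the K-S statistic can be used to compare candidate distributions, the Python code below computes the one-sample K-S distance between the empirical distribution and a fitted cumulative distribution function. The helper names, the sample values and the Weibull parameters are assumptions for illustration; in practice the Weibull parameters would be estimated from the data rather than supplied by hand.

```python
import math
from statistics import NormalDist

def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x: check both.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def fitted_normal_cdf(data):
    """CDF of the normal distribution fitted by maximum likelihood."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return NormalDist(mu, sigma).cdf

def fitted_lognormal_cdf(data):
    """CDF of the lognormal fitted by maximum likelihood on the log-data."""
    log_cdf = fitted_normal_cdf([math.log(x) for x in data])
    return lambda x: log_cdf(math.log(x))

def weibull_cdf(shape, scale):
    """Weibull CDF for given (externally estimated) parameters."""
    return lambda x: 1.0 - math.exp(-((x / scale) ** shape))

# Placeholder crack lengths in mm (illustrative, NOT the measured data).
sample = [3.2, 3.6, 3.9, 4.1, 4.4, 3.5, 3.8, 4.0, 3.7, 4.2]
scores = {
    "normal": ks_statistic(sample, fitted_normal_cdf(sample)),
    "lognormal": ks_statistic(sample, fitted_lognormal_cdf(sample)),
    # Hypothetical Weibull parameters, chosen only for the demonstration.
    "weibull": ks_statistic(sample, weibull_cdf(shape=12.0, scale=4.0)),
}
best = min(scores, key=scores.get)  # smallest distance = closest fit
print(scores, "->", best)
```

Because the parameters are estimated from the same data that are being tested, standard K-S critical values are only approximate, which is one reason to rank the candidates by their K-S distance rather than rely on a single accept or reject decision.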

Determining probability distribution
As discussed above, the three distributions were also compared in the bootstrapping analysis. Fig. 5 shows the PDF of the mean number of cycles for the three distributions: normal, lognormal and Weibull. It clearly shows that the Weibull curve is shifted away from the histogram, while the normal and lognormal curves fit the histogram closely. As the number of specimens tested is small, the classical parameter estimates are not suitable. Hence, to verify the parameter estimates, two methods that take the small sample size into account were compared: the resampling technique and the Bootstrap method. Table 3 shows the obtained mean, standard deviation and confidence intervals for the two variables. The Bootstrap technique gave the narrowest confidence intervals for the given data. The intervals can be interpreted as 95% confidence that the population mean number of cycles lies between 31,212 and 47,426 cycles, and that the mean initial crack length lies between 3.5 mm and 4.097 mm. Moreover, the standard error and standard deviation from this technique were lower than those from the other methods. The standard deviation describes the spread of the distribution, that is, the distance of the individual data from the mean value, while the standard error describes the distance of the sample mean from the true mean of the population.
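One common way to obtain a bootstrap confidence interval and standard error for the mean is the percentile method, sketched below in Python. The use of the percentile method is an assumption, since the interval construction is not specified here; the function name and the sample values are likewise illustrative placeholders, not the experimental data.

```python
import random
import statistics

def bootstrap_percentile_ci(data, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI and standard error for the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    # Sorted bootstrap means; each resample is drawn with replacement.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    std_error = statistics.stdev(means)  # spread of the bootstrap means
    return lower, upper, std_error

# Placeholder cycle counts (illustrative, NOT the measured data).
cycles = [31000, 35000, 29000, 62000, 33000, 58000, 30000, 60000, 34000, 36000]
low, high, se = bootstrap_percentile_ci(cycles)
print(f"95% CI for the mean: [{low:.0f}, {high:.0f}], SE = {se:.0f}")
```

The standard error returned here is the standard deviation of the bootstrap means, which matches the interpretation of the standard error as the typical distance between a sample mean and the population mean.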
Figs. 6-9 show the probability density functions of the sample mean and the standard deviation of the initial crack and of the number of cycles that nucleate the initial crack, respectively. Their sampling properties are summarised in Tables 4-6. Tables 4-5 present the descriptive analysis of the three selected distributions for the initial crack length and the number of cycles, based on the mean and the standard deviation values from the bootstrap data, respectively. This includes all the parameter estimates and the COV values, which were slightly lower than those of the original data in Table 1 and Table 2. Based on these results, the Weibull distribution was unsuitable for both variables, for either small or large data sets, and the COVs of the mean values for the three candidate distributions of both variables were comparable.

Fig. 7: Standard deviation of the bootstrap initial crack

The COVs of the KS values for the normal and lognormal candidate distributions of the initial crack and the number of cycles are comparable. However, the COV of the KS value for the Weibull distribution was significantly different. The best-fit distribution can be identified from the KS values associated with the three candidate distributions for each bootstrap sample. Note that the best-fit distribution cannot be determined reliably from the original data alone because of the small sample size. In this respect, the bootstrap method shows a great advantage over the traditional methods.

Fig. 12 compares the probability plots of the three candidate distributions for the initial crack length. The distributions behave similarly between 3.65 mm and 4.1 mm. The curves intersect at two points, which correspond to a 25% and a 90% likelihood of initial crack formation, at around 3.8 mm and 4.1 mm, respectively. Fig. 13 shows that all distributions agree in predicting a 25% likelihood of the initial crack being formed by 35,000 loading cycles, and a 90% likelihood by 45,000 loading cycles.
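The likelihood levels read from the probability plots correspond to quantiles of the fitted distributions. As an illustrative sketch, the Python code below evaluates the 25% and 90% quantiles of the three candidate distributions from their inverse cumulative distribution functions. All parameter values are hypothetical placeholders chosen for the demonstration, not the estimates reported in the tables.

```python
import math
from statistics import NormalDist

def normal_quantile(p, mu, sigma):
    """Value below which a fraction p of a normal population falls."""
    return NormalDist(mu, sigma).inv_cdf(p)

def lognormal_quantile(p, mu, sigma):
    """Lognormal quantile: exponential of the normal quantile of ln(x)."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))

def weibull_quantile(p, shape, scale):
    """Weibull quantile from inverting F(x) = 1 - exp(-(x/scale)**shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Hypothetical parameters for an initial crack length in mm (placeholders).
for p in (0.25, 0.90):
    q_normal = normal_quantile(p, mu=3.85, sigma=0.30)
    q_lognormal = lognormal_quantile(p, mu=math.log(3.85), sigma=0.08)
    q_weibull = weibull_quantile(p, shape=14.0, scale=4.0)
    print(f"p = {p:.2f}: normal {q_normal:.2f}, "
          f"lognormal {q_lognormal:.2f}, Weibull {q_weibull:.2f}")
```

Points where the quantiles of different distributions coincide are the intersections seen on a probability plot.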
From these observations, the normal and Weibull distributions show a linear, or nearly identical, behaviour in describing the distribution of the data. Conversely, the lognormal distribution shows a curved relation. A curved relation is generally better at representing real problems. Fig. 14 supports the lognormal distribution, showing that it is the best distribution to represent the variable.

Conclusion
A constant amplitude loading test was conducted on aluminium alloy A7075-T6 in order to identify the statistical distribution. Based on the experimental findings, there is no relationship between the random loading cycles and the length of the initial crack. The normal, lognormal and Weibull distributions were selected and compared, and the selection of the best distribution was verified by the Kolmogorov-Smirnov test. The major conclusions of this investigation are summarised below. Firstly, two variables were selected as the main factors in explaining fatigue crack growth problems: the initial crack length and the random loading cycles. Secondly, the numerical results showed that the bootstrap method could effectively model the variations of the sample statistics and the KS scores, even though the original dataset contained only ten specimens. Thirdly, the bootstrap analysis showed, both graphically and quantitatively, that the lognormal distribution was the best distribution for the two variables in the crack problem. These outcomes are expected to support more accurate modelling of fatigue life. This study considered the random loading and the initial crack length. Predicting fatigue life involves three important processes. Therefore, the statistical distribution of the fatigue life should be considered in future research so that the fatigue life distribution can be modelled more accurately.