A bivariate exponentiated Pareto distribution derived from Gaussian copula

Article history: Received 25 March 2017 Received in revised form 4 May 2017 Accepted 10 May 2017 The exponentiated Pareto distribution has been used quite effectively to model many lifetime data. In this paper, a new bivariate exponentiated Pareto distribution is introduced. The proposed bivariate distribution is derived from Gaussian copula with exponentiated Pareto distribution as marginals. Some properties of the bivariate exponentiated Pareto distribution can be obtained using the Gaussian copula property. Moreover, several methods of estimation are considered to estimate the unknown parameters of the proposed bivariate distribution. Numerical simulations are carried out to compare the performances of different estimators. Finally, one real data is analyzed and the results showed that the proposed bivariate distribution is useful for real life applications.


Introduction
*The Pareto distribution usually used to analyze lifetime data, but its hazard function is restricted being decreasing. This problem was one of the reasons that lead researchers attention in finding ways to extend this family of distributions. Gupta et al. (1998) introduced a two-parameter distribution as a generalization of the standard Pareto Type II, called the exponentiated Pareto distribution. They proved that exponentiated Pareto distribution is effective for analyzing lifetime data. Depending on the value of the shape parameter, the failure rates take decreasing and upside-down bathtub shapes.
A continuous non-negative random variable T follows an exponentiated Pareto distribution, if its cumulative distribution function (CDF) can be written as (Eq. 1): where, and are two shape parameters. Observe that if = 1, this will give the standard Pareto distribution type II. The probability density function (PDF) is given by Eq. 2 where t , θ, λ > 0 . We write EP to denote the exponentiated Pareto distribution. According to Shawky and Abu-Zinadah (2009), the density in Eq. 2 is decreasing if θ ≤ 1 and unimodal function with mode given by [ λ(θ−1) λ+1 + 1] 1 λ ⁄ − 1 if > 1, which give it a great flexibility of fit.
In many lifetime data applications, we can have one unit associated with two or more lifetimes. For the case where we have two lifetimes T 1 and T 2 associated to each unit, we can consider bivariate lifetime distributions such as the studies of Freund (1961), Marshall and Olkin (1967), Mardia (1970), Whitt (1976), Sarhan and Balakrishnan (2007), Kundu et al. (2010), Gupta et al. (2010), Al-Mutairi et al. (2011), Sankaran et al. (2014), and Olkin and Trikalinos (2015). Hutchinson and Lai (1990) studied the existent of bivariate non-normal distributions and provided many applications for different bivariate models. The use of copula function give a great flexibility to derive bivariate lifetime distributions, see for example, AL-Hussaini and Ateya (2006), Quiroz-Flores (2009), Gupta et al. (2010), Kundu andGupta (2011), El-Sherpieny et al. (2013), Kundu (2015), Achcar et al. (2015), and El-Gohary and El-Morshedy (2015). The main aim of this paper is to introduce a bivariate exponentiated Pareto distribution derived from Gaussian copula with EP distribution as marginals. Some properties and estimate of the parameters will be investigated and analyzed. Simulation study will be used to examine the performance of this new distribution and one real data set will be analyzed to illustrate the flexibility of the proposed distribution.
The paper is organized as follows: in Section 2, some basic definitions of copula and Gaussian copula functions are presented. The proposed bivariate exponentiated Pareto distribution is derived and some properties are discussed in Section 3. Section 4 illustrates different methods of estimation of the unknown parameters. Monte Carlo simulation study and analysis of one real data set are carried out in Section 5. Finally, the paper is concluded in Section 6. Sklar (1959) introduced the name of "copula". It is a function that joins or "couples" two or more marginal distribution functions to construct bivariate or multivariate distribution. The copula function constructs unknown bivariate distributions from known marginals, see Trivedi and Zimmer (2007). In addition, this function can link any type of ma rginal distribution and could be from completely different families. However, other methods such as conditional distributions and mixing distributions regularly depend on marginals from the same family.
According to Sklars theorem, there exists a copula C such that (Eq. 3) F T (t 1 , t 2 ) = C(F T 1 (t 1 ), F T 2 (t 2 ) ) If 1 ( 1 ) and 2 ( 2 ) are continuous and differentiable and C is unique then from Eq. 3, the joint density can be written as (Eq. 4) where f T 1 (t 1 ) , f T 2 (t 2 ) are the density functions corresponding to F T 1 (t 1 ), F T 2 (t 2 ) and C`= is the copula density. Many copulas have been introduced in the literature, where each of them have different dependence structure on the data, the interested readers are referred to Joe and Xu (1996), Trivedi and Zimmer (2007), Lai (2009), andNelsen (2007). In this paper, we will concentrate on the Gaussian copula since it is a flexible and has full range of dependence. In addition, it is easy to generalize to multi-dimensions.

Bivariate exponentiated Pareto distribution derived from Gaussian copula
This section introduces a new bivariate exponentiated Pareto distribution derived from Gaussian copula. Let = ( ), where ( ) is the distribution function of the EP distribution, given by Eq. 1, after indexing T, and by j, j=1, 2. Then by using the definition of a two dimensional copula in Eq. 3, the joint distribution function of the two EP random variables T 1 and T 2 derived from Gaussian copula can be written as (Eq. 7) ( 1 , 2 ) = ( 1 , 2 , ) Consequently, by using Eq. 4, the PDF of the bivariate exponentiated Pareto distribution (BEP) derived from Gaussian copula is given by Eq. 8 where `( 1 , 2 , ) is the density of the bivariate Gaussian copula given by Eq. 6 and ( ) , = 1,2 is the PDF of the EP distribution, given by Eq. 2.

Some properties of the BEP distribution derived from Gaussian copula
It is well known that many dependence properties of a multivariate distribution depend only on the corresponding copula. Therefore, many dependence properties of the proposed BEP distribution can be obtained by studying the Gaussian copula. A non-negative function d defined in ℝ 2 is total positivity of order two, denoted by TP 2 if for all 11 < 12 and 21 < 22 with 1 , 2 ∈ ℝ ( 11 , 21 ) ( 12 , 22 ) ≥ ( 12 , 21 ) ( 11 , 22 ) And if we reverse the equality Eq. 11, we will have a reverse rule of order two denoted by (RR 2 ).
Property 3: Since the Gaussian copula is upper tail and lower tail independent, it follows that if (T 1 , T 2 )~BEP distribution, then T 1 and T 2 are upper tail and lower tail independent.

Graphical description of the BEP distribution derived from Gaussian copula
The joint density ( 1 , 2 ) given in Eq. 8 for three levels of dependence with 1 = 2 = 1.4 and 1 = 2 = 102 , are presented in Fig. 1. Figs. 1a and 1b display 3D and a contour plots of the bivariate density, respectively, when = 0.7. Figs. 1c and 1d, show the contour plots for different levels of dependence parameter =-0.5 and 0.85 respectively. From Fig. 1, it can be seen that large values of the copula parameter, (positive or negative) add a large linear dependence and the PDF will have more mass. In addition, it can be seen that the joint PDF takes on different shapes and will therefore be useful and have more flexibility in analyzing bivariate data.

Estimation
In this section, different methods of estimation are conducted.

Maximum likelihood estimators
Let T i = (T 1i , T 2i )` , i = 1, . . , n be a bivariate random sample of size n from BEP ( 1 , 1 , 2 , 2 , ).Then the log likelihood function becomes Eq. 12. l = ∑ {n ln θ j + n ln λ j 2 j=1 By differentiate the log likelihood function with respect to , and , respectively, and equating the resulting equations to zero, we get Eq. 13.
similarly, for and (Eqs. 15 and 16) Eqs. 15 and 16 will be solved numerically to obtain the ML estimator of , and , respectively. Hence, ML estimator of could be obtained by substituting ̂ from Eq. 15 in Eq. 14. The simulation results of the ML method will be discussed in Section 5.

Sampling information matrix and approximate confidence interval
The ML estimate of the unknown parameters cannot be obtained in closed forms; In this case the approximate confidence interval of the parameters = (θ 1 , λ 1 , θ 2 , λ 2 , ) can be obtained. That is, with the use of large sample and appropriate regularity conditions, the ML of is approximately ~( , −1 ) where −1 is the asymptotic variancecovariance matrix and I denotes the Fisher information matrix. Therefore, the approximate 100(1 − )% confidence interval is: where 2 ⁄ is the percentile of the standard normal distribution with right-tail probability 2 ⁄ .

Inference functions for margins estimators
The ML method could be computational intensive, because it needs to estimate jointly the parameters of the margins and the copula parameter. To avoid that, we will use the Inference functions for margins (IFM) method suggested by Joe and Xu (1996) to estimate the unknown parameters. The two stages of the IFM method can be incorporated as follows: First stage, the estimates of the marginal parameters is computed by maximizing the following (θ 1 , λ 1 , θ 2 , λ 2 ) with respect to θ 1 , λ 1 , θ 2 and λ 2 (Eq. 18).
The simulation results of the IFM method will be carried out in Section 5.

Canonical maximum likelihood estimators
The Canonical maximum likelihood estimators (CML) method introduced by Genest et al. (1995) to estimate copula parameter. It is appropriate when the dependence parameter is important, which can be estimated without identifying the marginal distributions. This approach transforms the observations ( 1 , 2 )′ into pseudo-observations with uniform margins ( 1 , 2 )' using the empirical CDF of each marginal distribution. That is, the CML estimate of can be obtained by maximizing the log likelihood function for Gaussian copula given by Eq. 19 with respect to , where in this case is the pseudo-observations. For more details see, Genest et al. (1995).
Suppose that, a little or no-information about the parameters is available. In this case, the prior distribution of the parameters is considered as a non-informative uniform distribution. Then, the prior distribution for each parameter can be written as follows (Eq. 20).
therefore, the joint posterior distribution is given by Eq. 21.
(θ 1 , λ 1 , θ 2 , λ 2 , | ) = ∏ { ( ) ( )} 2 =1 ( ) (θ 1 , λ 1 , θ 2 , λ 2 , ), It is clear that the above joint posterior function cannot be obtained in closed form. Therefore, MCMC techniques can be used to find the Bayes estimate of the parameters. Bayes estimators are obtained under squared error loss function and 100(1 − )% credible interval for the vector of the parameters is given by [ ≤ ≤ ] = 1 − , where and are the lower limit and upper limit of the 100(1 − )% credible interval for and are obtained by solving these two equations: MHadaptive package in the R program which depends on Metropolis-Hastings algorithm will be used to obtain the mean posterior estimates of the unknown parameters and the credible interval of the parameters numerically.

Simulation study
A numerical study is presented to illustrate the various theoretical results in previous section. Simulation results are given to see how the ML, IFM, CML and Bayesian estimators behave for different sample size and for different values of the copula parameter.
Mean estimates and mean squared errors (MSE) are calculated over 1000 replications. Moreover, the performances of the different estimators are compared based on the MSE through Monte Carlo simulations. The results are reported in Table 1.
From Table 1, it can be seen that the performances of the ML, IFM and CML estimates are satisfactory. In addition, as the sample size increases, the MSE decrease for all the parameters. It is observed that the IFM estimates are close to ML estimates, however, based on MSE these estimators are less efficient than ML estimators.
In addition, it is observed that the estimate of the copula parameter using CML method has a larger MSE than the ML and IFM estimates of this parameter which is consistent with the results found by Genest et al. (1995). Table 2 reports the Bayesian and ML estimates of the unknown parameters for n= 80, 100 and 150 along with their MSE.
From Table 2, it is observed that the MSE of ML estimates of θ 1 , λ 1 , θ 2 and λ 2 are smaller than their corresponding MSE of Bayesian estimates while MSE of Bayesian estimate of is smaller than its corresponding of ML estimate. Therefore, it can be concluded that the Bayesian method based on noninformative prior performs better than the ML method for estimating the copula parameter. Generally, it can be noted that the values of the copula parameter have small effects on the marginal parameter estimates.

Data analysis
The American football league data obtained from the matches played on three consecutive weekends in 1986 have two variables 1 and 2 where; 1 is the game time to the first points scored by kicking the ball between goalposts and 2 is the game time to the first points scored by moving the ball into the end zone, see Cso rgő and Welsh (1989). These times are of interest to a casual spectator who wants to know how long one has to wait to watch a touchdown or to a spectator who is interested only at the beginning stages of a game. Some basic statistics of 1 and 2 are presented in Table 3. It is observed from the histogram and scatter plots of the data that both 1 and 2 are right skewed and positively correlated. The sample spearman correlation coefficient between 1 and 2 is 0.804. Since both the marginals are right skewed and sample correlation coefficient is also within the possible range of the proposed BEP distribution, we use the BEP to model this bivariate data set.
Before progressing further, the natural question that arises here is whether the BEP distribution fits these data or not. Kundu and Gupta (2011) mentioned that we can test the marginals only, since there is not a satisfactory goodness of fit test for a general bivariate distribution functions. Hence, the CDF plots of the univariate EP distribution with the empirical distribution of the data are conducted to obtain the initial values of the marginals parameters. We have fitted the EP distribution to the marginals and obtained: ̂1 = 9.948,̂1 = 1.375,̂2 = 8.013, 2 = 1.142. Fig. 2 show the plots of the fitted and the empirical CDF for the two marginals based on ML estimates. The Kolmogorov-Smirnov (K-S) test values and the associated p values (reported in brackets) for 1 and 2 are 0.1871 (0.1057) and 0.1889 (0.087), respectively. Therefore, the K-S along with Fig. 2 indicates that the BEP distribution provides appropriate fit for this bivariate data set.
Also, the K-S test values and the associated p values (reported in brackets) for 1 and 2 using the Bayesian estimates are 0.1731 (0.161) and 0.1782 (0.1224), respectively. This indicates that the estimated model is appropriate for analyzing this data set.
In addition, to examine if the Gaussian copula is appropriate for this data, goodness of fit test is conducted and the result showed that it is suitable with (p value > 0.05) and the copula parameter based on maximum pseudo likelihood (MPL) estimated to be 0.88 which can be used as initial value when fitting BEP distribution. For details, see, Genest et al. (2009). Therefore, we use the ML estimates of the marginal parameter along with the estimate of copula parameter obtained based on MPL as initial values to fit BEP distribution to the data. Table 4 reports the ML, IFM, CML and Bayesian estimates, the standard error (SE), 95% confidence and credible interval of the BEP parameters. From Table  4, it can be seen that the IFM estimates of the marginal parameters are better than the ML estimates, while based on SE and the length of the intervals the ML estimate of copula parameter is better than the IFM estimate. Also, it is observed that the estimate of the copula parameter using CML method has a larger SE and larger interval length than ML and IFM estimates of this parameter. Based on SE and the length of the intervals the Bayesian estimates of the parameters are better than the ML, IFM and CML estimates. Now for comparison purposes, EL-Damcese et al. (2015) presented the analysis of a same real data set using the bivariate exponentaited generalized Weibull-Gompertz distribution (BEGWG) and compared it with bivariate exponentiated Gompertez distribution (BEG). Table 5 reports the ML estimates, the maximized log likelihood values ( ), Akaike information criterion (AIC), correct Akaike information criterion (CAIC) and Bayesian information criterion (BIC) test statistic for the BEG, BEGWG, and BEP distributions.
From Table 5 it can be seen that the BEP distribution has lowest AIC, CAIC, and BIC values compared the BEGWG and BEG distributions. Therefore, BEP distribution provides more appropriate and flexible fit compared to BEGWG and BEG for this data set.

Conclusion
In this paper, we have introduced BEP distribution as a flexible bivariate lifetime distribution. The proposed bivariate distribution derived from Gaussian copula with exponentiated Pareto distribution as marginals. It is observed that the generation of random samples from BEP distribution is simple, and therefore Monte Carlo simulation is performed to estimate the parameters using different methods of estimation. We have discussed some interesting properties of this new bivariate distribution using the Gaussian copula property. Monte Carlo simulation indicated that the performance of the ML, IFM, CML and Bayesian estimators are quite satisfactory. The proposed distribution applied to real life data set and the results of analysis showed that the BEP distribution provides more suitable fit than BEG and BEGWG distributions.