Comparison of some multivariate normality tests: A simulation study

Article history: Received 5 September 2016; received in revised form 10 November 2016; accepted 7 December 2016.

Many classical multivariate statistical methods are based on the assumption of multivariate normality. Departures from normality render those methods inaccurate, so it is important to know whether a dataset is normal or non-normal. Statistical methods that require multivariate normality are used extensively, especially in the medical and life sciences. In this study, after summarizing the properties of several widely used multivariate normality tests developed by many researchers in recent years, we compare the power and type I error rates of these tests. The comparison is intended to clarify the similarities, strengths and weaknesses of the tests so that readers can make an appropriate choice in practical applications. For this purpose we carried out a Monte Carlo simulation study at a nominal α level with small, medium and large sample sizes, different dimensions, and multivariate distributions with different skewness and kurtosis. The results of the comparative study are then presented.


Introduction
Many statistical methods for continuous variables assume an underlying multivariate normal distribution. These methodologies have been widely applied in research, and several formal tests have therefore been developed for assessing the multivariate normality of a set of random variables. Let X = (X1, X2, …, Xp) be a vector of p random variables. The null hypothesis to be tested is that X follows a multivariate normal distribution with mean µ and covariance matrix Σ, that is, H0: X ~ Np(µ, Σ).
Various multivariate normality tests proposed in the literature in recent decades are discussed in the second section of this study. The third section describes the Monte Carlo simulation study, which compares the type I error rates and powers of these tests for different numbers of variables, sample sizes and significance levels (p, n, α). The results of the comparative study in section 3 are presented and discussed in section 4. Finally, conclusions are given in section 5.

Methods
This section summarizes various multivariate normality tests that have been proposed in the literature in recent decades.

Villasenor-Alva and Gonzalez-Estrada's generalized Shapiro-Wilk (GSW) test
Villasenor-Alva and Gonzalez-Estrada (2009) proposed a goodness-of-fit test for multivariate normality based on the Shapiro and Wilk (1965) statistic for univariate normality and on an empirical standardization of the observations. Assume that X1, …, Xn are independent and identically distributed random vectors in $\mathbb{R}^p$, p ≥ 1. Let $N_p(\mu, \Sigma)$ denote the p-variate normal density with mean µ and covariance matrix Σ. Let 0 be the null vector of order p and let I be the identity matrix of order p×p. To test the null hypothesis H0: X1, …, Xn is a sample from $N_p(\mu, \Sigma)$, where µ and Σ are unknown, they proposed the test statistic

$$W^* = \frac{1}{p} \sum_{i=1}^{p} W_{Z_i} \qquad (1)$$

where $W_{Z_i}$ is the Shapiro-Wilk statistic evaluated on the ith coordinate of the transformed observations $Z_{i1}, \ldots, Z_{in}$, i = 1, …, p. The test based on $W^*$ rejects H0 at test size α if $W^* < c_{\alpha; n, p}$, where $c_{\alpha; n, p}$ satisfies the equation $\alpha = P\{W^* < c_{\alpha; n, p} \mid H_0 \text{ holds}\}$.

Kankainen-Taskinen-Oja's skewness (b1, new) test
Kankainen et al. (2007) obtained generalizations of the classical Mardia measures of skewness and kurtosis by using special choices of location and scatter estimators. If the multivariate measure of skewness is constructed so that T1 and C are the sample mean vector and sample covariance matrix and T2 is the one-step M-estimator that uses T1 and C as initial estimators with weight function $v_1(r) = r^2$, then the resulting skewness measure is easily seen to be

$$b_{1,new} = \frac{1}{p^2} \left\| \frac{1}{n} \sum_{i=1}^{n} r_i^2 z_i \right\|^2 \qquad (2)$$

where $z_i = C^{-1/2}(x_i - T_1)$ are the standardized observations and $r_i = \|z_i\|$. Note that this measure is equivalent to that introduced in Móri et al. (1993). The limiting distribution of $n\,b_{1,new}$ is that of $\eta_1 U_1$, where $U_1 \sim \chi^2_p$ and $\eta_1 = 2(p+2)/p^2$. See also Henze (1997).
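The skewness measure can be illustrated with a minimal Python sketch. It rests on two assumptions that are not spelled out in the text: the one-step M-estimator is taken to be T2 = ave{r_i² x_i}/ave{r_i²}, where r_i² is the squared Mahalanobis distance of x_i from T1, and the covariance matrix C uses the divisor n. Under those assumptions ave{r_i²} = p, and (T1 − T2)ᵀC⁻¹(T1 − T2) reduces to vᵀC⁻¹v/p² with v = ave{r_i²(x_i − T1)}. The function names are illustrative only.

```python
def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    # Augment A with the identity matrix.
    M = [[float(v) for v in A[i]] + [1.0 if i == j else 0.0 for j in range(p)]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[p:] for row in M]


def skewness_b1_new(X):
    """Skewness measure v' C^{-1} v / p^2 with v = ave{r_i^2 (x_i - T1)}."""
    n, p = len(X), len(X[0])
    T1 = [sum(x[j] for x in X) / n for j in range(p)]            # sample mean
    centered = [[x[j] - T1[j] for j in range(p)] for x in X]
    # Sample covariance with divisor n (assumption stated in the lead-in).
    C = [[sum(c[a] * c[b] for c in centered) / n for b in range(p)]
         for a in range(p)]
    Cinv = inverse(C)

    def quad(u, w):
        # Bilinear form u' C^{-1} w.
        return sum(u[a] * Cinv[a][b] * w[b] for a in range(p) for b in range(p))

    r2 = [quad(c, c) for c in centered]                          # squared distances
    v = [sum(r2[i] * centered[i][j] for i in range(n)) / n for j in range(p)]
    return quad(v, v) / p ** 2


symmetric = [(1, 2), (-1, -2), (2, -1), (-2, 1), (0.5, 3), (-0.5, -3)]
skewed = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(skewness_b1_new(symmetric), skewness_b1_new(skewed))
```

For a sample that is symmetric about its mean the measure is zero, while the second sample, with one distant point, gives a positive value. Multiplying the result by n gives the quantity that would be compared with the η1χ²_p limiting distribution.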

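Székely-Rizzo's ℰ-statistic (energy statistic) for testing multivariate normality
The ℰ-test of multivariate normality was proposed and implemented by Székely and Rizzo (2005). The test statistic for d-variate normality is given by

$$\mathcal{E}_{n,d} = n \left( \frac{2}{n} \sum_{j=1}^{n} E\|y_j - Z\| - E\|Z - Z'\| - \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} \|y_j - y_k\| \right)$$

where $y_1, \ldots, y_n$ are the standardized sample points and $Z$, $Z'$ are independent standard d-variate normal random vectors (Székely and Rizzo, 2005).

Henze-Zirkler (HZ) test
The Henze and Zirkler (1990) test, another multivariate normality test known for good power, is based on the empirical characteristic function. An appealing property of this test is that it is consistent. The test statistic is based on a nonnegative functional D(.,.) that uses characteristic functions to measure the distance between the hypothesized distribution and the empirical distribution. For the test to be consistent, the functional D(.,.) must equal zero if and only if the data follow a multivariate normal distribution. The nonnegative functional is given as

$$D_\beta(P, Q) = \int_{\mathbb{R}^p} \left| \hat{P}(t) - \hat{Q}(t) \right|^2 \varphi_\beta(t)\, dt$$

where $\hat{P}(t)$ is the characteristic function of the hypothesized distribution and $\hat{Q}(t)$ is the empirical characteristic function. The functional includes a kernel (weight) function $\varphi_\beta(t)$, in which β is a smoothing parameter. The test statistic was proposed as

$$T_\beta(p) = n \left[ \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} e^{-\frac{\beta^2}{2} \|Y_j - Y_k\|^2} - 2 (1 + \beta^2)^{-p/2} \frac{1}{n} \sum_{j=1}^{n} e^{-\frac{\beta^2}{2(1 + \beta^2)} \|Y_j\|^2} + (1 + 2\beta^2)^{-p/2} \right]$$

where $\|Y_j - Y_k\|^2$ is the squared Mahalanobis distance between two given observations and $\|Y_j\|^2$ is the squared Mahalanobis distance from a given observation to the centroid. The test statistic is approximately lognormally distributed, and this distribution is used to find the critical values of the test. The test rejects H0 for large values of $T_\beta(p)$.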

Royston's H multivariate normality test (1992)
Let $X_1, X_2, \ldots, X_n$ be a multivariate random sample of size n, where $X_j \in \mathbb{R}^p$. If $X_{(1)k}, X_{(2)k}, \ldots, X_{(n)k}$ denotes the ordered univariate sample for the kth variate, the Shapiro-Wilk test statistic is

$$W_k = \frac{\left( \sum_{j=1}^{n} \tilde{a}_j X_{(j)k} \right)^2}{\sum_{j=1}^{n} \left( X_{jk} - \bar{X}_k \right)^2}$$

where $\bar{X}_k$ is the sample mean of the kth variate and $\tilde{a}_j$ is the estimator of the normalized best linear unbiased coefficients for j = 1, 2, …, n. Royston (1992) suggested a normalizing transformation of $W_k$, obtaining a standard normal score $Z_k$, for k = 1, 2, …, p. Next, the statistic

$$\kappa_k = \left\{ \Phi^{-1}\left[ \tfrac{1}{2} \Phi(-Z_k) \right] \right\}^2$$

is calculated, where Φ denotes the standard normal cumulative distribution function. Finally, the test statistic proposed by Royston (1992) was defined by

$$H = \frac{\nu \sum_{k=1}^{p} \kappa_k}{p}$$

which is approximately $\chi^2_\nu$ distributed, where ν is referred to as the equivalent degrees of freedom, since the $W_k$ are not independent. See also an estimate for ν based on the method of moments (Royston, 1983; Cardoso De Oliveira and Ferreira, 2010).
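The last two steps of Royston's procedure, from the normal scores Z_k to the statistic H, can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores Z_k and the equivalent degrees of freedom ν are taken as given inputs (the normalizing transformation of W_k and the moment estimate of ν are not reproduced here), and the function name `royston_h` is hypothetical.

```python
from statistics import NormalDist


def royston_h(z_scores, nu):
    """Combine normal scores Z_k into H = nu * sum(kappa_k) / p.

    kappa_k = (Phi^{-1}(Phi(-Z_k) / 2))^2, with Phi the standard normal CDF.
    Both z_scores and nu are assumed to have been computed beforehand.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    kappas = [std_normal.inv_cdf(std_normal.cdf(-z) / 2.0) ** 2 for z in z_scores]
    return nu * sum(kappas) / len(z_scores)


# Two variates whose scores are exactly zero, with nu set to p = 2
# (the value nu would take if the W_k were independent).
print(royston_h([0.0, 0.0], nu=2.0))
# Larger scores (stronger evidence against normality) give a larger H.
print(royston_h([3.0, 3.0], nu=2.0))
```

The resulting H would then be referred to a chi-square distribution with ν degrees of freedom. For extremely large scores Φ(−Z_k) underflows to zero and `inv_cdf` raises an error, so a production implementation would need to guard against that case.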

Simulation study
This section describes a simulation study comparing the tests introduced in the previous section. The study consists of two parts. In the first part, the generalized Shapiro-Wilk (GSW) test proposed by Villasenor-Alva and Gonzalez-Estrada, the Kankainen-Taskinen-Oja skewness test (b1,new), the Kankainen-Taskinen-Oja kurtosis test (b2,new), the Energy test, the Henze-Zirkler (HZ) test and the Royston (1992) test are compared in terms of type I error. The second part consists of power comparisons of these tests. The number of iterations was set to 10000, and an R program was written for each test procedure. The results are displayed using tables and figures. For both the type I error and the power comparisons, the number of variables was p = 2, 3, 5 and 10, and the sample sizes were n = 20, 50, 100 and 200. All comparisons were performed at nominal α = 0.05.
In a Monte Carlo simulation study, the selection of the alternatives to be used against the multivariate normal distribution is quite important. The purpose here is to identify the multivariate normality tests that perform best in many situations. The first distribution to be considered is the multivariate normal distribution itself. There are two reasons for comparing the tests under the normal distribution. The first is to check that the algorithms used to calculate the test statistics have been programmed correctly; the second is to make sure that the tests reject normality (thus making a type I error) only at approximately the nominal α level.
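How close to the nominal level an empirical rejection rate can be expected to fall depends on the number of replications. The short Python sketch below works this out for the design described above (10000 replications, α = 0.05), using the usual binomial standard error and a normal approximation; the 1.96 multiplier for an approximate 95% band is an assumption of this illustration, not a criterion stated in the study.

```python
import math


def mc_standard_error(rate, n_rep):
    """Binomial standard error of a rejection rate estimated from n_rep replications."""
    return math.sqrt(rate * (1.0 - rate) / n_rep)


alpha, n_rep = 0.05, 10000
se = mc_standard_error(alpha, n_rep)
lower, upper = alpha - 1.96 * se, alpha + 1.96 * se
print(f"standard error: {se:.5f}")
print(f"approximate 95% band: {100 * lower:.2f}% to {100 * upper:.2f}%")
```

Under these assumptions, a test whose true size is exactly 5% would give empirical rejection rates of roughly 4.6% to 5.4% in most simulation runs, so rates well outside that band point to a real departure from the nominal level rather than simulation noise.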
As a second type of distribution, the study considers several normal mixture models in order to simulate samples whose units come from two different populations. The general form of the normal mixture model is $\tilde{p}\,N_p(\mu_1, \Sigma_1) + (1 - \tilde{p})\,N_p(\mu_2, \Sigma_2)$. Here $\tilde{p}$ is the contamination parameter, which specifies the proportion of the sample drawn from one of the populations. Three contamination levels are examined. The first level, $\tilde{p}$ = 0.9 (90%, 10%), represents mild contamination; the mixture is skewed and leptokurtic. The second level, $\tilde{p}$ = 0.788675 (78.8675%, 21.1325%), represents moderate contamination; the mixture is skewed and mesokurtic. The third level, $\tilde{p}$ = 0.5 (50%, 50%), represents severe contamination; the mixture is symmetrical and platykurtic. The second normal mixture model has an interesting characteristic: its kurtosis equals that of the normal distribution, but the distribution is not normal. The kurtosis test is therefore expected to have low power for data from this model. The total number of normal mixture models considered in the study is 6. In these models, the symbols are defined as follows: μ1: mean vector whose components are zero; μ2: mean vector whose components are one; I: identity matrix of dimension p × p; Σ1: square matrix of dimension p × p whose diagonal components are 0.2; Σ2: square matrix of dimension p × p whose diagonal components are 0.5; Σ3: square matrix of dimension p × p whose diagonal components are 1 and whose off-diagonal components are 0.5 (Mecklin and Mundfrom, 2000).
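The unusual value 0.788675 can be motivated with a simplified calculation. The Python sketch below assumes a univariate location mixture with equal component variances, X = σZ + δB, where Z is standard normal and B is an independent Bernoulli(q) indicator of the contaminating component. This is a simplification, since the mixtures in the study are multivariate and have unequal covariance matrices. In that simplified setting the third and fourth cumulants of X are δ³ and δ⁴ times those of B, so the signs of the Bernoulli cumulants determine the skewness and excess kurtosis of the mixture.

```python
import math


def bernoulli_third_cumulant(q):
    """Third cumulant of a Bernoulli(q) variable: q(1 - q)(1 - 2q)."""
    return q * (1 - q) * (1 - 2 * q)


def bernoulli_fourth_cumulant(q):
    """Fourth cumulant of a Bernoulli(q) variable: q(1 - q)(1 - 6q(1 - q))."""
    return q * (1 - q) * (1 - 6 * q * (1 - q))


# The fourth cumulant vanishes when 6q(1 - q) = 1, i.e. q = (3 - sqrt(3)) / 6.
q_meso = (3 - math.sqrt(3)) / 6
print(round(q_meso, 6), round(1 - q_meso, 6))

for q in (0.1, q_meso, 0.5):
    print(f"q = {q:.6f}: k3 = {bernoulli_third_cumulant(q):+.4f}, "
          f"k4 = {bernoulli_fourth_cumulant(q):+.4f}")
```

In this simplified model the three contamination levels reproduce the pattern described in the text: at q = 0.1 both cumulants are positive (skewed, leptokurtic), at q ≈ 0.211325 the fourth cumulant is zero while the third is not (skewed, mesokurtic), and at q = 0.5 the third cumulant is zero and the fourth is negative (symmetrical, platykurtic).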
Thus, the following normal mixture models have been obtained:
1. 0.9 N(0, Σ1) + 0.1 N(1, Σ2)
2. 0.788675 N(0, Σ1) + 0.211325 N(1, Σ2)
3. 0.5 N(0, Σ1) + 0.5 N(1, Σ2)
4. 0.9 N(0, Σ3) + 0.1 N(1, I)
5. 0.788675 N(0, Σ3) + 0.211325 N(1, I)
6. 0.5 N(0, Σ3) + 0.5 N(1, I)

The elliptically contoured distributions considered in the simulation study are symmetrical distributions whose contours of equal density are elliptical in shape. The density function of an elliptically contoured distribution has the form

$$f(x) = k_p\, |\Sigma|^{-1/2}\, g\left( (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)$$

Here, x is a random vector, $k_p$ is a fixed scalar, and g(.) is a non-increasing positive function. The elliptically contoured distributions are denoted by ECp(µ, Σ, g). The multivariate normal distribution, with $k_p = (2\pi)^{-p/2}$ and $g(u) = e^{-u/2}$, is a special case of the elliptically contoured distribution. Elliptical distributions are symmetrical distributions closely related to the normal distribution, and they therefore represent mild departures from normality. The type II and type VII Pearson distributions have been used as examples of elliptical distributions. Johnson (1987) stated that type II and type VII Pearson distributions are quite suitable for Monte Carlo studies, because these distributions are easy to generate and cover much of the elliptical family. In our study, the Pearson type II distribution has been generated for shape parameters m = 2, 4 and 10. For m = 10 this distribution is quite close to multivariate normality. The multivariate t and Cauchy distributions are special cases of the Pearson type VII family: v = 1 degree of freedom gives a multivariate Cauchy distribution and v = 10 degrees of freedom gives a multivariate t distribution. In our study, the type VII Pearson distribution with v = 1 and 10 degrees of freedom has been generated. In addition, a multivariate t distribution with 2 degrees of freedom and a multivariate Cauchy distribution have been generated.
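Returning to the normal mixture models, sampling from them can be sketched in Python as follows. The sketch covers only models 4 to 6, whose components N(0, Σ3) and N(1, I) are fully specified in the text, and it is not the authors' R code. The function names, the seed and the use of a Cholesky factor to generate correlated normal vectors are choices made for this illustration.

```python
import math
import random


def cholesky(A):
    """Lower-triangular Cholesky factor L of a positive definite matrix (A = L L')."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def equicorrelation(p, rho):
    """p x p matrix with ones on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def sample_normal(mu, L, rng):
    """Draw one vector from N(mu, L L') using independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]


def sample_mixture(n, weight, p, rng):
    """Draw n vectors from weight * N(0, Sigma3) + (1 - weight) * N(1, I)."""
    L3 = cholesky(equicorrelation(p, 0.5))   # Sigma3: diagonal 1, off-diagonal 0.5
    LI = cholesky(equicorrelation(p, 0.0))   # identity matrix
    zeros, ones = [0.0] * p, [1.0] * p
    sample = []
    for _ in range(n):
        if rng.random() < weight:
            sample.append(sample_normal(zeros, L3, rng))
        else:
            sample.append(sample_normal(ones, LI, rng))
    return sample


rng = random.Random(2016)
data = sample_mixture(n=2000, weight=0.5, p=3, rng=rng)   # model 6 with p = 3
print(len(data), len(data[0]))
```

Changing `weight` to 0.9 or 0.788675 gives models 4 and 5. Models 1 to 3 would follow the same pattern once the full matrices Σ1 and Σ2 are available.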
Distributions that depart severely from normality are skewed and lie outside the elliptically contoured family. The multivariate normality tests under examination are expected to display high power against these distributions. Multivariate chi-square and multivariate lognormal distributions, which belong to this category, have been considered in our simulation study. Both distributions are heavily skewed and have non-normal kurtosis, so all of the tests are expected to detect the non-normality easily and thus to have high power. The multivariate chi-square distribution is the extension of the well-known univariate chi-square distribution. The chi-square distribution with 1 degree of freedom displays positive skewness. The multivariate chi-square distribution with 1, 2 and 4 degrees of freedom has been generated in our study. The values of the lognormal distribution are necessarily positive and the distribution is positively skewed, so all tests are expected to perform well in detecting its departure from multivariate normality. A simple logarithmic transformation brings lognormal data to normality.
Distributions whose univariate marginal distributions are normal but whose joint distribution is not normal are theoretically interesting departures from multivariate normality. Such a departure cannot be detected using univariate methods alone, and it is problematic even for multivariate methods. One family of multivariate distributions that fits this definition is Khintchine's family of distributions. This distribution is expected to clarify the real power of the tests, and the tests are not expected to perform well in detecting that such data are not multivariate normal.
Another theoretically interesting case is that of multivariate non-normal distributions whose multivariate skewness and kurtosis equal those of the normal distribution. The family of generalized exponential power distributions has this feature. Horswell (1990) used two members of this family in his study.
Apart from these distributions, the symmetrical multivariate Laplace distribution has also been used in our simulation study. It is the multivariate extension of the univariate symmetrical Laplace distribution. This distribution is leptokurtic and is therefore one of the alternative distributions that can be used against the multivariate normal distribution (Farrell et al., 2007; Székely and Rizzo, 2005).
The purpose of this study is to examine, via simulation, the power of six recently developed multivariate normality tests. Monte Carlo simulation has been used for this purpose. The sample sizes were n = 20, 50, 100 and 200, and the numbers of variables were p = 2, 3, 5 and 10. Data sets were generated for 4 sample sizes and 4 numbers of variables from 21 different multivariate distributions, so that 21 × 4 × 4 = 336 cases were tested at significance level α = 0.05, with 10000 simulations for each combination. The sample sizes and numbers of variables were restricted in order to keep the computation time required for the simulation at a reasonable level and to allow small sample sizes to be evaluated for multivariate analyses. According to researchers, these sample sizes define the most critical cases for the multivariate normality assumption (in terms of type I and type II errors). The multivariate normality tests examined are the generalized Shapiro-Wilk (GSW) test proposed by Villasenor-Alva and Gonzalez-Estrada, the Kankainen-Taskinen-Oja skewness test (b1,new), the Kankainen-Taskinen-Oja kurtosis test (b2,new), the Energy test, the Henze-Zirkler (HZ) test and the Royston (1992) test. Several multivariate distributions have been considered in determining the power of the tests.
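The size of the simulation design can be enumerated with a few lines of Python. This is a bookkeeping sketch only; the variable names are illustrative and the 21 distributions are represented by index rather than by name.

```python
from itertools import product

sample_sizes = (20, 50, 100, 200)   # n
dimensions = (2, 3, 5, 10)          # p
n_distributions = 21
n_replications = 10000
n_tests = 6

# One cell per (distribution, n, p) combination.
cells = list(product(range(n_distributions), sample_sizes, dimensions))
n_cells = len(cells)
n_datasets = n_cells * n_replications
n_test_evaluations = n_datasets * n_tests

print(n_cells, n_datasets, n_test_evaluations)
```

The enumeration gives 336 cells and 3,360,000 simulated data sets, each of which is evaluated by all six tests. This illustrates why the sample sizes and dimensions were restricted to keep the computation time manageable.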
These distributions are: the multivariate normal distribution; multivariate normal mixture distributions with various contamination levels, means and variances; the Pearson type II and Pearson type VII distributions from the elliptically contoured family; the multivariate t, multivariate Cauchy and multivariate Laplace distributions from among the symmetrical distributions; the multivariate chi-square and multivariate lognormal distributions from among the heavily skewed distributions; and the Khintchine distribution and the generalized exponential power distribution, which share certain features of the multivariate normal distribution but are not themselves multivariate normal.
When comparing type I errors, empirical type I error rates were obtained for the different values of p and n at the nominal α level, and for each test the closeness of the empirical type I error to the nominal α value was examined. The rejection rates of the null hypothesis of multivariate normality were obtained for every case, every distribution and every test, and are presented in tables. For data generated under multivariate normality these rejection rates are the empirical type I errors; for the other distributions they are the empirical powers of the tests.
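The way empirical type I error rates and powers are obtained as rejection rates can be sketched generically in Python. To keep the example self-contained, a simple two-sided z-test of a zero mean with known unit variance stands in for the normality tests. It is not one of the six tests compared in the study, and the function names, seed and replication count are assumptions of this illustration.

```python
import math
import random


def rejection_rate(sampler, rejects, n_rep, rng):
    """Proportion of n_rep simulated samples for which the test rejects H0."""
    count = sum(1 for _ in range(n_rep) if rejects(sampler(rng)))
    return count / n_rep


def z_test_rejects(sample, crit=1.959964):
    """Stand-in test: reject H0 (mean 0, variance 1) at the two-sided 5% level."""
    n = len(sample)
    return abs(math.sqrt(n) * sum(sample) / n) > crit


def normal_sampler(mean, n):
    """Return a function that draws n observations from N(mean, 1)."""
    return lambda rng: [rng.gauss(mean, 1.0) for _ in range(n)]


rng = random.Random(12345)
# Data generated under H0: the rejection rate estimates the type I error.
type_one = rejection_rate(normal_sampler(0.0, 20), z_test_rejects, 2000, rng)
# Data generated under an alternative: the rejection rate estimates the power.
power = rejection_rate(normal_sampler(1.0, 20), z_test_rejects, 2000, rng)
print(type_one, power)
```

Replacing `z_test_rejects` with a function that applies a multivariate normality test, and the samplers with generators for the distributions listed above, gives the structure of the comparison: the same rejection-rate computation yields a type I error under the multivariate normal distribution and a power under every alternative.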

Results and Discussion
The empirical type I error rates and powers of the six multivariate normality tests selected for this study, namely the generalized Shapiro-Wilk (GSW) test proposed by Villasenor-Alva and Gonzalez-Estrada, the Kankainen-Taskinen-Oja skewness test (b1,new), the Kankainen-Taskinen-Oja kurtosis test (b2,new), the Energy test, the Henze-Zirkler (HZ) test and the Royston (1992) test, have been compared using Monte Carlo simulation. The first distribution simulated was the multivariate normal distribution. In this case the null hypothesis that the data were generated from a multivariate normal distribution is true, so the empirical rejection rates are expected to be close to the nominal significance level α. A rejection rate considerably higher than α indicates a problem with the type I error rate. The other distributions considered in this study depart from multivariate normality to degrees ranging from mild to severe. For these distributions the null hypothesis is false and should be rejected. Low rejection rates therefore indicate a high type II error rate, that is, low power relative to the other tests.
The rejection rates of H0 for data generated from multivariate normal distributions are given as percentages in Table 1. Table 1 shows that the rejection rates of some tests are much lower than the nominal α level, while those of others are higher. At α = 0.05 the rejection rates of the Kankainen-Taskinen-Oja skewness test (b1,new) range from 0.00 to 5.50, and those of the Kankainen-Taskinen-Oja kurtosis test (b2,new) range from 0.77 to 8.65. The b1,new and b2,new tests deviate especially severely from the nominal α level when n = 20. The other tests have rejection rates close to the nominal α level for multivariate normal data. For the HZ, GSW, Energy and Royston tests, the rejection rates are a little further from the nominal α level at n = 20, but in general all four tests have consistent rejection rates.
The results for n = 20 and p = 2, 3, 5 and 10 for the multivariate distributions are shown in Table 2. The kurtosis test is expected to be the least sensitive to the lognormal distribution. As can be seen from Tables 2 and 3, in most cases the test with the lowest empirical power is the kurtosis (b2,new) test. In addition, the b2,new test displays lower power than the other four tests at n = 20 and 50, and its power rises as the sample size increases. The Royston test is more powerful for n = 20 with p = 2 and 3, and the GSW test is more powerful for n = 50 with p = 2. At n = 100 with p = 5 and 10 the empirical power of all tests is 100%. The Energy test has the lowest empirical power at n = 100 with p = 2 and 3 and at n = 200; its empirical power decreases when n = 200.
For the multivariate Cauchy distribution, differences between the tests can be seen when n = 20. The Energy test presents quite an interesting situation: while it has the highest empirical power when n = 20 and 50, it has the lowest empirical power when n = 100 with p = 2 and 3 and when n = 200. The power of the b1,new test increases consistently with the sample size, so the b1,new test can be said to be sensitive to sample size; however, it still displays lower power than the other tests for n = 20 and 50. The GSW test has the lowest power (44.04%) at the 0.05 significance level for n = 20 and p = 10. The b1,new test also displays quite low power (56.65%) at the 0.05 significance level for n = 20 and p = 10. All tests reach the maximum empirical power (100%) at n = 50 and 100 with p = 10, and at n = 100 with p = 5.
The skewness test is expected to display low power for the multivariate Laplace distribution. When the sample size is 20, the empirical power of all tests is low. The empirical power of the tests increases with the sample size, but there is no combination of n and p for which all of the tests reach the maximum (100%) empirical power. The test with the highest empirical power in all cases for this distribution is the Energy test.
For the multivariate t(2) distribution, as for the multivariate Laplace distribution, the test with the lowest power is the b1,new test; its empirical power at the 5% significance level for p = 10 and n = 20 is 5.08%. The empirical power of the Energy test increases as the sample size goes from n = 20 to n = 50 and 100, but decreases at n = 200. Although the b1,new test has low empirical power, it is sensitive to sample size. Likewise, the power of the GSW, HZ, Royston and b2,new tests increases with the sample size. All tests reach 100% power at n = 100 and p = 10.
The skewness and kurtosis values of the generalized exponential power distribution are equal to those of the multivariate normal distribution, but the distribution is not multivariate normal. Most of the tests display low power for this distribution. In particular, the b2,new test performs poorly and can be said to be the worst test for this distribution. In addition, the b1,new test has the lowest power (0.00%) of all the tests for n = 20, and therefore the worst power performance for n = 20 and p = 10; however, its power increases as the sample size grows. For the b1,new test at n = 20 and 50, power decreases as the number of variables increases. The Royston and GSW tests can be said to display good levels of power for the generalized exponential power distribution.
When Tables 2-5 are examined for the Khintchine distribution, it can be seen that the GSW and Royston tests in particular display quite low power. The empirical power of the GSW test ranges between 3.06 and 4.89, and that of the Royston test between 4.85 and 6.19. The b1,new test has the lowest empirical power (0.00%) for n = 20 and p = 10. The Energy test can be said to display better empirical power than the other tests for this distribution.
Since the multivariate chi-square distribution with 1 degree of freedom is known to be a severely skewed, heavy-tailed distribution, the power of the skewness test is expected to be high. Tables 2-5 show that the b1,new test is sensitive to sample size. However, as the number of variables increases at n = 20, the power of the b1,new test decreases. An interesting feature of the b1,new test is that it displays quite low power (4.01 for α = 0.05) at p = 10 and n = 20. The b2,new test displays low power at n = 20, while the remaining four tests display quite good power. All tests reach maximum power (100%) at n = 100 with p = 5 and 10. The Royston test reaches 100% power in almost every case.
For the multivariate chi-square distribution with 2 degrees of freedom, Tables 2-5 show that all tests are sensitive to sample size. However, when n = 20 and p = 10, the b1,new test has the lowest power (0.00 for α = 0.05). The b2,new test has less power than the other tests. All tests reach maximum power (100%) at n = 200 and p = 5. The test with the best performance for this distribution is the Royston test.
For the multivariate chi-square distribution with 4 degrees of freedom, there is no case in which all of the tests considered reach maximum power. The empirical power of all tests increases as the sample size increases. The b2,new test is more powerful than the other tests. When n = 20 and p = 10, the b1,new test has the lowest power (0.00%). As the number of variables increases at n = 20, the power of the GSW and Royston tests increases. The best test for this distribution is the Royston test.
The type II Pearson distribution with shape parameter m = 2 is a short-tailed elliptically contoured distribution. The power of all tests other than the skewness test (b1,new) changes with the sample size. The b1,new test has the worst power performance; moreover, when p = 10 its power is 0.00 at all sample sizes. The b2,new test does not perform well either, while the power of the GSW and Royston tests increases rapidly with the sample size. Both the GSW and Royston tests have a power of 100% at p = 5 and 10 with n = 200.
For the type II Pearson distribution with shape parameter m = 4 (Tables 2-5), as with the previous distribution, the test with the worst power performance is the b1,new test. The power of the GSW and Royston tests increases more slowly than for m = 2. The HZ and Energy tests display lower power than these two tests. None of the tests reaches maximum power (100%).
For the type II Pearson distribution with shape parameter m = 10, the power of all tests decreases as the shape parameter m increases. The test with the highest power when m = 10 is the HZ test, but this power is only 6.55%. For n = 20 and 50, the power of all tests remains below the nominal α significance level.
For the type VII Pearson distribution with 1 degree of freedom (Tables 2-5), the b1,new test is less powerful than the other tests when n = 20 and 50. The power of the GSW, b2,new, HZ and Royston tests increases with the sample size and the number of variables. While the Energy test displays consistent power at n = 20 and 50 and at n = 100 with p = 5 and 10, it has quite low power at n = 100 with p = 2 and 3 and at n = 200. A marked decrease in the power of the Energy test is seen in particular when n = 200. All tests reach maximum power at n = 50 with p = 10 and at n = 100 with p = 5 and 10. The GSW and Royston tests have the best power performance for this distribution.
For the type VII Pearson distribution with 10 degrees of freedom, the power of the tests decreases as the degrees of freedom increase. There is no case in which all of the tests reach maximum power. The b1,new test has the lowest empirical power (0.00%) at n = 20 and p = 10. The power of all tests increases with the sample size. As the number of variables rises, the power of the GSW and Royston tests increases at all sample sizes. However, when n = 20, the power of the other four tests decreases as the number of variables increases.
For the type I multivariate normal mixture distribution with three different contamination levels (0.9-0.1, 0.788675-0.211325, 0.5-0.5) in Tables 2-5, the power of the tests increases when the contamination parameter falls from 0.9 to 0.788675 and decreases when it falls from 0.788675 to 0.5. At all three contamination levels, the b1,new test has the lowest power (0.00%) at n = 20 and p = 10.
At the first contamination level (0.9-0.1), the power of the tests increases with the sample size for all numbers of variables. When n = 20, the power of the b1,new test decreases as the number of variables increases. Lower power values are obtained for the HZ test when p = 5 and 10. The power of the Royston test increases with the number of variables, while the power of the GSW test decreases. The test with the best power performance at the first contamination level is therefore the Royston test.
At the second contamination level (0.788675-0.211325), the power of the tests again increases with the sample size for all numbers of variables. As at the first level, when n = 20 the power of the b1,new test decreases as the number of variables increases. When the number of variables increases, the power of the b1,new, b2,new, Energy and Royston tests increases, while the power of the GSW test decreases. Additionally, lower power values are obtained for the HZ test when p = 5 and 10. The Royston test can therefore be said to be the more powerful test for the multivariate normal mixture distribution at the second contamination level.
At the third contamination level, the power of the tests also increases with the sample size. When the number of variables increases, the power of the b2,new, Energy, HZ and Royston tests also increases. In general, the Royston test can be said to be the more powerful test for the multivariate normal mixture distribution at the third contamination level.
For the type II multivariate normal mixture distribution with three different contamination levels (0.9-0.1, 0.788675-0.211325, 0.5-0.5), all tests have quite low power at all three contamination levels. The power of the tests increases when the contamination parameter falls from 0.9 to 0.788675 and decreases when it falls from 0.788675 to 0.5. At all three contamination levels, the b1,new test has the lowest power (0.00%) at n = 20 and p = 10. The power of the tests increases with the sample size. The test with the best power performance at the first contamination level for this mixture distribution is the GSW test.
At the second contamination level (0.788675-0.211325), the power of the tests increases with the sample size for all numbers of variables. The Energy test has better power than the other tests at p = 3 and 5. At p = 10, the b2,new test displays the highest empirical power. At the third contamination level (0.5-0.5), the power of the GSW, b1,new, b2,new, Energy and HZ tests increases with the sample size. The power of the b1,new test decreases as the number of variables increases for n = 20, but increases with the number of variables for n = 50, 100 and 200. The Energy test provides better power results for the multivariate normal mixture distribution at the third contamination level.

Conclusions
In this study, the generalized Shapiro-Wilk (GSW) test proposed by Villasenor-Alva and Gonzalez-Estrada, the Kankainen-Taskinen-Oja skewness test (b1,new), the Kankainen-Taskinen-Oja kurtosis test (b2,new), the Energy test, the Henze-Zirkler (HZ) test and the Royston (1992) test have been introduced for testing the null hypothesis of multivariate normality, and their empirical type I error rates and powers have been compared. In the type I error comparisons for n = 20, the b1,new and b2,new tests display quite poor results, while the other four tests display better results, close to the nominal α level. The b1,new and b2,new tests improve as the sample size increases.
In general, across all of the power comparisons, the b1,new test gave the worst results. Its performance is particularly poor for symmetrical and leptokurtic distributions, so it is more appropriate to use alternative tests instead. The b2,new test gave poor results for heavily skewed distributions but good results for the type II multivariate normal mixture distribution. While the Energy test performs poorly for symmetrical and platykurtic distributions, it displays better power for symmetrical and leptokurtic distributions. The GSW and Royston tests provided good results for platykurtic distributions.
In summary, the Energy and Royston tests are more powerful for type I normal mixture distributions, while the b2,new test is more powerful for type II normal mixture distributions. The GSW and Royston tests have been observed to be more powerful for elliptical, skewed and generalized exponential power distributions. The Royston test has been found to be more powerful for symmetrical distributions, while the Energy test has been found to be empirically more powerful for the Khintchine distribution.