A novel selection model of random features for the estimation of facial expression

Estimation of facial expressions has been an important focus in several practical applications of machine vision and virtual reality, such as assessing customer satisfaction with products and services or modelling virtual broadcasters. In this study, we propose a novel approach to estimating facial expressions based on an automatic mechanism that randomly selects facial geometric features and organizes them into a tree model. Tests on the standard JAFFE dataset indicate that the proposed model is efficient and effective and merits consideration for practical implementation.


Introduction
Facial expression has been an active research topic in several practical machine vision and virtual reality problems. The research can be classified into two major categories: (1) human facial expressions in combination with face detection; and (2) human facial states in combination with face models. Recent advances in computational hardware and related equipment offer great advantages in developing human imitation models. In particular, the representation of human facial expressions in 3D virtual reality has been widely employed in several applications, such as the fantasy films Avatar and Van Helsing, whose monsters and wolf-men display fine expressions and movements.
Besides, the identification or estimation of different states of facial expressions has also been widely used in other applications, such as the system developed by Fraunhofer IIS to analyze human faces on Google Glass, as shown in Fig. 1. In such applications, the quick and precise capture of features on the human face is one of the most important stages in producing satisfactory outputs.
Facial expressions result from the movements of facial muscles, which temporarily deform facial features such as the eyelids, eyebrows, nose, lips, and skin (wrinkles, cutis anserina). In addition, the same facial expression can be interpreted differently, as it depends heavily on personal characteristics (age, gender, health, etc.). As such, there have been many different approaches to the facial identification and estimation problems. For example, among feature-based methods, some researchers use facial geometric points (Valstar and Pantic, 2007; Lucey et al., 2006), image profiles (Bartlett et al., 2006; Jiang et al., 2011), or both (Tian et al., 2001), whereas others consider the changes on the face over time (Jiang et al., 2011; Zhao and Pietikainen, 2007).
In this paper, we propose a novel approach to automatically select geometric facial features and organize them in a decision tree model to represent facial expressions.

Literature review
Active Appearance Models (AAM) is an algorithm used to determine key points on a face, where each point has its own specific characteristics (Cootes et al., 2001; Viola & Jones, 2001). In AAM, a statistical model of the appearance of the object in an image is combined with an optimization algorithm to identify the parameters of the model that best fits the image. Baker and Matthews (2001) improved the performance of AAM by combining critical information obtained from 2D and 3D models, and the improved approach was found to yield better accuracy and real-time convergence in several particular cases (Xiao et al., 2004). Specifically, the object of interest in an image is modeled by its shape, given by a set of control points, and its structure, which is the set of sampled image intensity values within the regions bordered by the control points, as shown in Fig. 2 (our actual experiment). A statistical model for an object must be able to fully describe the variations of its shape and of its structure, as well as the statistical correlation between them. The key issues in this approach are the construction of a statistical model for the image object and the design of an optimal search algorithm. In particular, the construction of the model consists of: (1) constructing a mathematical model for the shape and a model for the image structure; and (2) combining the two models to establish the final model. The optimal search algorithm used in AAM is designed so that the model parameters can be estimated automatically from the dataset, producing a synthesized sample image that best describes the input image by minimizing the difference between the synthesized and input images.
Hien and Toan (2016) proposed a novel approach that statistically analyzes the shape parameters describing a human face to detect nodding behaviors, because there is a significant difference between a head in the normal position and a nodding one. Fig. 3 shows the distribution of some shape parameters (Hien and Toan, 2016).

Fig. 3: Distribution of some shape parameters
From a set of input images labeled with control points and head position (normal or nodding), the model parameters were obtained automatically and directly from the collected data. In practical experiments with parameters including point-point distance, point-edge distance, and triangle area, they computed statistics on the respective values and identified appropriate split thresholds. The key characteristics with good splitting ability were then used to detect nodding behavior.

Proposed algorithm
The shape of a human face can be effectively represented by a set of control points, as discussed in Section 2, and the shape parameters are normally determined by the coordinates of one or more points in the set. As such, a shape parameter is, for example, the distance between 2 points or the area of the triangle formed by any 3 points.
Consequently, there are many shape parameters to consider. For example, for a set of 68 points, considering only the distance between 2 points already gives a total of 2278 parameters. When the facial expression changes, the coordinates of the points also change, which in turn changes the parameters of the facial features. These significant changes in the parameters can be used to detect the change in facial expression.
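The count of 2278 follows from choosing 2 of the 68 points, i.e. 68 × 67 / 2. A minimal sketch of enumerating these distance parameters is given below; the function name `pairwise_distances` and the grid of points are illustrative assumptions, not part of the original method.

```python
from itertools import combinations
from math import comb, dist


def pairwise_distances(points):
    """Return {(i, j): distance} for every unordered pair of control points."""
    return {
        (i, j): dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


# Hypothetical control points: 68 (x, y) positions on a grid, for illustration only.
points = [(float(k % 10), float(k // 10)) for k in range(68)]
features = pairwise_distances(points)

print(len(features))  # number of point-point distance parameters
print(comb(68, 2))    # the same count from the binomial coefficient
```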
Instead of manually identifying the geometric parameters as in (Hien and Toan, 2016), we propose an automatic mechanism to select and organize them in a decision tree to estimate facial expressions. Specifically, the tree consists of nodes, and each node is a decision function learned for a particular facial feature. After a human face has been successfully located, its set of control points is passed to the decision tree for analysis. The final estimate of the facial expression is the average over all training samples at the end node that is reached.
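The evaluation step described above can be sketched as follows. This is only an illustration under stated assumptions: the `Node` class, the `estimate` function, the use of a single point-point distance as the node feature, and the "below threshold goes left" convention are all hypothetical choices, not the authors' implementation.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    # Internal node: compares one distance feature against a threshold.
    i: int = 0
    j: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when the feature is below the threshold
    right: Optional["Node"] = None  # taken otherwise
    # End node: average label of the training samples that reached it.
    value: Optional[float] = None


def estimate(node: Node, points: Sequence[Point]) -> float:
    """Walk from the root to an end node and return its stored average."""
    while node.value is None:
        feature = dist(points[node.i], points[node.j])
        node = node.left if feature < node.threshold else node.right
    return node.value


# A hypothetical one-level tree with two end nodes.
tree = Node(i=0, j=1, threshold=2.0, left=Node(value=0.2), right=Node(value=0.9))
print(estimate(tree, [(0.0, 0.0), (1.0, 0.0)]))  # distance 1.0, left end node
print(estimate(tree, [(0.0, 0.0), (3.0, 0.0)]))  # distance 3.0, right end node
```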

A model of decision tree
Building on previous research by Hien and Toan (2016), our decision tree is built from a training dataset with the following structure: $U = \{(I_s, v_s, w_s) : s = 1, 2, \ldots, S\}$, where $S$ is the size of the sample, $v_s \in [0,1]$ is the correct label value of sample $I_s$, and $w_s$ is the weight of the sample.
In our study, $w_s$ represents the importance of each input sample in the training dataset. At each node, we select the function that provides the best classification ability for the dataset, i.e. the one for which the objective function attains its minimum value. Specifically, our proposed objective function is defined over $C_0$ and $C_1$, the two clusters of the training dataset corresponding to the results 0 and 1, respectively, and over $\bar{v}_0$ and $\bar{v}_1$, the averages of the label values in $C_0$ and $C_1$.
In other words, at each node, we consider decision functions built from the shape features and select the one that minimizes the objective function. Thus, starting from the original dataset, the training data reaching each node are split into two clusters at each learning stage during the construction of the tree. Our proposed algorithm for the learning of each node is shown in Fig. 4.
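One objective that is consistent with the quantities defined above is a weighted squared error, $\sum_{s \in C_0} w_s (v_s - \bar{v}_0)^2 + \sum_{s \in C_1} w_s (v_s - \bar{v}_1)^2$. This form is an assumption made here for illustration; treating the cluster averages as weighted averages is likewise an assumption. A minimal sketch with the hypothetical names `weighted_mean` and `split_error`:

```python
def weighted_mean(samples):
    """Weighted average of labels; samples are (label, weight) pairs."""
    total_w = sum(w for _, w in samples)
    return sum(v * w for v, w in samples) / total_w if total_w > 0 else 0.0


def split_error(c0, c1):
    """Assumed objective: weighted squared deviation of each label from
    its cluster average, summed over both clusters."""
    error = 0.0
    for cluster in (c0, c1):
        mean = weighted_mean(cluster)
        error += sum(w * (v - mean) ** 2 for v, w in cluster)
    return error


# A split that groups similar labels scores lower than one that mixes them.
good = split_error([(0.0, 1.0), (0.2, 1.0)], [(0.8, 1.0), (1.0, 1.0)])
mixed = split_error([(0.0, 1.0), (1.0, 1.0)], [(0.2, 1.0), (0.8, 1.0)])
print(good < mixed)
```

Under this assumed objective, a decision function is preferred when the labels inside each cluster are close to that cluster's average.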

Shape parameters
From the model of decision tree described above, shape parameters should be determined. We propose using the following three approaches.

Triangle_Triangle
With 6 input points $p_1, p_2, p_3, p_4, p_5$, and $p_6$, let $a$, $b$, $c$ denote the pairwise distances between three points $p_i$, $p_j$, and $p_k$, i.e. $a = d(p_i, p_j)$, $b = d(p_j, p_k)$, $c = d(p_i, p_k)$. Let $S(p_i, p_j, p_k)$ denote the area of the triangle formed by $p_i$, $p_j$, and $p_k$; thus $S(p_i, p_j, p_k) = S(a, b, c)$, and $S(a, b, c)$ is given by Heron's formula: $S(a, b, c) = \sqrt{q(q-a)(q-b)(q-c)}$, where $q = (a + b + c)/2$ is the semi-perimeter.
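The triangle area from Heron's formula can be computed directly from the three point coordinates. The sketch below covers only the area of one triangle; the function name `triangle_area` is an assumption, and how the two triangles formed from the 6 points are combined into a single parameter is not specified here, so it is left out.

```python
from math import dist, sqrt


def triangle_area(pi, pj, pk):
    """Area of the triangle (pi, pj, pk) via Heron's formula."""
    a = dist(pi, pj)
    b = dist(pj, pk)
    c = dist(pi, pk)
    q = (a + b + c) / 2.0  # semi-perimeter
    # Clamp at zero: rounding can make the product slightly negative
    # when the three points are nearly collinear.
    return sqrt(max(q * (q - a) * (q - b) * (q - c), 0.0))


# A 3-4-5 right triangle, whose area is (3 * 4) / 2.
print(triangle_area((0.0, 0.0), (4.0, 0.0), (0.0, 3.0)))
```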

Decision function
A decision function is constructed from a certain feature, and a decision is made by comparing the feature value against a threshold. The threshold is computed from the input dataset at each node. Specifically, to construct the decision function at a node, a set of shape parameters is randomly generated, which yields a corresponding set of candidate decision functions. A threshold is estimated for each function in the set, and the function with the minimum error value is selected for the node. The proposed algorithm for the threshold computation is shown in Fig. 5.
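A threshold search of this kind might look like the sketch below. It is not the algorithm of Fig. 5: the candidate thresholds (midpoints between consecutive sorted feature values), the weighted squared error used to score them, and the names `split_error` and `best_threshold` are all assumptions made for illustration.

```python
def split_error(values, labels, weights, threshold):
    """Assumed error: weighted squared deviation from the group average,
    with samples grouped by whether their feature value is below the threshold."""
    error = 0.0
    for below in (True, False):
        group = [
            (v, w)
            for x, v, w in zip(values, labels, weights)
            if (x < threshold) == below
        ]
        total_w = sum(w for _, w in group)
        if total_w == 0:
            continue
        mean = sum(v * w for v, w in group) / total_w
        error += sum(w * (v - mean) ** 2 for v, w in group)
    return error


def best_threshold(values, labels, weights):
    """Try midpoints between consecutive distinct feature values and
    return the (threshold, error) pair with the lowest error."""
    xs = sorted(set(values))
    candidates = [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    if not candidates:
        return None, float("inf")
    scored = ((t, split_error(values, labels, weights, t)) for t in candidates)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical feature values, labels in [0, 1], and unit weights.
t, err = best_threshold([1.0, 2.0, 5.0, 6.0], [0.0, 0.1, 0.9, 1.0], [1.0] * 4)
print(t, err)
```

Running the same search for every randomly generated shape parameter and keeping the parameter with the smallest error would give one way to select the decision function for a node.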

Empirical tests
Our empirical tests use the Japanese Female Facial Expression (JAFFE) database, which contains 213 images of 7 facial expressions (6 basic facial expressions, namely happy, sad, surprise, angry, disgust, and fear, plus one neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects, using a 5-level scale for each adjective (5 = high, 1 = low). Specifically, the files contain semantic rating data from psychological experiments using the images, and the expression label on each image represents the predominant expression in that image, i.e. the expression that the subject was asked to pose.
In our empirical test with the JAFFE database, control points are first positioned on all images in the database, as shown in Fig. 6, so that shape parameters can be calculated. With the control points identified on each image, we test the performance of our proposed approach through cross validation: the dataset is divided into 6 groups for 6 different tests. In each test, one group is used for testing while the other five groups are used for training. The average estimation error for each type of facial expression is then computed and shown in Table 1.
From our empirical tests, we have also found a statistical relationship between the thresholds and the resulting accuracy, as shown in Fig. 7.
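The six-group protocol described above corresponds to a standard 6-fold cross validation. A minimal sketch of generating the splits is given below; the random assignment of images to groups, the fixed seed, and the name `six_fold_splits` are assumptions, since the way the groups were formed is not stated.

```python
import random


def six_fold_splits(n_samples, n_folds=6, seed=0):
    """Yield (train_indices, test_indices) for each of the n_folds tests."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into folds of near-equal size.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for m, fold in enumerate(folds) if m != k for i in fold]
        yield train, test


# 213 is the number of images in the JAFFE database.
splits = list(six_fold_splits(213))
print([len(test) for _, test in splits])  # test-group sizes for the 6 tests
```

Each image appears in exactly one test group, so averaging the per-test errors covers the whole database once.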

Conclusion
Facial expression in images has been an active research topic in image processing and is widely employed in many practical applications. There are two major problems to consider: (1) identifying facial expressions; and (2) demonstrating facial expressions, of which estimating facial expressions is the core issue. In this paper, we propose an estimation approach based on a random selection of geometric features via a decision tree. In our empirical tests on a standard database, the approach provides satisfactory results, which encourages further research on imitating real human actions.