Brief review of facial expression recognition techniques

In this era of technology, we need applications that are easy to use and user-friendly enough that even people with certain disabilities can use them. In the domain of automatic facial recognition systems, many applications exist for human behavior understanding, detection of mental disorders, and synthesis of human expressions. Most publications propose one of two approaches for automatic Facial Expression Recognition (FER) systems: geometric-based and appearance-based. Much work has been done on static analysis, where facial expression recognition is performed on still images. Because facial expressions are naturally dynamic, they are not easy to detect, so the focus of research has shifted toward new methods that improve accuracy while lowering computational cost and memory consumption. This paper presents a brief survey of facial expression recognition that analyzes various algorithms and compares their reported results, so that other researchers can more efficiently find solutions to related problems.


Introduction
Judging the mental state of a person is a difficult task. The best way to understand the emotional state of a person is through facial expressions (i.e. happy, sad, fear, disgust, surprise and anger) (Darwin et al., 1998; Mehrabian, 1968; Ekman and Friesen, 1971). The automated analysis of facial expressions (FER) (4) is a challenging task in the field of computer vision. Its application is not restricted to mental state identification (Mandal et al., 1998); it is also applicable to the security domain (Butalia et al., 2012), automatic counseling systems, facial expression synthesis, lie detection, music selection by mood (Dureha, 2014), automated tutoring systems (Wu et al., 2008), operator fatigue detection (Zhang and Zhang, 2006), etc.
Facial expression is a natural nonverbal communication language. A person can express his or her sentiments or state of mind through facial expressions, but sometimes these expressions are not clear enough for recognition systems and have to be refined to obtain correct results. This issue still needs attention, although many algorithms have been proposed so far to handle such vague expressions (Hsieh et al., 2010).
A facial expression is formed by relaxing or contracting different muscles of the human face (Chin and Kim, 2009), which results in deformed facial features (Fasel and Luettin, 2003). According to Chin and Kim (2009) and Ekman and Friesen (2003), facial expressions are rapid signals that vary with changes in facial features such as an open mouth, raised eyebrows, lips, eyes, and cheeks, and these features affect the accuracy of a system. Skin color, gender, age, etc., and slow signals, in turn, affect the rapid signals.
As shown in Fig. 1, the FER process consists of five phases. The pre-processing phase takes an image or a sequence of images (a series of images from a neutral face to a peak-expression face) as input, reduces noise and enhances the image, and returns the face for further processing.
Regions of interest (ROIs) are extracted from facial components, i.e. nose, mouth, eyes, cheeks, eyebrows, forehead, ears, etc. Extraction of ROIs is performed in the feature extraction phase. Techniques used for feature extraction include Local Binary Patterns (LBP) (Ojala et al., 1996), Independent Component Analysis (ICA) (Bartlett et al., 2002), Principal Component Analysis (PCA) (Turk and Pentland, 1991), Local Gradient Code (LGC) (Tong et al., 2014), Linear Discriminant Analysis (LDA) (Belhumeur et al., 1997), and Local Directional Pattern (LDP) (Jabid et al., 2010). In the subsequent classification phase, a classifier assigns the features to their respective facial expression classes using established classification methods, which include the Support Vector Machine (SVM) (Hsu et al., 2003) and Nearest Neighbor (NN) (Altman, 1992).
This paper provides a survey-based timeline view that analyzes different techniques for handling facial expressions in recognition systems. Lastly, an evaluation is made by comparing the recognition results of the different algorithms. Table 1 summarizes facial expression recognition techniques that have been used in the literature by multiple researchers.
Pu et al. (2015) used Action Units (AUs) for facial expression recognition and analysis in video, using two random forest classifiers. The first random forest detects action units, and the detected AUs are classified by the second random forest, which detects expressions. On the first frame, facial landmarks are generated by an Active Appearance Model (AAM); the landmarks are then tracked throughout the sequence of frames by a Lucas-Kanade optical flow tracker. A displacement vector is created between the neutral and the peak expression. The first random forest detects action units from DNNP features, and these AUs are passed as input to the second random forest, which maps them to facial expressions. The proposed method achieves an accuracy rate of 89.37%, and the two-fold random forest classifier can achieve an accuracy rate of 96.38%. The results were obtained by randomly selecting training and testing sets from the database 9 times.
Radlak and Smolka (2016) combined two face detection techniques: the method of Zhu and Ramanan (2012) and the Dlib detector. Face detection is first attempted with the Dlib library; if a face is found, the technique of Kazemi and Sullivan (2014) is used to detect facial landmarks. If Dlib fails, the Zhu and Ramanan (2012) technique is used. The results indicate that the Zhu and Ramanan (2012) detector produces worse results than the Kazemi and Sullivan (2014) detector. The detected face was normalized by an affine transformation that excluded the face contour. Removing the background around the detected face helped diminish its effect on classification. For classification, the previously detected facial landmarks were used as center points to extract multi-scale patches for generating feature vectors. A uniform Local Binary Pattern histogram was computed for every region within each patch, and all histograms were then concatenated to create a high-dimensional feature vector. The Random Frog algorithm was used for fast feature selection. Finally, a Support Vector Machine with the "one-on-one" multiclass strategy was applied. The proposed technique obtained a best classification accuracy of 36.93% on the validation set. The best results were achieved for anger, neutral, and happy, whereas all other classes performed poorly; disgust and fear had the worst performance of all.
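The patch-based descriptor described above (uniform LBP histograms computed around landmarks at several scales and then concatenated) can be illustrated with a minimal sketch. This is not the implementation of Radlak and Smolka (2016): the function names, the square patch shape, the patch half-sizes, and the representation of an image as a list of rows of grey levels are all assumptions made for illustration.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); img is a list of rows of ints."""
    center = img[r][c]
    # Neighbours visited clockwise, starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def is_uniform(code):
    """Uniform pattern: at most two 0/1 transitions in the circular bit string."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2


# Each uniform pattern gets its own bin; all other patterns share one extra bin.
UNIFORM_BIN = {}
for _code in range(256):
    if is_uniform(_code):
        UNIFORM_BIN[_code] = len(UNIFORM_BIN)
NON_UNIFORM_BIN = len(UNIFORM_BIN)


def patch_histogram(img, center, half):
    """Normalised uniform-LBP histogram of a square patch around `center`."""
    hist = [0] * (NON_UNIFORM_BIN + 1)
    cr, cc = center
    rows, cols = len(img), len(img[0])
    # Stay one pixel away from the border so every pixel has 8 neighbours.
    for r in range(max(1, cr - half), min(rows - 1, cr + half + 1)):
        for c in range(max(1, cc - half), min(cols - 1, cc + half + 1)):
            hist[UNIFORM_BIN.get(lbp_code(img, r, c), NON_UNIFORM_BIN)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def feature_vector(img, landmarks, half_sizes=(2, 4)):
    """Concatenate patch histograms over all landmarks and all patch scales."""
    vec = []
    for landmark in landmarks:
        for half in half_sizes:
            vec.extend(patch_histogram(img, landmark, half))
    return vec
```

The length of the resulting vector grows with the number of landmarks and scales, which is why a feature selection step is applied before classification in the approach described above.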

Literature review
Mistry et al. (2016) apply a modified local binary pattern that performs not only the regular but also horizontal and vertical neighbor-pixel comparisons, which gives a distinctive facial feature representation. To optimize these features, they propose a micro Genetic Algorithm embedded with Particle Swarm Optimization (mGA-embedded PSO). It addresses the local optimum problem and premature convergence by introducing a non-replaceable memory, a secondary swarm of 5 participants (a leader and 4 followers), a new velocity updating strategy, and sub-dimension-based regional facial feature search combined with global exploration search. For emotion recognition, the features produced by the mGA-embedded PSO algorithm are classified with a multiclass SVM and an ensemble classifier for improved accuracy. The reported results show that hvnLBP-based feature extraction surpasses the most recent Local Binary Pattern variants. For expression recognition, the mGA-embedded PSO with the diverse classifiers achieved 100% accuracy on CK+ and 94.66% on the MMI database. The assessment was carried out over 30 trials.
Yu et al. (2013) presented a semi-automatic way of creating a dataset of facial expressions. First, a web search is performed for a certain emotion keyword; the search engine returns a raw dataset that is very noisy. To remove non-face images, the Viola-Jones face detector is used. Images relevant to the query are then selected by a binary support vector machine. The SVM is trained with a pool-based active learning method so that it can predict whether an image contains a facial expression matching the query keyword. The images selected by the SVM form the final expression dataset. Furthermore, they presented a new facial feature based on WLD and histogram contextualization for multi-resolution analysis of faces. Experiments show that the suggested framework is fast and accurate and that it can create a diverse facial expression dataset. A limitation of this approach is that WLD produces a large number of dimensions, which must be reduced after the technique has been applied; this makes it somewhat slow and unreliable.

[Fig. 1. Phases of the FER process: Input Image, Pre-Processing, Features Extraction, Classification]

Facial expression recognition can be implemented in two ways: on consecutive images or on a single image. The solution proposed by Carcagnì et al. (2015) operates on a single image. Single-image recognition can itself follow either a component-based approach or a global approach. The component-based approach is unsuitable because of its high computational cost, whereas global approaches still require work because it is difficult to find global descriptors for the face. To address these problems, the paper proposes an FER system based on Histograms of Oriented Gradients (HOG). HOG is a dense feature extraction method for single images; it describes all regions of the image through their gradients and is fast to compute. The paper describes how to set the HOG parameters so that the descriptor best distinguishes facial expression traits. The algorithmic pipeline splits the work into 3 phases. In the 1st phase, a frontal face is input to the system, the face is registered, HOG is applied to it, and an SVM is used for classification. Phase 2 tests the chosen HOG parameters on datasets in which each sequence of input faces starts with a neutral face and ends with an expressive face. Phase 3 validates the system in the real world. The system's performance for edge and shape modeling was up to 95.8% accurate. The strength of the technique lies in the choice of parameters; it gives a performance of 95.9%, a precision of 98%, and an accuracy of 98.9%. It processes 7 fps, and the approach is suitable for real-world use as well.
The weaknesses found in the proposed system were that it cannot detect the emotional state of a person, that non-frontal faces were an issue, that the detector worked only in the range of (-30, 30) degrees, and that the system could not differentiate between the anger and disgust expressions.
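To make the role of the HOG parameters concrete, the following minimal sketch computes a HOG-style descriptor with two tunable parameters, the cell size and the number of orientation bins. It is an illustration under stated assumptions, not the configuration of Carcagnì et al. (2015): the default values, the per-cell normalisation (instead of overlapping blocks), and the list-of-rows image format are chosen here only for brevity.

```python
import math


def cell_hog(img, top, left, size=8, bins=9):
    """Unsigned-gradient orientation histogram for one size x size cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(img), len(img[0])
    for r in range(top, top + size):
        for c in range(left, left + size):
            # Central differences, clamped at the image border.
            gx = img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def hog_descriptor(img, cell=8, bins=9):
    """Concatenate L2-normalised cell histograms over a grid of cells."""
    descriptor = []
    for top in range(0, len(img) - cell + 1, cell):
        for left in range(0, len(img[0]) - cell + 1, cell):
            hist = cell_hog(img, top, left, cell, bins)
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            descriptor.extend(v / norm for v in hist)
    return descriptor
```

Smaller cells and more bins capture finer shape detail at the cost of a longer descriptor, which is the trade-off that parameter tuning has to resolve before the descriptor is passed to the classifier.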
Expression recognition on low-resolution images in real time is difficult. Many FER methods exist, such as appearance-based and geometric-feature-based methods, but they are time-, computation-, and memory-intensive and require larger feature vectors. To overcome these issues, Khan et al. (2013) proposed an FER system that works with low-resolution as well as high-quality images and can cope with illumination changes. It is memory- and time-efficient: features are extracted only from salient regions of the face using a pyramid feature extraction approach, PLBP. The proposed framework was tested on different databases, i.e. the Cohn-Kanade (CK+) and MMI facial expression databases, and obtained very good results. Generally, face recognition systems are divided into 3 phases. Phase 1: face detection, which applies the Viola-Jones object detector. Phase 2: feature extraction (the main focus of the paper), where the best features are minimized with the change in expressions. The algorithm used in this phase is the Pyramid of Local Binary Patterns (PLBP) for facial feature extraction. It is a spatial representation of LBP that takes texture resolution variations into account. To find the salient features, a psycho-visual experiment with an eye tracker was conducted on the 6 universal expressions. Phase 3: expression classification. The strength of the paper is PLBP, which is simple yet computationally efficient. It performs efficiently on high-resolution images and gives improved performance on lower-resolution images. The framework is unaffected by changes in illumination. It works for posed as well as for abrupt expressions. The framework processes only the salient regions of the face, which reduces memory consumption and computation and makes it useful for real-world applications. Future work plans to focus on movement, changes in camera angle, and the effect of system jamming, all of which affect system performance.
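The idea of a pyramid of LBP histograms can be sketched as follows: the LBP code image is tiled into progressively finer grids, and the histograms of all tiles at all levels are concatenated. This is a simplified illustration, not the PLBP implementation of Khan et al. (2013); the number of levels, the 2^l x 2^l tiling, the plain 256-bin LBP, and the function names are assumptions.

```python
def lbp_image(img):
    """Basic 8-neighbour LBP code for every interior pixel (border dropped)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(img), len(img[0])
    return [[sum((img[r + dr][c + dc] >= img[r][c]) << bit
                 for bit, (dr, dc) in enumerate(offsets))
             for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


def region_histogram(codes, r0, r1, c0, c1, bins=256):
    """Histogram of LBP codes inside the half-open region [r0, r1) x [c0, c1)."""
    hist = [0] * bins
    for r in range(r0, r1):
        for c in range(c0, c1):
            hist[codes[r][c]] += 1
    return hist


def plbp(img, levels=3):
    """Concatenate region histograms over a spatial pyramid of the LBP image."""
    codes = lbp_image(img)
    rows, cols = len(codes), len(codes[0])
    feature = []
    for level in range(levels):
        n = 2 ** level  # n x n grid of regions at this pyramid level
        for i in range(n):
            for j in range(n):
                feature.extend(region_histogram(
                    codes,
                    i * rows // n, (i + 1) * rows // n,
                    j * cols // n, (j + 1) * cols // n))
    return feature
```

Level 0 describes the texture of the whole region, while deeper levels add its spatial layout; restricting the input to salient facial regions keeps the concatenated vector short.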
Expression recognition in continuous video is a difficult task. There is very little work on FER in the presence of head motion in 3D space. A lot of work has been done on still images, but it captures only the peak of an expression. To address this, Dornaika et al. (2013) introduced FER under dynamic head motion to obtain accurate results. Much work has been done on dynamic FER for frontal faces at higher resolution, but dynamic FER with head movement in 3D space had not been addressed. The main focus of the paper is the 3rd stage of an FER system, facial expression recognition, during head movement. A tracking system is used to follow head movement with the help of a 3D face model and facial actions. The proposed solution relies on the following: classifiers that exploit head pose; 3D head pose and facial actions provided by an appearance-based 3D face tracker; Principal Component Analysis (PCA), which reduces noise; and Linear Discriminant Analysis (LDA), which enhances the discrimination between expressions. Two schemes for facial expression recognition are described. Scheme 1: a dynamic time warping technique in which the training data are given by temporal signatures associated with facial expressions. Scheme 2: the temporal signature of the facial actions is modeled with a fixed-length feature vector, and machine learning algorithms are used to recognize expressions. Experiments were conducted on the CMU database and on self-made video frames. Applying a dimensionality reduction technique improved classification. The maximum recognition rate was 90%.
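Scheme 1 rests on dynamic time warping (DTW), which aligns two sequences that unfold at different speeds. The sketch below shows the standard DTW recurrence and a nearest-template classifier built on it. It is a generic illustration, not the code of Dornaika et al. (2013); the Euclidean frame distance, the template format, and the function names are assumptions.

```python
def dtw_distance(a, b):
    """DTW distance between two sequences of equal-length feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best alignment cost of the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = sum((x - y) ** 2
                             for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = frame_dist + min(cost[i - 1][j],      # stretch b
                                          cost[i][j - 1],      # stretch a
                                          cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template sequence closest to `query` under DTW.

    `templates` is a list of (label, sequence) pairs.
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```

Because the alignment may repeat frames of either sequence, the same expression performed slowly or quickly yields a small distance, which is the property that makes recognition insensitive to nonlinear changes in time scale.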
Strengths of the proposed solution: the facial action tracker dynamically learns the face appearance online; the approaches used are texture independent; recognition can be done even with non-frontal faces; and changes in the video stream or facial action stream did not affect recognition accuracy, because the dynamic time warping technique compensates for the nonlinear time scale. PCA+LDA provided the better performance, with a classification accuracy of 90.10%; the overall recognition rate was 90.4% on the CMU video sequences. A noted weakness is that the recognition rate for real-time expressions was 100% for all expressions except disgust, which reached only 44% accuracy. In the future, the authors plan to extend their work to nonlinear dimensionality reduction methods. Face detection, facial action tracking, and 3D face tracking are outside the scope of the paper.
Facial expressions are dynamic, and detecting them requires high computational cost. Kamarol et al. (2016) use a 3D approach in a framework that accounts for both time and space at low computational cost. A spatiotemporal texture map (STTM) is applied for feature extraction; it captures the continuous motion of facial expressions, which in turn provides spatial information. It generates a 2D texture map and has a very low computational cost while giving accurate temporal and spatial variations of facial expressions. In the proposed framework, the Viola-Jones face detector first detects the face, and the background is cropped out. STTM then extracts and models the facial features using spatiotemporal information obtained from the three-dimensional Harris corner function. The features are represented as histograms using a block-based method. A support vector machine classifier classifies the features into emotions.
The following results show the strength of the proposed framework: the recognition rates recorded were 95.37%, 98.56% and 84.52% on datasets with spontaneous expressions, posed expressions and close-to-real-world expressions, respectively. On the CK+ dataset, the recognition rate was 100% for the majority of expressions, excluding happiness and sadness; based on the confusion matrices, the highest accuracies STTM achieved on the CK+ and CASME II datasets were 97.70% and 98.61%.
Overall, STTM achieved the highest performance at low computational cost. The noted weaknesses were as follows. On CK+, the expressions with the lowest recognition rates suffered from insufficient data for those expressions. On AFEW, STTM had an accuracy of 90% for most expressions; the lowest was 71.43% for fear, which was confused with disgust most of the time. In the future, the authors plan to improve the technique with respect to head movements, to identify a suitable classification framework, and to address computational complexity.
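Per-expression recognition rates such as those quoted above are read off a confusion matrix: the rate for a class is its diagonal entry divided by its row total, and the overall rate is the sum of the diagonal divided by the total number of samples. The helper below is a small sketch of that arithmetic; the function name and the convention that rows are true classes and columns are predicted classes are assumptions.

```python
def recognition_rates(confusion, labels):
    """Per-class and overall recognition rates from a confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    per_class = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        per_class[label] = confusion[i][i] / row_total if row_total else 0.0
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    overall = correct / total if total else 0.0
    return per_class, overall
```

A low diagonal entry together with a large off-diagonal entry in the same row identifies which pair of expressions is being confused, as in the fear and disgust case reported for AFEW.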
To overcome the challenge of extracting features from images taken in uncontrolled environments, Patil et al. (2016) presented a method that uses both the contourlet transform and the spatial domain to create the feature vector, unlike existing systems based on Local Binary Patterns or the steerable pyramid, which create the feature vector from only the transform domain or only the spatial domain. Because the contourlet transform exploits directionality and anisotropy, it extracts important features. For the contourlet subbands, they suggested a new coefficient enhancement algorithm that enhances skin-region features to make the system more robust. They also tested feature-level fusion on multiple databases, which showed a competitive face recognition rate.
By describing images in terms of high-order two-dimensional orthogonal Gaussian-Hermite moments (GHMs), Imran et al. (2016) proposed a novel expression recognition method. A set of features is selected on the basis of the moments that have high discrimination power. The discriminative GHMs are projected onto a new expression-invariant subspace, using the correlation among neutral faces, to obtain the differentially expressive components of the moments. Features obtained from the differentially expressive components and from the discriminative moments are used to identify an expression with an SVM classifier. Experiments conducted on commonly used databases resulted in better overall expression recognition performance than similar existing methods. Table 1 summarizes facial expression recognition techniques that have been used in the literature by multiple researchers, along with their pros and cons.
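For readers unfamiliar with Gaussian-Hermite moments, the sketch below computes a discrete GHM of order (p, q) using the usual orthonormal Gaussian-Hermite basis, i.e. a Hermite polynomial weighted by a Gaussian envelope. It is a generic illustration rather than the formulation of Imran et al. (2016); the mapping of pixel coordinates to [-1, 1], the default value of the scale parameter sigma, and the function names are assumptions.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * x  # H_0 and H_1
    if n == 0:
        return h_prev
    for k in range(1, n):
        # H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def gh_basis(n, x, sigma=1.0):
    """Orthonormal Gaussian-Hermite function of order n with scale sigma."""
    norm = 1.0 / math.sqrt((2 ** n) * math.factorial(n)
                           * math.sqrt(math.pi) * sigma)
    return norm * math.exp(-x * x / (2.0 * sigma * sigma)) * hermite(n, x / sigma)


def gh_moment(img, p, q, sigma=0.5):
    """Discrete GHM of order (p, q); p acts on columns (x), q on rows (y)."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for r in range(rows):
        y = -1.0 + 2.0 * r / (rows - 1)  # map row index to [-1, 1]
        for c in range(cols):
            x = -1.0 + 2.0 * c / (cols - 1)  # map column index to [-1, 1]
            total += img[r][c] * gh_basis(p, x, sigma) * gh_basis(q, y, sigma)
    return total
```

Because the basis functions are orthogonal, moments of different orders carry non-redundant information, so a subset of them can be selected by discrimination power without re-deriving the others.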

Conclusion
Facial expressions are produced during communication, so images may be acquired under uncontrolled conditions such as occlusion (glasses, scarves, facial hair, and cosmetics, which also affect the recognition rate), pose, illumination, and expression variation. This paper has presented a survey of facial expression recognition. Recent feature extraction techniques are covered along with a comparison, which should help other researchers enhance the existing techniques in order to obtain better and more accurate results.