Extracting accurate time domain features from vibration signals for reliable classification of bearing faults

Identification of localized faults in rolling element bearings (REBs) frequently relies on vibration-based pattern recognition (PR) methods, and time domain (TD) statistical features are often part of the diagnostic models. The extracted statistical values are, however, influenced by the fluctuations present in random vibration signals, and the resulting inaccurate values degrade the diagnostic capability of supervised learning based classifiers. This study examines the sensitivity of TD features to signal fluctuations. Vibration data are acquired from REBs containing different localized faults using a test rig, and a central tendency (CT) based feature extraction (CTBFE) method is proposed. The CTBFE method ensures that reliable feature values are supplied to the PR models: before extracting TD features, it selects the portion of a vibration signal that best represents the fault. A variety of classifiers is used to assess the effect of the CTBFE method on fault classification accuracy, which improves considerably. The results are also compared with a similar existing method; the proposed method gives better results and is more feasible for on-line applications.


Introduction
The REB is a vital part of rotating machinery because it carries dynamic loads. Besides financial losses, sudden failure of an REB may cause catastrophic failures. Vibration-based condition monitoring is the most popular technology for early detection of such failures (Tandon and Choudhury, 1999). However, localized faults in REBs produce very weak impulses in vibration signals (Wictor, 1991), so conventional frequency domain methods often fail to detect these faults (Randall, 2011). The raw vibration data are therefore often pre-processed to aid detection. Envelope analysis (Sawalhi et al., 2007; Wang and Lee, 2013) and wavelet-based decompositions (Caesarendra et al., 2013; Lou and Loparo, 2004; Smith et al., 2007; Purushotham et al., 2005; Altmann and Mathew, 2001; Abbasion et al., 2007) are the most commonly used pre-processing methods. With advancing technology, several vibration-based PR methods have also been used to diagnose machinery faults (Rauber et al., 2010). However, noise in PR systems can reduce the performance of classifiers (Ericsson et al., 2005). Numerous supervised learning methods have been presented for identifying localized REB faults using TD statistical features (Jack and Nandi, 2002; Rojas and Nandi, 2006; Yang et al., 2004; Zhang et al., 2005; Sugumaran and Ramachandran, 2007; Kankar et al., 2011; Sugumaran and Ramachandran, 2011; Saimurugan et al., 2011). However, maximizing fault classification accuracy with a minimal set of features is still a challenging task for researchers.
A literature survey reveals that only a few research efforts have been made to extract accurate features from random vibration signals for reliable PR of machinery faults. Lee et al. (2010) examined the sensitivity of features for a machinery prognostics and health management system. Several features, extracted from the time and frequency domains, were employed to identify various faults of rotating machinery, and the effect of signal quality along with operating conditions such as load, speed and torque was studied. To decrease the influence of operating conditions on feature accuracy, the authors also presented a feature normalization method, and they emphasized that, for reliable results, the noise and outliers present in a vibration signal should be addressed before extracting diagnostic features. More recently, Tahir et al. (2017) presented a central tendency (CT) based feature processing (CTBFP) method to extract accurate TD features from random vibration signals. The method operated at feature level, i.e. after the features were extracted and before a classifier was employed in the diagnostic model, by processing the feature distributions during the data preparation stage of supervised learning. The features included RMS, mean, variance, skewness, kurtosis, crest factor, impulse factor, shape factor, median and range. The paper showed that fluctuations or spikes present in a vibration signal can alter the statistical value of a TD feature or produce a feature outlier. During CTBFP, abnormal (outlying) feature values were detected, and the affected instances containing one or more abnormal feature values were discarded. The authors also stated that these fluctuations may not be related to the bearing fault patterns under study (Liu et al., 2013). The intent of that research was that only fault-related feature values should take part in the PR process.
Several classifiers were used to validate the CTBFP method, which considerably enhanced their diagnostic capability. The classifiers included Support Vector Machine (SVM), BayesNet, Decision Table and Decision Tree. Because CTBFP discards the affected instances, its application may be limited when the data set is small. Instead of processing feature distributions, this study proposes a new CTBFE method that works at the feature extraction level to obtain reliable TD feature values. Using the CT-based extracted features for PR of localized REB faults considerably enhances the fault classification capability of the classifiers. The proposed method selects the most appropriate portion of a vibration signal for feature extraction, which ensures that accurate feature values are supplied to the classifier for reliable decision making. The method is efficient and provides significant immunity to fluctuations and background noise present in vibration signals.
The CTBFE method not only preserves the number of instances but also provides more accurate results than the CTBFP method. The present study uses the same TD features and classifiers as Tahir et al. (2017). To the best of our knowledge, the proposed methodology has not previously been reported for bearing fault diagnosis.
The manuscript is organized as follows. Section 2 briefly describes the bearing structure and its localized faults. The major steps involved in developing the CTBFE method are elaborated in Section 3. Section 4 discusses the results and findings of the study, and conclusions are drawn in Section 5.

Localized faults in REB
Localized faults commonly occur in REBs because of surface fatigue (Liu et al., 2013). When a fault appears on any bearing element, each rolling contact with the defect produces an impulse, and these impulses repeat at a characteristic rate known as the fault frequency. This frequency depends on the rotational speed of the shaft and on the location of the fault. The fundamental train frequency (FTF), ball pass frequency of the inner race (BPFI), ball pass frequency of the outer race (BPFO) and ball spin frequency (BSF) are the common characteristic frequencies of an REB. Fig. 1 shows the geometric parameters of the bearing involved in generating the fault frequencies, which are described below.
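$$FTF = \frac{f_{sh}}{2}\left(1 - \frac{B_d}{P_d}\cos\alpha\right) \qquad (1)$$
$$BPFI = \frac{N_b f_{sh}}{2}\left(1 + \frac{B_d}{P_d}\cos\alpha\right) \qquad (2)$$
$$BPFO = \frac{N_b f_{sh}}{2}\left(1 - \frac{B_d}{P_d}\cos\alpha\right) \qquad (3)$$
$$BSF = \frac{P_d f_{sh}}{2 B_d}\left[1 - \left(\frac{B_d}{P_d}\cos\alpha\right)^2\right] \qquad (4)$$
where $f_{sh}$ is the motor (shaft) speed in hertz, $N_b$ is the number of balls, $B_d$ is the ball diameter, $P_d$ is the pitch diameter and $\alpha$ is the contact angle.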
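The four characteristic frequencies follow the standard kinematic expressions in terms of shaft speed, number of balls, ball and pitch diameters and contact angle. The Python sketch below evaluates them; the function and argument names are our own, and the bearing geometry in the usage lines is hypothetical (not taken from the paper or from an ER-12K datasheet), so it only illustrates the arithmetic.

```python
import math


def bearing_fault_frequencies(f_sh, n_balls, ball_d, pitch_d, contact_angle_deg=0.0):
    """Characteristic bearing frequencies in Hz for a shaft speed f_sh in Hz.

    ball_d and pitch_d only need to share the same length unit.
    """
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_angle_deg))
    return {
        "FTF": 0.5 * f_sh * (1.0 - ratio),                            # cage (train)
        "BPFI": 0.5 * n_balls * f_sh * (1.0 + ratio),                 # inner race defect
        "BPFO": 0.5 * n_balls * f_sh * (1.0 - ratio),                 # outer race defect
        "BSF": 0.5 * (pitch_d / ball_d) * f_sh * (1.0 - ratio ** 2),  # ball spin
    }


# Hypothetical geometry (8 balls, 8 mm balls, 32 mm pitch diameter) at 1000 RPM.
for name, hz in bearing_fault_frequencies(1000 / 60, 8, 8.0, 32.0).items():
    print(f"{name}: {hz:.2f} Hz")
```

A quick sanity check on hand-computed values is that BPFI + BPFO equals the number of balls times the shaft speed.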
Metal-to-metal impacts between bearing components excite resonances of the bearing housing, and these resonances are amplitude-modulated by the fault frequency. For many years, envelope analysis has been the benchmark method for detecting these low-level impulses, which are often difficult to detect with conventional frequency analysis (Randall, 2011). Enveloping isolates the signal of interest with a band-pass filter in the high-frequency region and then demodulates the resonance excited by the fault impacts. However, proper selection of the frequency band is critical for effective demodulation.
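A minimal envelope-analysis sketch in Python with NumPy and SciPy follows: band-pass filtering, a Hilbert-transform envelope, then the spectrum of the envelope. The filter order, the hand-picked band and the synthetic test signal (a 3 kHz resonance struck at a hypothetical 85 Hz fault rate) are assumptions for illustration only; in the study the band was selected with the fast kurtogram.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def envelope_spectrum(x, fs, band, order=4):
    """One-sided amplitude spectrum of the envelope of x after band-pass filtering.

    band is a (low, high) tuple in Hz around the excited resonance.
    """
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)                 # zero-phase band-pass
    envelope = np.abs(hilbert(filtered))           # demodulate the resonance
    envelope = envelope - envelope.mean()          # drop the DC component
    spectrum = np.abs(np.fft.rfft(envelope)) / envelope.size
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    return freqs, spectrum


# Synthetic test signal: decaying 3 kHz bursts repeating at 85 Hz, plus noise.
fs, fault_hz, resonance_hz = 20_000, 85.0, 3_000.0
t = np.arange(fs) / fs                             # 1 s of data
rng = np.random.default_rng(0)
x = 0.5 * rng.standard_normal(t.size)
for t0 in np.arange(0.0, 1.0, 1.0 / fault_hz):
    tau = t[t >= t0] - t0
    x[t >= t0] += np.exp(-800.0 * tau) * np.sin(2 * np.pi * resonance_hz * tau)

freqs, spec = envelope_spectrum(x, fs, band=(2_000, 4_000))
search = (freqs > 10) & (freqs < 500)
peak_hz = freqs[search][np.argmax(spec[search])]
print(f"Envelope spectrum peak: {peak_hz:.1f} Hz")
```

The raw spectrum of this signal is dominated by the 3 kHz resonance, whereas the envelope spectrum shows the repetition rate of the impacts, which is what the fault frequencies predict.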
Many approaches have been proposed for optimal selection of the frequency band, such as spectral kurtosis based, spectral energy based and wavelet based methods, which are discussed in Barszcz and Jabłoński (2010) and Zhao et al. (2014). We employed the spectral kurtosis based fast kurtogram proposed by Antoni (2007) to select the frequency band in our envelope-based data validation process, as described in the next section.
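The idea behind spectral kurtosis based band selection can be illustrated with a simplified STFT estimator: bands carrying intermittent impacts have high spectral kurtosis, stationary Gaussian noise has values near zero. This is not Antoni's fast kurtogram (which searches a dyadic tree of band centers and bandwidths); it is a single-resolution sketch, and the window length and synthetic signal are assumptions.

```python
import numpy as np
from scipy.signal import stft


def stft_spectral_kurtosis(x, fs, nperseg=64):
    """Spectral kurtosis per STFT bin: E|X|^4 / (E|X|^2)^2 - 2.

    About 0 for stationary Gaussian noise and clearly positive in bands carrying
    transient (impulsive) energy. DC and Nyquist bins are dropped because they
    are real-valued and have a different Gaussian baseline.
    """
    freqs, _, Z = stft(x, fs=fs, nperseg=nperseg, boundary=None, padded=False)
    power = np.abs(Z) ** 2                          # shape (n_freqs, n_frames)
    sk = np.mean(power ** 2, axis=1) / np.mean(power, axis=1) ** 2 - 2.0
    return freqs[1:-1], sk[1:-1]


# Synthetic signal: 3 kHz bursts at a hypothetical 30 Hz impact rate in noise.
fs = 20_000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
x = 0.3 * rng.standard_normal(t.size)
for t0 in np.arange(0.0, 1.0, 1.0 / 30.0):
    tau = t[t >= t0] - t0
    x[t >= t0] += np.exp(-800.0 * tau) * np.sin(2 * np.pi * 3_000.0 * tau)

freqs, sk = stft_spectral_kurtosis(x, fs)
best = freqs[np.argmax(sk)]
print(f"Highest spectral kurtosis near {best:.0f} Hz")
```

The band around the bin of maximum spectral kurtosis would then serve as the band-pass range for enveloping.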

Proposed methodology
The proposed fault diagnostic scheme consists of three main steps, illustrated by the block diagram in Fig. 2. Each step is detailed in the following subsections.

Vibration data acquisition
Vibration data from faulty bearings were acquired using a Machine Fault Simulator (MFS) from SpectraQuest Inc. A set of ER-12K ball bearings containing different localized faults was used. The faults include inner race fault (IRF), outer race fault (ORF), ball fault (BLF) and a mixture of these faults (MXF). Each seeded fault measured 1.5 mm wide and 0.3 mm deep. Fig. 3 shows the schematic of the experimental setup, in which a healthy bearing is installed at the inboard position and the faulty bearing at the outboard position. A 5 kg mass was placed at the middle of the healthy shaft to act as a load. An ICP industrial accelerometer (model 608A11) was stud-mounted on top of the outboard bearing housing to measure radial vibration from the bearing under test. The accelerometer had a sensitivity of 100 mV/g, an operating frequency range of 0.5 Hz to 10 kHz and a resonance frequency of 22 kHz. NI 4472 hardware was used to capture data at 60 kS/s with the motor running at 1000 RPM. Forty vibration samples, each of 10 s duration, were acquired for each fault.
The characteristic fault frequencies listed in Table 1 are calculated using Equations 1 to 4. The vibration data set was validated using envelope analysis. The envelope spectrum of the IRF is shown in Fig. 4a, where harmonics of BPFI appear with sidebands spaced at the shaft speed. Fig. 4b shows the first harmonic of BPFO, representing the ORF. The ball fault is evident in Fig. 4c, where twice the BSF appears together with the FTF. Fig. 4d shows the envelope spectrum of the MXF, in which BPFO and BSF dominate. Hence, all the required information related to the localized ball bearing faults is present in the data set. Figs. 4c and 4d show no noteworthy frequency patterns above 250 Hz, so the frequency axis of the plots is limited to 250 Hz to zoom in on the informative part.

CT-based feature extraction (CTBFE)
The second step is the core of the diagnostic scheme. The features were extracted in three distinct stages, shown in Fig. 5 and detailed in the following subsections.

Data segmentation
In the first stage, each acquired vibration signal (sample) of every fault was divided into n segments, or sub-samples (n = 30 here). Each 10 s sample contains 600,000 points at 60 kS/s, so each segment holds 20,000 points, or about 0.33 s of data. With the shaft rotating at 16.67 Hz (1000 RPM), each segment therefore covers about 5.56 revolutions of the shaft, which is a sufficient length for computing trustworthy statistical features.
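The segmentation arithmetic can be checked with a short NumPy sketch. It assumes equal-length, contiguous, non-overlapping segments (the text does not state how segments are formed) and uses random numbers as a placeholder for a recorded sample; the helper name is hypothetical.

```python
import numpy as np

FS = 60_000            # sampling rate used in the study (samples/s)
DURATION_S = 10        # duration of one vibration sample (s)
N_SEGMENTS = 30        # number of segments per sample
SHAFT_HZ = 1000 / 60   # 1000 RPM expressed in Hz


def segment_signal(x, n_segments):
    """Split a 1-D signal into n_segments equal, non-overlapping rows.

    Trailing points that do not fill a whole segment are dropped.
    """
    seg_len = len(x) // n_segments
    return np.reshape(x[: seg_len * n_segments], (n_segments, seg_len))


x = np.random.default_rng(0).standard_normal(FS * DURATION_S)  # placeholder sample
segments = segment_signal(x, N_SEGMENTS)
revs_per_segment = segments.shape[1] / FS * SHAFT_HZ
print(segments.shape, f"{revs_per_segment:.2f} revolutions per segment")
```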

TD feature extraction from the segments
The following ten features were extracted from every segment.
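Root mean square: $\mathrm{RMS} = \sqrt{\frac{1}{M}\sum_{t=1}^{M} X(t)^2}$
Mean: $\bar{X} = \frac{1}{M}\sum_{t=1}^{M} X(t)$
Variance: $\sigma^2 = \frac{1}{M}\sum_{t=1}^{M} \left(X(t)-\bar{X}\right)^2$
Skewness: $\frac{1}{M\sigma^3}\sum_{t=1}^{M} \left(X(t)-\bar{X}\right)^3$
Kurtosis: $\frac{1}{M\sigma^4}\sum_{t=1}^{M} \left(X(t)-\bar{X}\right)^4$
Crest factor (CF): $\max_t |X(t)| \,/\, \mathrm{RMS}$
Impulse factor (IF): $\max_t |X(t)| \Big/ \left(\frac{1}{M}\sum_{t=1}^{M} |X(t)|\right)$
Shape factor (SF): $\mathrm{RMS} \Big/ \left(\frac{1}{M}\sum_{t=1}^{M} |X(t)|\right)$
Median: the middle value of the sorted sequence X (the mean of the two middle values when M is even)
Range: $\max_t X(t) - \min_t X(t)$
In the above relations, X is the sequence of samples obtained by digitizing the time domain signal, X(t) is the amplitude of the t-th sample and M is the total number of samples in the sequence.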
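A compact NumPy version of the ten features for one segment is sketched below. It assumes the population (1/M) forms of variance, skewness and kurtosis (non-excess kurtosis) and uses the peak absolute amplitude for the crest and impulse factors; other normalizations appear in the literature, so the exact forms used in the study may differ.

```python
import numpy as np


def td_features(x):
    """Ten time-domain features of one segment (population 1/M moments)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    var = np.mean((x - mean) ** 2)
    std = np.sqrt(var)
    abs_mean = np.mean(np.abs(x))   # mean absolute amplitude
    peak = np.max(np.abs(x))        # peak absolute amplitude
    return {
        "RMS": rms,
        "Mean": mean,
        "Variance": var,
        "Skewness": np.mean((x - mean) ** 3) / std ** 3,
        "Kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "CrestFactor": peak / rms,
        "ImpulseFactor": peak / abs_mean,
        "ShapeFactor": rms / abs_mean,
        "Median": np.median(x),
        "Range": x.max() - x.min(),
    }


segment = np.random.default_rng(0).standard_normal(20_000)  # placeholder segment
for name, value in td_features(segment).items():
    print(f"{name:>13}: {value: .4f}")
```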

Obtaining CT of the features
The CT describes a data set with a single value. The mean, median and mode are the common measures, chosen according to the application (Watt and Van den Berg, 1995). The mean is mostly used when the data distribution is symmetric; however, because every value enters its computation, it is sensitive to outlying values in the data set. The median, by contrast, is less sensitive to outliers because it occupies the middle place in an ordered data set (Watt and Van den Berg, 1995). When TD features were extracted from the segments of a vibration signal, the outliers in the ascending ordered feature distribution were usually located above the median. Fig. 6a shows the range feature extracted from the vibration samples of every fault; the feature values vary because of fluctuations in the random vibration signals. Fig. 6b shows the same values sorted in ascending order. The median values of the feature for every fault are nearly insensitive to these outliers. Therefore, the proposed CTBFE method takes the median value of each TD feature as the most accurate value for recognizing bearing fault patterns, i.e. the value unaffected by undesired fluctuations. This choice indirectly identifies the most appropriate vibration sub-sample (portion), namely the one that produces the feature's median value. In other words, for each feature the proposed method picks one vibration sub-sample to take part in the pattern recognition process and discards the remaining sub-samples.
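The selection rule can be sketched in Python: take one feature's values over the segments of a sample, keep the median as the CTBFE value, and record which segment produced it. With an even number of segments (n = 30 here) NumPy's median averages the two middle values, so no single segment produces it exactly; the sketch therefore reports the segment whose value is closest to the median (the lowest such index on ties). That tie-handling, the helper name and the example values are our own assumptions.

```python
import numpy as np


def ctbfe_select(segment_values):
    """Return (median value, index of the segment closest to it).

    segment_values holds one feature computed on each segment of a sample.
    """
    values = np.asarray(segment_values, dtype=float)
    median = float(np.median(values))
    index = int(np.argmin(np.abs(values - median)))  # first index on ties
    return median, index


# Range feature of five hypothetical segments; segment 2 contains a spike.
ranges = [2.1, 2.0, 9.5, 2.2, 1.9]
median, index = ctbfe_select(ranges)
print(f"mean = {np.mean(ranges):.2f}, median = {median:.2f}, segment = {index}")
```

The single spiky segment pulls the mean well above the typical segment value, while the median stays with the majority of segments.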

Fault classification
A supervised learning based PR model was employed in the final stage of the proposed methodology. SVM, BayesNet, Decision Table and Decision Tree classifiers were used separately to judge the performance of the CTBFE method. A classifier is first trained on known data examples (instances) and then used to classify unknown data, as illustrated by the block diagram in Fig. 7.
The k-fold cross-validation method was used to estimate the performance of each model. The data set D is divided into k equal parts d1, ..., dk. Of these, k-1 parts are used to train the model and the remaining part is used for validation. The process is repeated k times so that each part is used exactly once for validation. Finally, the accuracies obtained from the k folds are averaged to give a global estimate of classification accuracy.
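The procedure above is sketched below in plain NumPy so that each step is visible. The value k = 10, the shuffling seed and the nearest-centroid classifier are assumptions: the classifier is a hypothetical stand-in for SVM, BayesNet, Decision Table or Decision Tree, and the feature matrix is synthetic. Libraries such as scikit-learn offer the same procedure through `cross_val_score` with a `KFold` or `StratifiedKFold` splitter.

```python
import numpy as np


def k_fold_accuracy(X, y, fit_predict, k=10, seed=0):
    """Average accuracy over k folds (non-stratified, shuffled once).

    fit_predict(X_train, y_train, X_test) must return labels for X_test.
    """
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, k)                 # parts d1, ..., dk
    accuracies = []
    for i in range(k):
        test_idx = folds[i]                          # held-out part
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(float(np.mean(y_pred == y[test_idx])))
    return float(np.mean(accuracies)), accuracies


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: pick the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


# Synthetic data: 4 fault classes x 40 samples x 10 features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=5.0 * c, scale=0.5, size=(40, 10)) for c in range(4)])
y = np.repeat(np.arange(4), 40)
mean_acc, per_fold = k_fold_accuracy(X, y, nearest_centroid, k=10)
print(f"10-fold accuracy: {mean_acc:.3f}")
```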

Results and discussion
Vibration data were acquired from a set of ball bearings containing localized faults using the MFS. The intention was to identify these faults using PR methods with TD statistical features. An important phenomenon was observed: fluctuations may occur in random vibration signals, as shown in Fig. 8, and consequently alter the statistical values of the TD features (Tahir et al., 2017). The reasons behind these fluctuations are outside the scope of this study. However, the fluctuations may not be related to the localized bearing faults and can reduce the fault classification ability of classifiers (Tahir et al., 2017); the inaccurate feature values made fault identification difficult for the classifiers. The CTBFE method was therefore developed to ensure that reliable and accurate TD features are provided to the diagnostic models. The proposed method selects the most appropriate portion of a time domain signal before extracting any feature that takes part in the PR process. Unlike the conventional approach, the TD features were not extracted directly from the whole vibration signal (sample). Each acquired vibration signal was first divided into a suitable number of sub-samples, as discussed in Section 3.2. In the next stage, each TD feature was extracted from every segment, forming a distribution of that feature. This distribution might contain outlying values extracted from segments affected by fluctuations. Finally, the median of the distribution was chosen as the reliable value of the feature used later by the classifier, and the remaining values of that feature were discarded. In other words, the portion of the time domain vibration signal that produced the median value of the feature distribution was considered the most appropriate part of the signal for extracting that particular feature. Every vibration signal (sample) acquired from the different faulty bearings was processed in the same way, and the data set for training and testing the classifiers was prepared using all the TD features.
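The effect described above can be reproduced on a synthetic signal. In the sketch below, random noise stands in for a fault-related vibration sample and a short high-amplitude burst stands in for an unrelated fluctuation; both, and the helper names, are assumptions for illustration. The burst inflates the conventionally extracted RMS and range, while the CTBFE-style value (the median over 30 segments) barely moves.

```python
import numpy as np


def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))


def value_range(x):
    return float(np.max(x) - np.min(x))


def ctbfe_feature(x, feature, n_segments=30):
    """Median of `feature` over equal, non-overlapping segments of x."""
    seg_len = len(x) // n_segments
    segments = np.reshape(x[: seg_len * n_segments], (n_segments, seg_len))
    return float(np.median([feature(s) for s in segments]))


rng = np.random.default_rng(0)
clean = rng.standard_normal(120_000)                         # stand-in signal
disturbed = clean.copy()
disturbed[30_000:30_200] += 25.0 * rng.standard_normal(200)  # unrelated burst

for name, feature in [("RMS", rms), ("Range", value_range)]:
    print(f"{name:5s} conventional: clean {feature(clean):6.2f}, "
          f"disturbed {feature(disturbed):6.2f} | CTBFE: clean "
          f"{ctbfe_feature(clean, feature):6.2f}, disturbed "
          f"{ctbfe_feature(disturbed, feature):6.2f}")
```

Range depends on record length, so CTBFE values should be compared with CTBFE values rather than with whole-signal values.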
As a result, the diagnostic capability of the classifiers improved considerably.
Figs. 9a and 9c show the fluctuating values of the RMS and Median features, respectively, caused by fluctuations in the samples. For instance, the conventionally extracted Median feature values of different fault classes overlap. In contrast, Figs. 9b and 9d show much smoother and more stable values of the RMS and Median features, respectively, extracted with the CTBFE method for every fault. Table 2 lists the fault classification accuracies produced by SVM, BayesNet, Decision Table and Decision Tree. The classifiers gave quite low classification accuracy when trained on the conventional TD features, and the overlap noted above may be one reason for the misclassification. The CTBFE method, on the other hand, provided the most accurate results, higher even than those of the CTBFP method (Tahir et al., 2017). Every classifier considerably improved its classification accuracy with the features extracted through the CTBFE method. Table 3 shows the CTBFE-based sample instances fed to the classifiers. Unlike the CTBFP method, which examines each vibration data sample to decide whether to keep or discard it before the classifier is applied in the PR system, the CTBFE method preserves every vibration sample (data instance); every sample was therefore used for training and testing the classifiers. The CTBFE method examines each vibration sample locally to find the best portion from which to extract a particular feature. Because the proposed method operates at the feature extraction level, only a few values (one per segment) in each feature distribution are processed. This makes the method computationally efficient compared with the conventional approach of pre-processing large volumes of raw TD vibration data, and therefore more feasible to apply, especially in on-line systems. Finally, Gaussian white noise was added to the acquired vibration signals at different signal-to-noise ratios (SNRs).
The purpose was to examine the robustness of the CTBFE method against possible background noise. Table 4 compares the accuracies obtained by SVM with conventionally extracted raw TD features, CTBFP-based features and CTBFE-based features. The results show that the CTBFE method is considerably immune to strong background noise. In conclusion, it is worthwhile to set aside unrelated vibration signal fluctuations before extracting TD features. The proposed method provides an effective way to extract accurate TD features for reliable PR of localized REB faults. Compared with the CTBFP method, the proposed CTBFE method provides better accuracy and greater feasibility for real-time applications.
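Adding white Gaussian noise at a prescribed SNR can be sketched as follows, taking signal power as the mean square of the whole record (an assumption; the text does not say how power was measured). The SNR levels and the sinusoidal placeholder signal are hypothetical and are not the levels reported in Table 4.

```python
import numpy as np


def add_awgn(x, snr_db, rng=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db.

    Power is taken as the mean square of x over the whole record.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    p_noise = np.mean(x ** 2) / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)


fs = 60_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50.0 * t)                 # placeholder signal
rng = np.random.default_rng(0)
for snr_db in (10, 0, -5):                       # hypothetical SNR levels
    noisy = add_awgn(x, snr_db, rng)
    measured = 10 * np.log10(np.mean(x ** 2) / np.mean((noisy - x) ** 2))
    print(f"target {snr_db:>3} dB, measured {measured:.2f} dB")
```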

Conclusion
Vibration-based PR methods were used to identify localized REB faults using statistical TD features. It was observed that undesired fluctuations present in random vibration signals altered the statistical values of the TD features. It was also observed that these fluctuations might not be related to the localized REB faults, and that using inaccurate feature values in PR systems might mislead supervised learning based classifiers. Thus, instead of conventional TD feature extraction, the CTBFE method is proposed to supply accurate and reliable feature values to diagnostic models. Only the appropriate portion of each vibration signal was used to extract each desired TD feature for the fault classification process.
A variety of classifiers was employed to evaluate the proposed methodology, and the results showed that all classifiers performed better with the CTBFE-based features. Moreover, the proposed method has shown robustness against strong background noise. When compared to the most related existing