Glass break detection system using deep auto-encoders with a fuzzy rule induction algorithm

Glass windows are widely used in commercial and residential buildings. While glass-based materials have advantages, they also pose security risks. Therefore, glass break detectors play an important role in security protection for offices and residential buildings. Conventional vibration-based and acoustic-based glass break detectors are designed to detect predetermined temporal and frequency feature thresholds of glass breakage sound signals. As a result, they cannot differentiate a glass break from environmental sounds (such as striking objects, heavy sounds and shouting) that are similar in amplitude threshold and frequency pattern. Machine learning based audio classification has become popular in security surveillance applications, and different approaches have been proposed for anomalous event detection (such as gunshots and glass breakage sounds). This paper proposes a new glass break detection algorithm based on a Fuzzy Deep Auto-encoder Neural Network. The algorithm aims to reduce false alarms and improve detection accuracy. Experimental results indicate that the proposed fuzzy deep auto-encoder network system attained 95.5% correct detection on the proposed audio dataset.


Introduction
Glass windows are now widely used in commercial and residential buildings. Light, comfort, style and energy efficiency are among the benefits of today's high-performing glass-based materials (Stelzer, 2010). While glass-based materials have advantages, they also pose security risks. At night or when the owner is not present, an intruder can easily break the glass of a door, then reach inside and open the latch lock (Sidhu, 2005). Therefore, glass break detectors play an important role in security protection for commercial and residential buildings. A glass break detector is a sensor used in electronic burglar alarms that is designed to detect whether a pane of glass has been shattered or broken (Venkat, 2010).
Several electronic sensor-based techniques and machine learning based models have been considered to improve detection accuracy. Smith (2004) studied the vibration signals and acoustic waves that result from the contact force applied to the glass surface when a shock or glass breaking event occurs. Eskildsen (2005) proposed a temporal feature based glass break detection method in which a detection alert is triggered if the zero crossing rate of the input audio exceeds the predetermined zero crossing rate threshold for glass breaking. Richard (2011) proposed a dual channel glass break detection algorithm that uses two different audio transducers, one omnidirectional and one highly directional. The proposed control circuit analyses amplitude and frequency thresholds from both transducers and then determines whether or not a glass breakage profile is present. A conventional sensor-based glass break detector triggers a detection alarm if the input audio reaches or exceeds the pre-defined vibration and frequency thresholds of glass breaking sounds. However, door slams, impacts and falling objects can cause vibrations and frequency patterns similar to those of glass breaking sounds. Electronic sensor based glass break detectors can therefore repeatedly trigger an alarm by mistakenly identifying a high-pitched sound from the surrounding environment (such as dogs barking, children screaming or motorcycle noise) as glass breaking (Lopatka et al., 2016). False alarms can trigger costly fines, pull emergency professionals away from real crises and inconvenience both the neighbourhood and the homeowner.
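The zero crossing rate threshold rule described above can be illustrated with a minimal sketch. It is not the detector of Eskildsen (2005): the threshold value of 0.1 and the synthetic test tones are assumptions chosen only to show why a fixed threshold cannot tell glass breaking apart from any other high-frequency sound.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(signal)
    return np.mean(signs[1:] != signs[:-1])

def zcr_alarm(signal, threshold):
    """Fixed-threshold rule: alarm when the zero crossing rate exceeds the threshold."""
    return zero_crossing_rate(signal) > threshold

# Synthetic one-second tones (assumed stand-ins, not recorded data).
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 100 * t)     # low-pitched sound
high_tone = np.sin(2 * np.pi * 5000 * t)   # any high-pitched sound, not necessarily glass

HYPOTHETICAL_THRESHOLD = 0.1  # assumed value for illustration only
low_alarm = zcr_alarm(low_tone, HYPOTHETICAL_THRESHOLD)
high_alarm = zcr_alarm(high_tone, HYPOTHETICAL_THRESHOLD)
```

In this sketch the pure 5 kHz tone raises the alarm although no glass is involved, which is the false alarm behaviour discussed above.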
Machine learning algorithms have advanced in the past few years and have become competitors to traditional statistical models and electronic sensors in various security surveillance fields. Wang et al. (2017) proposed a new audio event detection and classification approach based on a state-of-the-art fully convolutional network framework. They extracted two spectrogram feature representations of glass break audio as input. Their approach contains two stages. The first stage detects whether audio events are present by sliding a convolutional kernel along the time axis, and Region Proposal Networks (RPN) then generate proposals that possibly contain audio events. In the second stage, time and frequency domain information is integrated to classify these proposals and refine their boundaries. They attained 94.6% accuracy for detecting glass breaking events and outperformed the traditional baseline approach. Alain et al. (2005) proposed a complete detection and recognition system for impulsive sounds (such as glass breaks, human screams, gunshots, explosions or door slams) using Hidden Markov Model (HMM) classifiers under varying background noise conditions. They attained 98% correct detection accuracy at a 70 dB signal-to-noise ratio (SNR) and above 80% at 0 dB SNR. Their results suggest that machine learning based glass break detection models have better accuracy and fewer false alarms than conventional electronic sensors. However, long computation times and complex training architectures make it difficult for traditional machine learning models to attain high accuracy at various noise levels. In this paper, we propose a new glass break detection algorithm based on a deep learning architecture to improve detection performance and to reduce wasteful false alarms.
The remainder of the paper is organized as follows. Section 3.1 describes the Fast Fourier Transform (FFT) algorithm used for pre-processing. Section 3.2 discusses dimensionality reduction using the deep auto-encoder neural network approach. Section 3.3 explains the fuzzy rule-based model, referred to as the Fuzzy Unordered Rule Induction Algorithm (FURIA), used for the detection of glass breaking events. Section 4 describes the experimental setup, and Section 5 presents the results and discussion of the proposed glass break detection system. Section 6 concludes the paper.

Fast Fourier transform (FFT)
Every audio wave contains different amplitude and frequency values. The time domain representation of an input audio signal does not directly reveal the periodic frequencies of a breaking glass sound. A Fourier transform is a mathematical function that gives the frequency representation of a glass breaking event inside a signal. Therefore, an FFT algorithm is used to convert the audio input from the time domain into a frequency spectrum representation, which makes the data more accessible for computational analysis.
The FFT algorithm is an efficient method for computing the Discrete Fourier Transform (DFT). The DFT is a mathematical operation that identifies and extracts both the frequency and the amplitude of the partial waveforms and harmonics of a signal. To overcome the slow processing time of a direct DFT evaluation, a highly efficient computing algorithm, namely the FFT, is applied (Van Loan, 1992). The DFT computed by the FFT algorithm is defined by the formula:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \qquad (1)$$
where N denotes the number of time domain samples in the dataset, n denotes the current sample under consideration (0 to N-1), x(n) denotes the value of the signal at time n, k denotes the current frequency index under consideration (0 to N-1), and X(k) denotes the amplitude and phase of frequency component k in the signal (Mohlenkamp, 1999). Fig. 1 shows the pattern of the converted frequency domain over time, and Fig. 2 shows the transformed frequency domain pattern of non-glass break signals over time (ms), determined by using the FFT algorithm.
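As an illustration of this pre-processing step, the sketch below evaluates the DFT sum directly and compares it with NumPy's FFT, then extracts a magnitude spectrum from a frame. The frame length of 4096, the 44100 Hz sample rate used here, and the two synthetic tones are assumptions for the example, not values or data taken from the experiments.

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for a real-valued frame."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum) / len(frame)

# Synthetic stand-in for a recorded frame (not data from the paper).
sample_rate = 44100
frame_len = 4096
t = np.arange(frame_len) / sample_rate
frame = 0.8 * np.sin(2 * np.pi * 3000 * t) + 0.2 * np.sin(2 * np.pi * 9000 * t)

freqs, mags = magnitude_spectrum(frame, sample_rate)
peak_hz = freqs[np.argmax(mags)]          # strongest frequency component
bin_width_hz = sample_rate / frame_len    # frequency resolution of the frame

# The FFT and the direct sum agree up to floating-point error.
dft_matches_fft = np.allclose(naive_dft(frame[:64]), np.fft.fft(frame[:64]))
```

The direct sum costs on the order of N^2 operations, while the FFT needs on the order of N log N, which is why the FFT is preferred for long audio frames.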

Deep auto-encoder neural network
An auto-encoder is an unsupervised artificial neural network model for efficient coding. An auto-encoder neural network learns to map a set of imprecise data into a compressed, lower-dimensional form (encoding) and to reconstruct the original data from that form (decoding) without significant loss of information. In this work, a deep auto-encoder neural network is used to find the correlated features of each input signal and convert them into reduced dimensionality representation vectors before the audio signals are classified.
If the input to an auto-encoder is a vector x, then the encoder maps the vector x to another vector z^{(1)} of reduced dimensionality as follows:
$$z^{(1)} = h^{(1)}\left(W^{(1)} x + b^{(1)}\right) \qquad (2)$$
where h^{(1)} denotes the sigmoid non-linear activation function, W^{(1)} denotes the weight matrix of the network, x is the input feature vector, and b^{(1)} denotes the bias vector. The simplest form of an auto-encoder is similar to a multilayer perceptron (MLP) neural network, with an input layer, an output layer and one or more hidden layers connecting them. One difference is that the output layer of an auto-encoder has the same number of nodes as the input layer. However, a conventional auto-encoder neural network has limited performance on very complex, non-linear, unstructured data (such as audio and images). Therefore, we propose a deep learning architecture with an auto-encoder neural network for dimensionality reduction. The basic architecture of the proposed deep auto-encoder neural network is composed of two symmetrical deep belief networks (DBN). The encoding part (i.e., data compression) contains three shallow hidden layers that produce the encoded bottleneck feature vectors, and the decoding part (i.e., data reconstruction) contains three hidden layers that reconstruct the original input as output (Hinton and Salakhutdinov, 2006).
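The layer mapping and the symmetric layout can be sketched as a forward pass. The layer widths (1024, 256, 64, 16), the random initial weights and the random input are hypothetical, since the actual layer sizes are not given in the text; only the three-layer encoder, the three-layer decoder and the sigmoid activation follow the description above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_layers(sizes, rng):
    """One (W, b) pair per layer; W has shape (n_out, n_in)."""
    return [(rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply z = sigmoid(W @ z_prev + b) layer by layer."""
    z = x
    for W, b in layers:
        z = sigmoid(W @ z + b)
    return z

rng = np.random.default_rng(0)
encoder_sizes = [1024, 256, 64, 16]   # hypothetical widths, three encoding layers
decoder_sizes = encoder_sizes[::-1]   # mirrored widths, three decoding layers

encoder = init_layers(encoder_sizes, rng)
decoder = init_layers(decoder_sizes, rng)

x = rng.random(1024)             # stand-in for one spectral feature vector
code = forward(x, encoder)       # bottleneck representation
recon = forward(code, decoder)   # reconstruction with the input's dimensionality
```

With untrained weights the reconstruction is meaningless. The sketch only shows how the dimensionality shrinks to the bottleneck and is restored at the output.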
In the learning structure of a DBN, the encoded output of the current layer is used as the input vector for the next hidden layer, until the final bottleneck layer of reduced dimensionality is reached. To optimize the network and reduce the cost error between the encoded and decoded results, the network uses a back propagation learning algorithm with a sigmoid activation function in every layer of the training model. As a result, the deep auto-encoder neural network can produce a useful encoded representation of the data with little loss of information. If the data can be represented in fewer dimensions without losing useful information, then the classifier can analyze the data faster and achieve better detection accuracy in the classification step. Fig. 3 shows the encoding and decoding parts of the symmetrical deep auto-encoder neural network used for dimensionality reduction of the input audio feature vector (Tüske et al., 2014).
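A minimal training sketch follows, assuming plain full-batch gradient descent on the mean squared reconstruction error. It omits the layer-wise DBN pre-training described by Hinton and Salakhutdinov (2006), and the layer widths, learning rate, epoch count and synthetic data are all assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, sizes, epochs=500, lr=1.0, seed=0):
    """Full-batch back propagation on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, 0.5, size=(n_in, n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n_out) for n_out in sizes[1:]]
    losses = []
    for _ in range(epochs):
        # Forward pass: the output of each layer feeds the next one.
        acts = [X]
        for W, b in zip(Ws, bs):
            acts.append(sigmoid(acts[-1] @ W + b))
        err = acts[-1] - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error through the sigmoid layers.
        delta = (2.0 / err.size) * err * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(Ws))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
            Ws[i] -= lr * grad_W
            bs[i] -= lr * grad_b
    return Ws, bs, losses

def encode(X, Ws, bs, n_encoder_layers):
    """Run only the encoding half to obtain the bottleneck vectors."""
    h = X
    for W, b in zip(Ws[:n_encoder_layers], bs[:n_encoder_layers]):
        h = sigmoid(h @ W + b)
    return h

# Synthetic 8-dimensional data with 2 underlying factors (assumed, not the audio dataset).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = sigmoid(latent @ rng.normal(size=(2, 8)))

Ws, bs, losses = train_autoencoder(X, sizes=[8, 6, 4, 2, 4, 6, 8])
codes = encode(X, Ws, bs, n_encoder_layers=3)
```

After training, only the encoding half is needed: `codes` holds the reduced vectors that would be passed on to the classifier.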

Fuzzy unordered rule induction algorithm
The Fuzzy Unordered Rule Induction Algorithm (FURIA) is a fuzzy rule-based classification model that extends and adapts the conventional rule learner RIPPER (Hühn and Hüllermeier, 2010). It includes several modifications and extensions. In particular, FURIA learns unordered fuzzy rule sets instead of conventional rule lists. Moreover, to handle uncovered examples, FURIA makes use of an efficient rule stretching method. FURIA automatically analyzes whether there is a statistical difference between the frequency features of glass break and non-glass break signals in the proposed dataset, and extracts appropriate membership parameter values for each class. The algorithm then creates the fuzzy rule sets by using the significant membership parameters of the two signal classes. The classification output for a signal is decided by the support value of each fuzzy rule set. Fig. 4 shows the principal learning algorithm of FURIA, Fig. 5 explains the antecedent fuzzification algorithm for a single rule, and Fig. 6 shows rule examples extracted from glass break and non-glass break audio data using the fuzzy rule induction algorithm (Hühn and Hüllermeier, 2009).
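To make the role of the membership parameters concrete, the sketch below evaluates one fuzzy rule, using the four-parameter trapezoidal intervals of the glass break rule example reported in this paper. Combining the antecedents with a product and weighting the result by the certainty factor (CF) follows the usual description of FURIA but is stated here as an assumption. The feature values of the three test samples are invented.

```python
import math

def trapezoid(x, a, b, c, d):
    """Membership of x in a fuzzy interval with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

def rule_degree(sample, antecedents):
    """Product of the memberships of all antecedent conditions (assumed t-norm)."""
    degree = 1.0
    for feature, interval in antecedents.items():
        degree *= trapezoid(sample[feature], *interval)
    return degree

INF = math.inf
# Glass break rule example from the paper; keys are the feature numbers it names.
glass_break_rule = {
    "antecedents": {
        45: (0.049824, 0.049831, INF, INF),
        40: (-INF, -INF, 0.049825, 0.049826),
    },
    "label": "Glass break",
    "cf": 0.92,
}

# Invented feature values for illustration.
inside_core = {45: 0.0499, 40: 0.0498}
on_slope = {45: 0.0498275, 40: 0.0498}
outside = {45: 0.0497, 40: 0.0498}

deg_core = rule_degree(inside_core, glass_break_rule["antecedents"])
deg_slope = rule_degree(on_slope, glass_break_rule["antecedents"])
deg_out = rule_degree(outside, glass_break_rule["antecedents"])

# Support contributed by this single rule (assumed: degree weighted by CF).
support_core = deg_core * glass_break_rule["cf"]
```

Unlike a crisp threshold, a sample on the slope of an interval is covered to a partial degree, so the boundary between the two classes is gradual.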

Experiments
For data acquisition, input audio signals are recorded with an acoustic sensor microphone. The acquired signals are then passed through a 5 V amplifier. If the amplitude of the amplified input signal is higher than a threshold, the output of the comparator is HIGH. The comparator activates the 555 timer, which runs for 5 seconds and wakes up a Micro Controller Processing Unit (MCU), and the input audio signals are received. The collected audio signals have a sample rate of 44100 samples per second over 5-second time frames. We collected 1000 audio (.wav) samples at various environmental noise levels (Fig. 7). The dataset consists of two sound classes: 500 samples of breaking glass sounds (breaking glass sounds with different noise levels) and 500 samples of non-breaking glass sounds (a combination of shouting, car horns, household sounds, alarms, animals, farm sounds, children playing and people's conversations). The collected audio signals have different sizes and different lengths. Therefore, a normalization process was applied before analysis and pre-processing. Each time domain audio sample in the dataset is resampled to 16384 (14-bit) samples per 250 milliseconds of signal, which makes the system invariant to audio length and size. The audio data is then converted into a frequency representation using the FFT algorithm.
[Figure: principal learning steps of FURIA (Hühn and Hüllermeier, 2010). 1. Rule generation: generate rules in the form of fuzzy membership function sets directly from the training examples, using the covering approach of the RIPPER rule learning algorithm. 2. Rule pruning: select the best induced rules from the training examples (learn-one-rule), using a general-to-specific (greedy search) algorithm. 3. Classification: define a support value for the rules of each signal and decide the classification.]
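The length normalization can be sketched as follows. The text does not state the resampling method, so linear interpolation and peak amplitude scaling are assumptions made for this example, and the two sine signals are synthetic stand-ins for recordings of different length and loudness.

```python
import numpy as np

def to_fixed_length(signal, target_len=16384):
    """Resample to target_len samples (linear interpolation) and scale to unit peak."""
    signal = np.asarray(signal, dtype=float)
    old_pos = np.linspace(0.0, 1.0, num=len(signal))
    new_pos = np.linspace(0.0, 1.0, num=target_len)
    resampled = np.interp(new_pos, old_pos, signal)
    peak = np.max(np.abs(resampled))
    return resampled / peak if peak > 0 else resampled

# The same waveform stored at two lengths and two loudness levels (synthetic).
short_clip = np.sin(np.linspace(0.0, 20.0, 5000))
long_clip = 3.0 * np.sin(np.linspace(0.0, 20.0, 220500))

a = to_fixed_length(short_clip)
b = to_fixed_length(long_clip)
```

Both clips end up as vectors of 16384 values in the range [-1, 1], so later stages see inputs of identical size regardless of the original recording.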
After pre-processing, a deep auto-encoder neural network is used for dimensionality reduction, in order to reduce model complexity and simplify the input audio data. The normalized audio signals of the glass break detection dataset are large and complex: each sample, for example, is a feature vector of 16384 frequency values. To handle this large and complex data adequately during training, the deep auto-encoder neural network compresses the vector size of the audio data. For classification, the fuzzy rule induction algorithm is applied to the data of reduced dimensionality to detect glass breaking audio signals.

Results and discussion
For training and accuracy measurement, the audio input data is divided into two randomly selected groups: a training group containing 70% of the samples and a test group containing the remaining 30%. In this way, the generalization ability of the simulated model can be checked after the training phase. In addition, 30 audio signals are selected as unseen data. The True Positive (TP) rate, False Positive (FP) rate, area under the ROC curve and correct detection rate were used as accuracy measures for the proposed classifier. Fig. 7 shows the step-by-step classification progress of the proposed glass break detection system (red points denote glass break signals, blue points denote non-glass break signals). Table 1 presents the experimental results of the proposed glass break detection system, which attains 95.5% total correct detection for the 1000 samples in the audio dataset. Table 2 summarizes the detection performance of a state-of-the-art system during the intrusion process. The results indicate that in three test trials the glass break (GB) sensor failed to detect the glass breakage event 10 times in the testing process, yielding a detection accuracy of 65% (Zidan, 2015). Since different researchers in the literature use different datasets, a direct comparison is not possible.
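The evaluation protocol can be sketched as a random 70/30 split followed by the rate computations. The label vectors below are invented to show the arithmetic and are not results from the experiments. The area under the ROC curve is omitted because it requires continuous classifier scores.

```python
import numpy as np

def split_70_30(n_samples, seed=0):
    """Random index split: 70% for training, 30% for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(round(0.7 * n_samples))
    return idx[:cut], idx[cut:]

def detection_metrics(y_true, y_pred):
    """TP rate, FP rate and correct detection rate for binary labels (1 = glass break)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    return {
        "tp_rate": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / len(y_true),
    }

train_idx, test_idx = split_70_30(1000)

# Invented labels, only to show the arithmetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
```

For the invented labels, three of four glass break samples are detected and one of four non-glass break samples raises a false alarm, giving a TP rate of 0.75, an FP rate of 0.25 and an accuracy of 0.75.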

Conclusion
This research proposed a new architecture for a robust glass break detection algorithm that uses machine learning and signal processing to improve the detection accuracy for glass break sounds. A deep auto-encoder method is proposed to reduce the size and complexity of the audio data signal without loss of useful information. An effective rule stretching technique, called the Fuzzy Rule Induction method, is applied to detect a glass breaking event based on the membership parameters of the input signals. The new fuzzy rule stretching algorithm is simpler than existing techniques and can be used more efficiently when comparing existing rules. In the experiments, classification on the glass break dataset attained a correct detection accuracy of 95.5%, which indicates that the proposed fuzzy deep auto-encoder neural network algorithm performs better than the conventional machine learning based detection system and other sensor-based glass break detection techniques. Future work may cover real-time protection of home and office security systems based on the proposed machine learning based glass break detection approach integrated with other security devices.

Rule Example from Fuzzy Rule Induction of a Non-Glass Break
If 10th feature of membership parameter is

Rule Example from Fuzzy Rule Induction of a Glass Break
If (45th feature of membership parameter is [0.049824, 0.049831, ∞, ∞]) AND (40th feature is [-∞, -∞, 0.049825, 0.049826]) Then Label is Glass break (CF = 0.92)

Table 2: Experimental results of glass break detection system using conventional glass-break detector (Zidan, 2015)
Number of Trial | Success Detection Alarm | False Alarm | Accuracy
10 | 6 | 4 | 60%