Quadratic Neural Networks Show Promise in Handling Noise and Data Imbalances

23 Dec 2024

Abstract and 1. Introduction

2. Preliminaries and 2.1. Blind deconvolution

2.2. Quadratic neural networks

3. Methodology

3.1. Time domain quadratic convolutional filter

3.2. Superiority of cyclic features extraction by QCNN

3.3. Frequency domain linear filter with envelope spectrum objective function

3.4. Integral optimization with uncertainty-aware weighing scheme

4. Computational experiments

4.1. Experimental configurations

4.2. Case study 1: PU dataset

4.3. Case study 2: JNU dataset

4.4. Case study 3: HIT dataset

5. Computational experiments

5.1. Comparison of BD methods

5.2. Classification results on various noise conditions

5.3. Employing ClassBD to deep learning classifiers

5.4. Employing ClassBD to machine learning classifiers

5.5. Feature extraction ability of quadratic and conventional networks

5.6. Comparison of ClassBD filters

6. Conclusions

Appendix and References

3.1. Time domain quadratic convolutional filter

As mentioned earlier, the quadratic convolutional neural network (QCNN) is a key component of the time domain filter. In this paper, we utilize the quadratic neuron proposed by Fan et al. [34], which is expressed as follows:

where ∗ denotes the convolutional operation.
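To make the operation concrete, the following is a minimal one-dimensional sketch. It assumes the form usually given for the quadratic neuron of Fan et al., y = (w^r ∗ x + b^r) ⊙ (w^g ∗ x + b^g) + w^b ∗ (x ⊙ x) + c, where ⊙ is the element-wise product and the activation function is omitted. The function names, the scalar biases, and the zero "same" padding are illustrative assumptions, not the authors' implementation.

```python
def conv1d_same(x, w):
    """Slide kernel w over x (cross-correlation, as in deep learning
    frameworks), zero-padded so the output has the same length as x."""
    k = len(w)
    pad_left = (k - 1) // 2
    padded = [0.0] * pad_left + list(x) + [0.0] * (k - 1 - pad_left)
    return [sum(w[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def quadratic_conv1d(x, w_r, b_r, w_g, b_g, w_b, c):
    """Assumed quadratic convolution:
    (w_r * x + b_r) . (w_g * x + b_g) + w_b * (x . x) + c
    where * is convolution and . is the element-wise product."""
    branch_r = [v + b_r for v in conv1d_same(x, w_r)]
    branch_g = [v + b_g for v in conv1d_same(x, w_g)]
    power = conv1d_same([v * v for v in x], w_b)  # convolution of the squared input
    return [r * g + p + c for r, g, p in zip(branch_r, branch_g, power)]


# Hypothetical toy input and kernels, chosen only for illustration.
signal = [1.0, 2.0, 3.0]
output = quadratic_conv1d(signal, w_r=[1.0], b_r=0.0,
                          w_g=[2.0], b_g=1.0, w_b=[0.5], c=0.1)
```

Under this form, each quadratic filter carries three kernels and three biases where a conventional convolutional filter carries one of each, which is the source of the parameter growth discussed below.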

Quadratic networks have been shown to offer advantages in both theory and practice. Firstly, in terms of efficiency, quadratic networks can approximate radial functions with a polynomial number of neurons, whereas conventional neural networks require an exponential number [55]. Secondly, in terms of feature representation, quadratic networks achieve polynomial approximation, whereas conventional networks resort to piece-wise approximation via non-linear activation functions. Polynomial approximation represents complex functions better than piece-wise approximation [56]. Lastly, in practical applications, several studies have successfully incorporated quadratic neural networks into bearing fault diagnosis and have reported superior performance under challenging conditions such as strong noise [35], data imbalance [57], and variable loads [58]. This further underscores the practical utility and robustness of quadratic networks in fault diagnosis.

Despite their impressive performance, quadratic networks involve substantially more model parameters and non-linear multiplication operations than conventional networks, which makes them harder to optimize. Previous studies have shown that conventional initialization techniques can significantly hinder the convergence of quadratic networks [44, 59]. To overcome this problem, we design a dedicated strategy to initialize the quadratic network:
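A minimal sketch of such an initialization is given here, assuming a ReLinear-style scheme [59] in which the quadratic terms are zeroed so that the neuron starts out as a first-order one. The choices b^g = 1, b^r = c = 0, and the uniform range for w^r are assumptions made so that the quadratic response reduces exactly to w^r · x + b^r at the start of training; the function names are hypothetical.

```python
import random


def relinear_init(kernel_size, seed=0):
    """Group initialization (assumed): the first-order kernel w_r gets small
    random values, while every term that produces second-order behaviour
    starts at zero, with b_g = 1 so the g-branch is the constant 1."""
    rng = random.Random(seed)
    bound = (1.0 / kernel_size) ** 0.5  # assumed range, not from the paper
    return {
        "w_r": [rng.uniform(-bound, bound) for _ in range(kernel_size)],
        "b_r": 0.0,
        "w_g": [0.0] * kernel_size,
        "b_g": 1.0,
        "w_b": [0.0] * kernel_size,
        "c": 0.0,
    }


def quadratic_response(window, p):
    """Quadratic neuron evaluated on one input window of kernel length."""
    def dot(w, v):
        return sum(wi * vi for wi, vi in zip(w, v))

    squared = [v * v for v in window]
    return ((dot(p["w_r"], window) + p["b_r"])
            * (dot(p["w_g"], window) + p["b_g"])
            + dot(p["w_b"], squared) + p["c"])


def linear_response(window, p):
    """Conventional first-order neuron using only w_r and b_r."""
    return sum(wi * vi for wi, vi in zip(p["w_r"], window)) + p["b_r"]


params = relinear_init(kernel_size=5, seed=1)
window = [0.3, -1.2, 0.7, 2.0, -0.5]  # hypothetical input window
```

With these values the two responses coincide at initialization, and the second-order behaviour only appears once gradient updates move w^g and w^b away from zero.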

The group initialization strategy, also known as ReLinear [59], compels the QCNN to start from an approximately first-order linear neuron. The initial values of the high-order weights are set to zero so that they grow slowly. This strategy greatly increases the stability of quadratic networks during training by avoiding gradient explosion.

In terms of implementation, we employ two QCNN layers to form a symmetric structure that mimics a multi-layer deconvolution filter. The first QCNN layer maps the input into 16 channels, while the second consolidates these 16 channels into a single output. The dimension of the output is deliberately kept the same as that of the input. This operation effectively implements a conventional BD filter using a convolutional neural network. Finally, as the QCNN functions as a time-domain BD filter, the widely used time-domain BD objective function, kurtosis (Eq. (3)), is employed in this filter. Thus, we construct the time domain BD loss as follows:
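As a rough illustration of a kurtosis-based objective, the sketch below assumes the common non-centered definition, kurtosis(y) = mean(y^4) / mean(y^2)^2, and takes the loss to be its negative so that minimizing the loss maximizes kurtosis. Both the exact form of Eq. (3) and the sign convention are assumptions here, and the test signals are hypothetical.

```python
import math


def kurtosis(y):
    """Non-centered kurtosis: fourth moment over squared second moment.
    Impulsive signals score high; smooth periodic signals score low."""
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2)


def time_domain_bd_loss(y):
    """Assumed time domain BD loss: negative kurtosis of the filter output."""
    return -kurtosis(y)


# A smooth sinusoid versus a train of periodic impulses on a small floor,
# the latter loosely resembling the repetitive impacts of a faulty bearing.
sine = [math.sin(2 * math.pi * k / 32) for k in range(256)]
impulsive = [1.0 if k % 32 == 0 else 0.05 for k in range(256)]

loss_sine = time_domain_bd_loss(sine)
loss_impulsive = time_domain_bd_loss(impulsive)
```

The impulsive signal yields the lower loss, so a filter trained against this objective is pushed towards outputs in which the periodic impacts stand out from the background.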

Authors:

(1) Jing-Xiao Liao, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, Special Administrative Region of China and School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, China;

(2) Chao He, School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing, China;

(3) Jipu Li, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, Special Administrative Region of China;

(4) Jinwei Sun, School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, China;

(5) Shiping Zhang (Corresponding author), School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, China;

(6) Xiaoge Zhang (Corresponding author), Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, Special Administrative Region of China.

This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.