EEG SIGNAL CLASSIFICATION BASED ON ORTHOGONAL POLYNOMIALS, SPARSE FILTER, AND SUPPORT VECTOR MACHINE CLASSIFIER

Abstract: This work implements an Electroencephalogram (EEG) signal classifier. The implemented method uses Orthogonal Polynomials (OP) to convert the EEG signal samples to moments. A Sparse Filter (SF) reduces the number of converted moments to increase the classification accuracy. A Support Vector Machine (SVM) then classifies the reduced moments into one of two classes. The proposed method is tested on two datasets and compared with two existing methods. The datasets are divided into 80% for training and 20% for testing, with 5-fold cross-validation. The results show that the proposed method achieves higher accuracy than the other methods, with best accuracies of 95.6% and 99.5% on the two datasets, respectively. Finally, the results indicate that the number of moments selected by the SF should exceed 30% of the overall EEG samples for the accuracy to be over 90%.


I. INTRODUCTION
Brain-computer interface (BCI) technology enables people to control and interact with their surroundings [1]. BCI includes applications ranging from education and entertainment to environmental communication and control based on the evaluation of noninvasive EEG [2-4]. A prevalent paradigm for implementing a BCI system involves measuring EEG during distinct motor imagery tasks (e.g., left- and right-hand motor imagery). The EEG data collected during these motor imagery tasks are classified to control the BCI system.
Extracting the relevant features from the gathered EEG data is the key to producing control signals that will aid in categorizing different mental activities. Many techniques have been used to extract features from different signals [5], [6].
Tchebichef polynomials (TP), Krawtchouk polynomials (KP), and Hahn polynomials are all examples of discrete orthogonal polynomials [7]. Although the TP shows notable energy compaction (EC) [8], the KP is superior to the TP in extracting local features of a signal. Because an Orthogonal Polynomial (OP) can be obtained by multiplying two orthogonal polynomials, numerous hybrid variants of KP and TP have been developed. Jassim et al. [9] proposed the Tchebichef-Krawtchouk polynomial (TKP), Mahmmod et al. [10] proposed the Krawtchouk-Tchebichef polynomial (KTP), and Abdulhussain et al. [11] recently offered the squared Krawtchouk-Tchebichef polynomial (SKTP). The SKTP is considered an advanced strategy for combining OPs. It is implemented by multiplying two OPs, one derived from each hybrid OP (TKP and KTP) [11].
This is an open access article under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
Sparse filtering (SF) is a powerful technique in signal processing and machine learning that aims to extract informative features from high-dimensional data by encouraging sparsity in the representation [12]. Sparse filtering applies an unsupervised learning algorithm to learn a set of filters or basis functions that capture the most salient features of the input data. These filters are designed to be sparse, meaning only a few filters are activated for a given input. Sparse filtering has been successfully applied in various domains, including computer vision, natural language processing, and bioinformatics. It has shown promising results in tasks such as image denoising, feature extraction, and classification [13].
Support Vector Machines (SVMs) are widely used machine learning algorithms that excel in classification and regression tasks. An SVM aims to find an optimal hyperplane that maximally separates different classes in the feature space. SVMs can handle both linearly separable and nonlinearly separable data by using kernel functions, which map the input data into a higher-dimensional feature space. SVMs have demonstrated excellent performance in various applications, including image recognition, text categorization, and bioinformatics [14].
In this paper, an EEG signal classifier is implemented. The classifier converts the EEG signal from the spatial domain to moments by using OPs, and those moments are then reduced by a sparse filter. Finally, the reduced moments are fed to the SVM classifier, which selects the class to which the EEG signal belongs. The paper is organized as follows: the theoretical background section covers the concepts of OPs, SFs, and SVM classifiers. The proposed EEG signal classifier section shows how the proposed method is implemented. The results and discussion section follows. Finally, a conclusion section summarizes the work.

II. THEORETICAL BACKGROUND

A. Orthogonal Polynomials
A discrete transform takes a signal (frame) from the time (spatial) domain into the transform domain by means of a transformation function. Discrete transforms have many applications in signal processing, representation, and communication [15]. Because they permit viewing signals in multiple domains, they greatly enhance the ability to analyze the components of various signals [16]. In addition, orthogonal polynomials (OPs) have been intensively investigated and used in a wide range of applications [10]. OPs generate Orthogonal Moments (OMs; see [11]). Because of their remarkable efficiency and effectiveness, OMs (scalar quantities) are used as shape descriptors [17].
OPs include several types, such as the Krawtchouk polynomial (KP) [18] and the Tchebichef polynomial (TP) [8]. The OP kernels of the KP and TP are used to construct the discrete Krawtchouk transform (DKT) and the discrete Tchebichef transform (DTT), respectively. Krawtchouk-Tchebichef polynomials (KTP) [10] and Tchebichef-Krawtchouk polynomials (TKP) [9] are recently proposed hybrids of the original Krawtchouk and Tchebichef polynomials. The discrete Krawtchouk-Tchebichef transform (DKTT) and the discrete Tchebichef-Krawtchouk transform (DTKT) are the transformations implemented by the KTP and TKP kernels, respectively. Signals are converted from the spatial domain to the moment domain using the KTP and TKP kernels.
Comparative analyses of DTKT, DKTT, DKT, DTT, and DCT are presented in [9] and [10]. Compared to other real transforms, the KTP and TKP sets performed exceptionally well in energy compaction (EC) and localization properties [17]. This analysis demonstrated that DTKT and DKTT could use the localization property in the time and space domains.
The mathematical model of the KTP is as follows. The $n$th-order KTP, $R_n(x)$, is given in [10] in terms of $t_i(x)$, the $i$th-order TP given in [8], and $k_i(n; p, N-1)$, the $i$th-order KP given in [15]. Finally, ${}_3F_2$ and ${}_2F_1$ are the hypergeometric functions given in [10].
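The conversion of a signal to moments with an orthonormal polynomial kernel can be illustrated with a minimal sketch. It does not implement the KTP of [10]: as an assumption made for illustration, the kernel is built by QR factorization of a Legendre Vandermonde matrix on a uniform grid, which yields discrete polynomials that are orthonormal under uniform weighting (Tchebichef-type, up to sign). The function names `orthonormal_poly_matrix`, `to_moments`, and `from_moments` are hypothetical.

```python
import numpy as np

def orthonormal_poly_matrix(n_samples):
    """Return an (n_samples x n_samples) kernel whose row k is a discrete
    polynomial of order k, orthonormal on a uniform grid (illustrative
    stand-in for the KTP kernel, not the construction of [10])."""
    x = np.linspace(-1.0, 1.0, n_samples)
    # Columns are Legendre polynomials of degree 0..n_samples-1 sampled on the grid.
    V = np.polynomial.legendre.legvander(x, n_samples - 1)
    # QR orthonormalizes the columns degree by degree (Gram-Schmidt equivalent).
    Q, _ = np.linalg.qr(V)
    return Q.T

def to_moments(signal, P):
    """Project a signal onto the polynomial kernel to obtain its moments."""
    return P @ signal

def from_moments(moments, P):
    """Reconstruct the signal from its moments (P is orthogonal)."""
    return P.T @ moments

if __name__ == "__main__":
    P = orthonormal_poly_matrix(16)
    t = np.linspace(-1.0, 1.0, 16)
    signal = 3.0 + 2.0 * t - t ** 2          # a smooth (quadratic) test signal
    moments = to_moments(signal, P)
    print(np.round(moments, 6))               # energy sits in the lowest orders
```

For a smooth signal, the energy is packed into a few low-order moments, which is the energy compaction property discussed above. For long signals, the QR construction loses accuracy at high orders, so a recurrence-based kernel is the more practical choice.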

B. Sparse Filtering
Instead of learning the distribution of the input data, the sparse filtering (SF) algorithm, a two-layer unsupervised feature learning algorithm, optimizes the sparsity of the learned representation. It scales effectively with the input data size [12] and is frequently used for that reason. The only parameter that needs tuning is the number of features, chosen to ensure convergence to an optimal solution, which makes the algorithm very easy to use. It is computationally efficient and has been successfully applied to image and phone classification [13]. The following characteristics of the feature distribution matrix are exploited to optimize sparsity.
- Population sparsity: each instance is represented by only a small set of active features.
- Lifetime sparsity: each feature is active for only a few instances.
- High dispersal: all features contribute uniformly.
The SF uses a nonlinear transformation to map input features to as few output features as possible, allowing optimal performance. High dispersal is achieved by averaging the squared values in the feature matrix across all instances, so that the mean squared activations of the features are nearly equal. The averaging also ensures lifetime sparsity and population sparsity [19].
The training data are arranged in a matrix $Y \in \mathbb{R}^{N \times L}$ with $N \ll L$, where $Y^i$ denotes the $i$th sample. Following convention, the features are the rows of the matrix and the samples are its columns; specifically, $N$ is the number of original features and $L$ is the number of samples. The sparse filtering model is trained on the training set to obtain the weight matrix $W \in \mathbb{R}^{N \times P}$. The samples are then mapped onto their corresponding features $Z \in \mathbb{R}^{P \times L}$ using the weight matrix $W$ obtained from training. $P$ is the number of learned features obtained from the $N$ original features, where $P \ll N$. Linear features are calculated from each sample using Eq. (7).
where $Z_l^i$ represents the $l$th feature of the $i$th sample. In sparse filtering, the $\ell_2$ norm is used as the normalization criterion during optimization. A feature matrix is constructed from the $Z_l^i$. For the features to lie on the unit $\ell_2$-ball, the feature distribution matrix must be normalized row by row and then column by column. The row and column normalizations are shown in Eqs. (8) and (9), respectively.
Lastly, the weight matrix solution W is calculated using cost function optimization, as shown in Eq. (10).
The term $\lVert \hat{Z}^i \rVert_1$ measures the sparsity of the $i$th sample [13].
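For reference, the steps described in this subsection match the commonly used sparse filtering formulation, which can be written in the notation of this section as follows. This is a sketch under the assumption that Eqs. (7) to (10) follow that formulation; $W_l$ denotes the $l$th column of $W$, $Z_l$ the $l$th row of the feature matrix, and $\tilde{Z}^i$ the $i$th column of the row-normalized feature matrix.

```latex
\begin{align}
Z_l^i &= W_l^{T}\, Y^i
  && \text{(linear feature $l$ of sample $i$)} \\
\tilde{Z}_l &= \frac{Z_l}{\lVert Z_l \rVert_2}
  && \text{(row normalization: each feature across all samples)} \\
\hat{Z}^i &= \frac{\tilde{Z}^i}{\lVert \tilde{Z}^i \rVert_2}
  && \text{(column normalization: each sample onto the unit $\ell_2$-ball)} \\
\min_{W} \; & \sum_{i=1}^{L} \lVert \hat{Z}^i \rVert_1
  && \text{(cost function: total $\ell_1$ sparsity penalty)}
\end{align}
```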
Since the feature vector $\hat{Z}^i$ is constrained to the unit $\ell_2$-ball, the cost function decreases as the features become sparser. Thus, population sparsity is a target of the objective function. High dispersal is satisfied by the normalized features, because normalizing each feature by its $\ell_2$-norm across all samples keeps the features equally active. Lifetime sparsity follows from optimizing both population sparsity and high dispersal. In other words, when the population is sparse, the feature distribution matrix contains many zero entries, and when dispersal is high, those zero entries are spread out evenly. As a result, every feature has lifetime sparsity, since each contains numerous zero values. Sparse filtering is further explained in [19].
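The normalization and cost steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in this work: the soft absolute value, the finite-difference gradient, the backtracking step rule, and the names `sf_features`, `sf_cost`, and `train_sf` are all assumptions, chosen to keep the example short and dependency-free.

```python
import numpy as np

def sf_features(W, Y, eps=1e-8):
    """Map samples Y (N x L) to normalized features (P x L) using W (N x P)."""
    Z = np.sqrt((W.T @ Y) ** 2 + eps)                  # soft absolute value of linear features
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # row normalization (per feature)
    Z = Z / np.linalg.norm(Z, axis=0, keepdims=True)   # column normalization (per sample)
    return Z

def sf_cost(W, Y):
    """Sum of the l1 norms of the normalized feature columns (entries are >= 0)."""
    return sf_features(W, Y).sum()

def numerical_grad(W, Y, h=1e-5):
    """Central finite-difference gradient; adequate only for tiny problems."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        g[idx] = (sf_cost(Wp, Y) - sf_cost(Wm, Y)) / (2 * h)
    return g

def train_sf(Y, P, steps=30, lr=0.5, seed=0):
    """Gradient descent with backtracking; a step is accepted only if the cost drops."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Y.shape[0], P))
    cost = sf_cost(W, Y)
    for _ in range(steps):
        g = numerical_grad(W, Y)
        step = lr
        while step > 1e-6:
            W_new = W - step * g
            c_new = sf_cost(W_new, Y)
            if c_new < cost:
                W, cost = W_new, c_new
                break
            step *= 0.5
    return W

if __name__ == "__main__":
    Y = np.random.default_rng(1).standard_normal((6, 40))   # N = 6 features, L = 40 samples
    W = train_sf(Y, P=3)
    print("cost after training:", sf_cost(W, Y))
```

In practice, an analytic gradient with a quasi-Newton optimizer is the usual choice; the finite-difference version above only scales to a handful of weights.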

C. Support Vector Machine
Classification is a common task in machine learning. In a support vector machine, each data point is represented as an n-dimensional vector, and a linear SVM separates the two classes with an (n-1)-dimensional hyperplane. Many hyperplanes can partition the data, but the best one creates the largest gap between the two groups (see Fig. 2). To find it, two parallel hyperplanes are chosen that separate the two classes with the greatest possible distance between them; the region bounded by these two hyperplanes is called the margin. In other words, the hyperplane is chosen to maximize its distance to the nearest data point on either side. Once the maximum-margin hyperplane has been identified, the samples lying on the margin boundaries are called support vectors [14].

e) The Z array is fed into a pre-trained sparse filter to select the F most appropriate moments from each row of the Z array.

The proposed method's performance is evaluated by calculating the accuracy of the EEG signal classification on the datasets in [4] and [21]. The first dataset, from [4], consists of 20 trials of EEG signal recordings chosen randomly between two classes (left- and right-hand movement). Each trial records 5 seconds of EEG signal at a rate of 128 samples per second, so each trial has 640 samples. In [4], the author uses Linear Discriminant Analysis (LDA) as the feature extraction algorithm and a hyperplane technique for classification. The second dataset, the widely known "BCI Competition II (dataset number III)" [21], consists of 280 trials of EEG signal recordings chosen randomly between two classes (left- and right-hand movement). Each trial records 6 seconds of EEG signal at a rate of 128 samples per second, so each trial has 768 samples. The method in [21] is the second method used in the comparison.

The results in Table I show that the proposed method has the highest accuracy among the methods used in the comparison. Converting the EEG signal to moments using the OP and then reducing the number of those moments with the sparse filter increases the SVM classifier accuracy. The sparse filter plays an important role in the performance of the proposed method. Table II shows that the classification accuracy exceeds 90% when the number of moments selected is about 30% of the original EEG signal samples.

V. CONCLUSION
The paper aimed to design an EEG signal classification method that uses an OP to transform the spatial EEG signal to moments and then reduces the number of those moments using a sparse filter. The results of applying this method to available two-class EEG datasets show that the proposed method outperformed the other methods, giving an accuracy of over 99% in its best case. The results also show that when the number of moments the sparse filter selects exceeds 30% of the original EEG signal samples, the classification accuracy is over 90%.
www.ijict.edu.iq Iraqi Journal of Information and Communications Technology (IJICT) Vol. 6, Issue 3, December 2023 ISSN: 2222-758X e-ISSN: 2789-7362

The localization property allows DTKT and DKTT to extract features from the desired signal region. In addition, since these OPs redistribute signal energy into a limited number of polynomial coefficients, they exhibit substantial EC. Moments derived from an EEG signal using DTKT and DKTT are depicted in Fig. 1. The moment distribution shows that the moments are located mainly in the middle for the DKTT and at the two ends for the DTKT. Compared to the DTKT, the computational cost of extracting the test EEG signal's dominant moments (features) is drastically lower in the DKTT. Therefore, DKTT is employed here to extract features from the EEG signal.

Figure 2: The support vectors and the best separation hyperplane [14]

Figure 3: Proposed EEG Signal Classifier Block Diagram

In [21], the Continuous Wavelet Transform (CWT) is applied to the input EEG signal to extract the features. After transformation, those features are classified by a Convolutional Neural Network (CNN), which has one convolution layer, one max-pooling layer, and one fully connected layer. The EEG samples are divided into training samples (80%) and testing samples (20%), and 5-fold cross-validation is used to evaluate the classification accuracy of the proposed system. The classification accuracies of the proposed method and the other two methods are shown in Table I.
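The 80%/20% division with 5-fold cross-validation can be sketched as follows. The sketch assumes one common reading of the protocol, in which the trials are shuffled once and each of the five folds serves in turn as the 20% test part; the function name `k_fold_splits` and the fixed seed are hypothetical.

```python
import numpy as np

def k_fold_splits(n_trials, k=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs; with k = 5 each
    test fold holds 20% of the trials and the rest (80%) is for training."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_trials)          # shuffle the trial indices once
    folds = np.array_split(order, k)           # k nearly equal folds
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, test_idx))
    return splits

if __name__ == "__main__":
    for train_idx, test_idx in k_fold_splits(280):
        print(len(train_idx), "training trials,", len(test_idx), "testing trials")
```

The reported accuracy would then be the mean of the per-fold test accuracies.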
This research employed an SVM classifier to classify the EEG data. Several SVM variants exist, such as linear, quadratic, and cubic SVM classifiers. The SVM is a binary classifier, used here to classify numeric EEG data between two classes. Assume there are two data classes, and the purpose is to determine which class a new data item belongs to. Given a set of training samples, an SVM algorithm creates a model that assigns each sample to one of two categories, ensuring that a wide gap separates samples from different categories. New samples are then classified according to which side of the gap they fall on [20].

The selected moments form the moments table $D$ of size $N \times F$.
f) The moments table $D$ and the classes table $Y$ are used to train the SVM classifier.

a) Signal preparation: an EEG signal with $M$ samples, sampled at 128 samples per second, is stored in table $X_{\text{test}}$ of size $1 \times M$.
b) $X_{\text{test}}$ is multiplied by the pre-prepared OP matrix $P$ to generate the moments table $Z_{\text{test}}$ of size $1 \times M$.
c) The $Z_{\text{test}}$ table is fed into a pre-trained sparse filter to select the $F$ most appropriate moments from the $Z_{\text{test}}$ table and generate the moments table $D_{\text{test}}$ of size $1 \times F$.
d) The SVM classifier uses the moments table $D_{\text{test}}$ to select which class this signal belongs to.
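The training and testing steps listed above can be sketched end to end on synthetic data. Every component here is a labeled stand-in rather than the method of this work: the OP kernel is a QR-orthonormalized Legendre basis instead of the KTP, the moment reduction keeps the F moments with the highest mean energy instead of using a trained sparse filter, the classifier is a linear SVM trained by subgradient descent on the hinge loss, and the two classes are noisy sinusoids rather than EEG trials. All function names and parameter values are hypothetical.

```python
import numpy as np

def op_matrix(m):
    """Orthonormal polynomial kernel (rows = orders); stand-in for the KTP kernel."""
    x = np.linspace(-1.0, 1.0, m)
    Q, _ = np.linalg.qr(np.polynomial.legendre.legvander(x, m - 1))
    return Q.T

def select_moments(Z, f):
    """Stand-in for the sparse filter: indices of the f highest-energy moments."""
    return np.sort(np.argsort((Z ** 2).mean(axis=0))[-f:])

def train_linear_svm(D, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via full-batch subgradient descent on the regularized hinge loss."""
    w, b = np.zeros(D.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (D @ w + b) < 1            # samples inside or violating the margin
        grad_w = lam * w - (y[active, None] * D[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(D, w, b):
    """Assign class +1 or -1 by the side of the hyperplane."""
    return np.where(D @ w + b >= 0, 1, -1)

def make_trials(n, m, rng):
    """Synthetic two-class signals: 2-cycle vs 3-cycle sinusoids plus noise."""
    t = np.linspace(0.0, 1.0, m, endpoint=False)
    y = np.where(rng.random(n) < 0.5, -1, 1)
    freq = np.where(y == 1, 3.0, 2.0)
    X = np.sin(2 * np.pi * freq[:, None] * t[None, :]) + 0.5 * rng.standard_normal((n, m))
    return X, y

rng = np.random.default_rng(0)
M, F = 32, 10                                   # samples per trial, moments kept
X, y = make_trials(200, M, rng)                 # training table X (N x M) and classes y
X_test, y_test = make_trials(50, M, rng)

P = op_matrix(M)                                # pre-prepared OP kernel
Z = X @ P.T                                     # moments table Z (N x M)
keep = select_moments(Z, F)                     # moment reduction (stand-in)
D = Z[:, keep]                                  # moments table D (N x F)
w, b = train_linear_svm(D, y)                   # train the classifier

D_test = (X_test @ P.T)[:, keep]                # same transform and selection at test time
accuracy = (predict(D_test, w, b) == y_test).mean()
print("test accuracy:", accuracy)
```

The key design point carried over from the steps above is that the kernel and the moment selection are fixed during training and reused unchanged on every test signal.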
Table II demonstrates the role of the sparse filter by showing how the classification accuracy changes with the number of moments selected.

TABLE II: EFFECT OF THE NUMBER OF MOMENTS SELECTED BY THE SPARSE FILTER ON THE CLASSIFIER ACCURACY