Biomedical Science and Research Journals | Rapid Seizure Classification Using Feature Extraction and Channel Selection

 


Rapid Seizure Classification Using Feature Extraction and Channel Selection

Abstract

A seizure is an episode of abnormal electrical activity in the brain; it can be diagnosed by a neurologist and classified from recorded data. Medical data such as EEG signals usually contain many features and attributes that are not important for classification. Dimension reduction is an important step for removing this irrelevant information. Feature extraction is one dimension-reduction technique; channel selection is another. Both speed up classification and can improve accuracy. This paper proposes an approach based on EEG feature extraction, channel selection to reduce the computational load, and a trained model for classification. Channels are selected by variance: the three channels with the highest variance are kept. Eleven features are extracted from the selected channels and averaged to form the classifier input. Six classifiers are compared to find the most accurate one. The ensemble classifier was the most accurate at classifying seizure cases, achieving 100% sensitivity on the continuous testing set and 97.6% on the random testing set.

Index Terms: Seizure, Epilepsy, Electroencephalography (EEG), Feature extraction, Channel selection, Seizure classification

Background

Epilepsy is one of the most common neurological conditions and affects people of all ages. Its unpredictable nature makes it a poorly understood neurological disease. An estimated 50 million people worldwide have epilepsy, and up to 75% of them live in developing countries with little or no access to medical services or treatment. Epilepsy is characterized by recurrent seizures related to sudden, sporadic neuronal discharges in the brain and can result in other problems, even death [1,2].

Electroencephalography (EEG) is one of the primary modalities used for remote epileptic seizure detection. It provides an inexpensive and noninvasive way to investigate the subtle characteristics of the disease. The seizure is the defining property of epilepsy and appears as periods of abnormal activity in the EEG [2,3]. Feature extraction is a critical step in epilepsy detection, because an accurate classification model depends on it. It is also beneficial to limit the number of inputs to a classifier so that the model requires less computation [4]. Many techniques are used for feature extraction, such as Auto-Regressive (AR) modeling [5], Principal Component Analysis (PCA) [6], Empirical Mode Decomposition (EMD) [7], and statistical features [8]. Statistical features have been used in several algorithms [9,10].

Full-channel EEG signals are recorded using 18 to 23 electrodes on the scalp, a setup that is neither wearable nor computationally efficient. Channel selection reduces the amount of computation, particularly in real-time applications, by discarding channels that carry no discriminative information [11,12].

Several algorithms use channel selection. One combines feature enhancement with channel selection to improve detector performance [13]. Another studies electrode montage reduction, using only nine electrodes instead of the full set [14]. A third selects EEG channels to reduce power consumption during detection without affecting accuracy [15]. Selection by variance, difference in variance, entropy, random selection, extra-focal channels, and the doctor's choice has also been evaluated and gave results within a valid range [16].

Classification is the step of identifying groups or classes based on similarities between their members. In the proposed approach it is essential for differentiating between seizure (ictal) and normal (non-ictal) periods. Classification involves two main steps. In the first step, the dataset is grouped into two classes (seizure and normal) and used to train the model. In the second step, the trained model is used to classify new data [17].

Several algorithms are used for this task, such as Artificial Neural Networks (ANN) [18], Support Vector Machines (SVM) [17,19], Linear Discriminant Analysis (LDA) [20,21], K-nearest neighbors (KNN) [22,23], decision trees [24], ensembles [25], and logistic regression [26]. This paper is arranged as follows: Section II describes the proposed algorithm. The EEG dataset is presented in Section III. Section IV lists the performance evaluation metrics. The results are presented in Section V. Section VI gives the conclusion, and the paper ends with the references.

Proposed Approach

The main goal of this work is to detect epileptic seizures from EEG recordings automatically. EEG recordings contain both ictal and non-ictal periods. The proposed approach aims to detect every seizure state; classifying a normal state as a seizure in a limited number of instances is acceptable, because it guards against complications, whereas missing a seizure is not. The approach relies on automatically recognizing epilepsy from a short EEG recording. It works in four stages: channel selection, feature extraction, averaging, and classification, as shown in (Figure 1).

The EEG signal is composed of 23 channels generated from electrodes attached to the scalp. Processing all of these channels makes the calculations complex and increases the load on the system, so the channel selection step is essential. Once the channels are selected, features are extracted from them. The averaging step then reduces the number of features. Finally, the features are used to train the classifier, and the trained classifier is tested on new cases to evaluate its performance.

Variance Channel Selection

The channel selection step is intended to choose the channels most affected by the seizure. Only one feature is calculated for all channels, and channels are selected according to this feature. The remaining features are then calculated only for the selected channels.

A simple method for selecting channels for feature extraction and classification is the variance of the EEG signal amplitude:

$$\sigma_c^2 = \frac{1}{k} \sum_{i=1}^{k} \left( X_c(i) - \mu_c \right)^2$$

where c is the channel, X_c is the training seizure data, μ_c is the mean of the training seizure data, and k is the number of samples of training seizure data. Channels are selected based on the highest values of σ_c².
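The variance-based selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `select_channels`, the toy data, and the use of the population variance (division by k) are assumptions. Because every channel has the same number of samples, dividing by k or by k - 1 gives the same ranking.

```python
from statistics import pvariance


def select_channels(window, n_select=3):
    """Return the indices of the n_select channels with the largest variance.

    window: list of channels, each a list of amplitude samples.
    Population variance (1/k) is an assumption; the ranking is the same
    with the sample variance (1/(k-1)) when all channels have equal length.
    """
    variances = [pvariance(channel) for channel in window]
    # Rank channel indices by variance, largest first
    ranked = sorted(range(len(window)), key=lambda c: variances[c], reverse=True)
    return ranked[:n_select], variances


# Toy five-channel window (hypothetical values, not taken from the dataset)
toy_window = [
    [0.0, 1.0, 0.0, -1.0],    # variance 0.5
    [0.0, 10.0, 0.0, -10.0],  # variance 50.0
    [5.0, 5.0, 5.0, 5.0],     # variance 0.0
    [0.0, 3.0, 0.0, -3.0],    # variance 4.5
    [0.0, 2.0, 0.0, -2.0],    # variance 2.0
]

selected, variances = select_channels(toy_window)
print(selected)  # [1, 3, 4]
```

With real data, `window` would hold the 23 channels of one ten-second segment, and the three returned indices identify the channels passed on to feature extraction.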

Feature Extraction

In this step, a group of features is extracted from the EEG signal. These features are the ones that most strongly describe the shape of the signal. A classification model is trained on their combination to classify each case as a seizure or not. The features are extracted from ten-second segments of the EEG recording; they are the Standard Deviation, Mean, Variance, Median, Kurtosis, Skewness, Entropy, Moment, Power, Maximum, and Minimum of the EEG signal.
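As a rough sketch of this step, the eleven features can be computed for one channel with the Python standard library. Several definitions are not given in the text and are assumptions here: entropy is taken as the Shannon entropy of a 16-bin amplitude histogram, "Moment" as the third central moment, power as the mean squared amplitude, and variance-based quantities use the population form. The function names are illustrative.

```python
import math
from statistics import mean, median, pvariance


def central_moment(x, order):
    """Central moment of the given order (population form)."""
    mu = mean(x)
    return sum((v - mu) ** order for v in x) / len(x)


def shannon_entropy(x, n_bins=16):
    """Shannon entropy (bits) of an amplitude histogram; binning is an assumption."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in x:
        # Clamp so that the maximum value falls in the last bin
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def extract_features(x, moment_order=3):
    """Return the eleven statistical features for one channel segment."""
    var = pvariance(x)
    std = math.sqrt(var)
    return {
        "std": std,
        "mean": mean(x),
        "variance": var,
        "median": median(x),
        "kurtosis": central_moment(x, 4) / var ** 2 if var > 0 else 0.0,
        "skewness": central_moment(x, 3) / std ** 3 if var > 0 else 0.0,
        "entropy": shannon_entropy(x),
        "moment": central_moment(x, moment_order),  # order 3 is an assumption
        "power": sum(v * v for v in x) / len(x),    # mean squared amplitude
        "max": max(x),
        "min": min(x),
    }


features = extract_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(len(features))  # 11
```

In practice the input would be the 2560 samples of one selected channel rather than the five-value toy list used here.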

Averaging

Each channel produces 11 features, so the input to each model would be 11 × 3 = 33 values per case, which would burden real-time calculations and could prolong the classification time. Averaging the extracted features across the three channels reduces the number of model inputs to 11, and therefore reduces the processing load and classification time.
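The averaging step amounts to taking the mean of each feature across the three selected channels. A minimal sketch, with a hypothetical function name and placeholder numbers rather than real feature values:

```python
def average_features(per_channel):
    """Average each feature across channels, e.g. 3 x 11 values -> 11 values."""
    n_channels = len(per_channel)
    n_features = len(per_channel[0])
    return [
        sum(channel[j] for channel in per_channel) / n_channels
        for j in range(n_features)
    ]


# Three selected channels with 11 feature values each (placeholder numbers)
per_channel = [
    [1.0] * 11,
    [2.0] * 11,
    [6.0] * 11,
]

averaged = average_features(per_channel)
print(len(averaged))  # 11
```

The classifier then receives one 11-element vector per case instead of 33 values.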

Classification

The averaged features are used in this step to build the classification model. Six different models are created so that the best one can be used as the classifier. They are characterized here by speed, memory usage, and interpretability. Support Vector Machine (SVM): medium speed, medium memory usage, easy interpretability. K-Nearest Neighbors (KNN): medium speed, medium memory usage, hard interpretability. Decision Tree: fast, small memory usage, easy interpretability. Ensemble and Linear Discriminant: fast, small memory usage, easy interpretability. Logistic Regression: fast, medium memory usage, easy interpretability.
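To make the train-then-classify idea concrete, the sketch below implements one of the six model types, K-nearest neighbors, in plain Python. It is only an illustration of the technique: the value k = 3, the Euclidean distance, the two-dimensional toy vectors, and the function name are all assumptions, and the paper does not state which implementation or settings were used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors (hypothetical; real inputs would have 11 averaged features)
train_x = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
train_y = ["normal", "normal", "normal", "seizure", "seizure", "seizure"]

prediction = knn_predict(train_x, train_y, [4.9, 5.2])
print(prediction)  # seizure
```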

Dataset Description

This work uses the CHB-MIT scalp EEG database, which was gathered in December 2010 at Children's Hospital Boston. The dataset contains 23 cases from 22 patients: five males aged from 3 to 22 years and 17 females aged from 1.5 to 19 years. The signals are sampled at 256 samples per second with 16-bit resolution [27].

Evaluation Metric Parameters

After training, each classifier is tested using 30 samples (containing both seizure and normal cases). The results are evaluated using accuracy, sensitivity (True Positive Ratio, TPR), specificity (True Negative Ratio, TNR), False Positive Ratio (FPR, fall-out), False Negative Ratio (FNR, miss rate), positive predictivity (Ppr), and F1 score [9,28].
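These metrics all follow from the four confusion-matrix counts, using their standard definitions. The sketch below shows the arithmetic; the function name and the example counts are hypothetical and are not results from this paper.

```python
def ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    sensitivity = ratio(tp, tp + fn)   # TPR
    ppr = ratio(tp, tp + fp)           # positive predictivity (precision)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),   # TNR
        "fpr": ratio(fp, fp + tn),           # fall-out
        "fnr": ratio(fn, fn + tp),           # miss rate
        "ppr": ppr,
        "f1": ratio(2 * ppr * sensitivity, ppr + sensitivity),
    }


# Hypothetical counts for illustration only
metrics = evaluate(tp=8, tn=9, fp=1, fn=2)
print(metrics["sensitivity"], metrics["fnr"])  # 0.8 0.2
```

Sensitivity and miss rate always sum to one, which is why a sensitivity of 100% and a miss rate of zero describe the same outcome.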

Experimental Results

In this work, the model goes through two phases: a training phase and a test phase. In the training phase, the proposed model is trained using a set of 250 samples. Training is applied to the data after the channel selection step, and the results are estimated for these cases.

In the test phase, the proposed model is evaluated in two cases. In the first case, the test is carried out on a random dataset of 80 samples, where each sample is a 10-second period (2560 points per channel). In the second case, it is carried out on a continuous dataset of 82 samples taken from patient number 3. The obtained results are shown in the tables below.
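The figure of 2560 points per channel follows from the 256 samples-per-second sampling rate and the 10-second sample length. The sketch below shows one way a continuous recording could be cut into such samples; non-overlapping windows and dropping the trailing partial window are assumptions, as the text does not say how the segments were taken.

```python
SAMPLING_RATE_HZ = 256   # from the dataset description
WINDOW_SECONDS = 10      # length of one sample


def split_into_windows(channel, fs=SAMPLING_RATE_HZ, seconds=WINDOW_SECONDS):
    """Split one channel into consecutive, non-overlapping windows.

    A trailing partial window is dropped (an assumption for this sketch).
    """
    size = fs * seconds  # 256 * 10 = 2560 points
    n_windows = len(channel) // size
    return [channel[i * size:(i + 1) * size] for i in range(n_windows)]


# 25 seconds of placeholder samples for one channel
recording = [0.0] * (SAMPLING_RATE_HZ * 25)
windows = split_into_windows(recording)
print(len(windows), len(windows[0]))  # 2 2560
```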

The proposed algorithm first goes through the channel selection stage. The selection is performed by calculating the variance of all channels, as illustrated in (Table 1).

Table 1 shows an example for one sample from patient 1, at hour 21 (from 387 to 397 sec). The algorithm selects the three channels with the maximum variance and then extracts all 11 features for these selected channels only, as shown in (Table 2).

The features extracted from the selected channels are then averaged to reduce the calculations, as in (Table 3). These steps are repeated for all input data (250 samples) to prepare the data for training the model. Some examples of the training samples are shown in (Table 4).

Before the model is trained, an important data inspection step should be performed: the cases are plotted against one selected parameter. The mean was selected here, and the result is shown in (Figure 2).

As is clear from (Figure 2), there are two extremely abnormal points, which affect the accuracy of the model at this stage, as in (Table 5). These points are in patients 4 and 12, hours 5 and 32, at 510 to 520 sec. They were filtered out of the training data, after which the data was ready for the training phase.

After training, the model should be evaluated on other data to verify its accuracy and performance. In this phase, the trained model is tested using two sets: a random testing set and a continuous testing set. The evaluation metrics are then calculated for the six classifiers. The random testing set is built from different patients using non-contiguous EEG recordings. The performance on this test is shown in (Table 6), and (Figure 3) shows the result in graphical form.

The other testing set is a continuous dataset from one patient with a contiguous EEG recording. Its performance results are given in (Table 7) and shown in graphical form in (Figure 4).

Result Discussion

From (Figure 5), it is clear that the proposed model successfully detected all seizure cases without missing any, while a limited number of normal instances were classified as seizures. This is also evident from the quantitative results in (Table 7). The model sensitivity is 100% and, equivalently, the miss rate is zero, which means that all positive (seizure) cases are detected. For this reason, sensitivity is taken as the primary criterion for evaluating the model: the higher the sensitivity, the better the model.

Accuracy is also important, but it is not the best indicator of how well the model detects seizure cases, in either the training or the testing phase. For example, the KNN model gives a higher accuracy than all other models in the training phase, as shown in (Table 5) and (Table 6), but its sensitivity is lower than that of the ensemble model. The same is reflected in the miss rate in (Table 6), which equals 13.9% for KNN and 2.3% for the ensemble. This means that the ensemble model classifies all seizure cases correctly except for one false negative (FN), while KNN misses six cases. These results make sensitivity the primary parameter for model evaluation.

Conclusion

Epilepsy patients suffer from convulsions and complications, so a model that identifies a seizure as quickly as possible is strongly needed to help avoid these symptoms. This paper presents an algorithm based on EEG feature extraction, channel selection to reduce the computational load, and a trained model for classification. The variance is calculated for all channels and used to select the three channels with the highest values. Eleven features are extracted from the selected channels and averaged to form the classifier input in both the training and testing steps. Six classifiers are used in this work: Support Vector Machine, Linear Discriminant Analysis, K-nearest neighbors, decision tree, ensemble, and logistic regression. The classifiers are tested using two sets of data: random data and continuous data. The results showed that the ensemble classifier was the most accurate at classifying seizure cases, with 100% sensitivity on the continuous testing set and the highest sensitivity, 97.6%, on the random testing set.

Compliance with Ethical Standards

Conflict of Interest

Author Athar Ein-shoka declares that she has no conflict of interest. Author Mohamed Moawad declares that he has no conflict of interest. Author Ahmed Salem declares that he has no conflict of interest. Author Ayman El-sayed declares that he has no conflict of interest.

Ethical Approval

In this study, the cases of epilepsy patients were obtained from an offline database on the PhysioNet site.

This article does not contain any studies with human participants or animals performed by any of the authors.
