Novel Multi-Modal Throat Inflammation and Chest Radiography based Early-Diagnosis and Mass-Screening of COVID-19
Abstract
Background:
The upsurge of COVID-19 has received significant international attention owing to its life-threatening ramifications. To quarantine susceptible patients and control the spread of the disease during the incubation period of the coronavirus, it is imperative to mass screen patients automatically and non-invasively. Diagnosis using RT-PCR is arduous and time-consuming. Currently, non-invasive mass screening of susceptible cases is performed using thermal screening. However, the symptoms of fever can be suppressed by the consumption of paracetamol.
Methods:
A novel multi-modal approach is proposed: throat inflammation-based mass screening and early prediction, followed by Chest X-Ray based diagnosis. Depth-wise separable convolutions are utilized by fine-tuning the Xception Net and Mobile Net architectures. The NADAM optimizer is leveraged to promote faster convergence.
Results:
The proposed method achieved 91% accuracy on the throat inflammation identification task and 96% accuracy on the chest radiography task on the evaluated dataset.
Conclusion:
Evaluation of the proposed method indicates promising results and supports its clinical reliability. Future work could involve a larger dataset collected in close collaboration with the medical fraternity.
1. INTRODUCTION
The novel coronavirus disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is highly contagious with common symptoms including fever, dry cough, headache, sore throat, and chest pain. It is essential to control the spread of this contagious disease.
Thermal screening is unreliable because it measures the body's surface temperature, which does not directly indicate fever. Fever as a prodrome varies from person to person and fluctuates several times a day. Moreover, the symptoms of fever can be suppressed by the consumption of paracetamol. Thus, it becomes crucial to identify another characteristic for mass screening.
Chest radiography based diagnosis is also conducted, with radiologists analyzing the images for visual indicators associated with SARS-CoV-2 viral infection. The Reverse Transcription Polymerase Chain Reaction (RT-PCR) test has been accepted as the gold standard for COVID-19 diagnosis. However, it is a cumbersome process. Figs. (1 and 2) showcase the Chest X-Rays of normal and COVID-19 infected patients.
The coronavirus enters through the upper respiratory tract and multiplies in the mucosa of the nasopharynx and oropharynx, leading to irritation and slight inflammation in the throat [1, 2]. The redness in the pharynx and tonsils of infected patients, as compared to healthy patients, can be seen in Figs. (3 and 4).
Early identification of the susceptible patients is possible by detecting the redness at an early stage. Hence, throat inflammation has been identified as a symptom for early diagnosis of COVID-19. Furthermore, swelling in pharyngitis cannot be circumvented by the consumption of any drug. The proposed method makes mass screening easy and reduces the burden on the medical and the paramedical fraternity.
Detection via the proposed method can be done without direct exposure to the infectious person, which is a massive advantage compared to various other screening methods.
2. RELATED WORK
COVID-19 testing methods include invasive laboratory-based methods like RT-PCR test and Rapid antigen test, with thermal screening being the only available method for mass screening. The COVID-19 RT-PCR test is a real-time reverse transcription-polymerase chain reaction (RT-PCR) test for the detection of nucleic acid from SARS-CoV-2 in upper and lower respiratory specimens (such as nasopharyngeal or oropharyngeal swabs, sputum, lower respiratory tract aspirates, bronchoalveolar lavage, and nasopharyngeal wash/aspirate) collected from individuals suspected of COVID-19 by their healthcare provider (HCP), as well as upper respiratory specimens (such as nasopharyngeal or oropharyngeal swabs, nasal swabs, or mid-turbinate swabs) collected from any individual, including for testing of individuals without symptoms or other reasons to suspect COVID-19 infection.
Laboratory-based testing methods are invasive and time-consuming. For mass screening, only thermal screening techniques are currently applied. However, thermal mass screening is unreliable because high body temperature is not constant in infected patients and because thermal screening is affected by various human, environmental, and equipment parameters. Moreover, feverish symptoms can be concealed by the consumption of paracetamol. Hence, methods that can be applied alongside thermal mass screening and that could help forestall the spread of COVID-19 are required. Chest radiography based COVID-19 detection has also been proposed [3]. Since the virus causes infection in the lungs, chest radiography images are manually analyzed by radiologists to screen infected patients.
Throat inflammation has also been identified as a potential symptom of infection by the coronavirus family. In a fuzzy logic system for identifying throat inflammation [4], a 3-channel image is the input to the system. After preprocessing, the red channel boundary is identified for the color-based inference system. This method is reported to give 80% accuracy for detecting throat inflammation. Another method uses a green channel image for infected region extraction, followed by mean value calculation and a Sobel edge detector [5]. In this method, the RGB image and the extracted infected region are used to compute the red color intensity and the infected area in pixels. For detection of COVID-19 through chest X-Ray radiography, Artificial Intelligence and Deep Learning techniques [6-8] tend to assist radiologists in the diagnosis of the disease.
A deep learning network termed DarkCovidNet [9], based on X-ray images, was also developed for automated COVID-19 diagnosis. DarkCovidNet is based upon the YOLO (You Only Look Once) architecture, which is mainly used for real-time object detection. The model was developed to provide accurate diagnostics for binary classification (COVID vs. No-Findings) and multi-class classification (COVID vs. No-Findings vs. Pneumonia). Investigators have reported significant findings in imaging studies of COVID-19.
Deep convolutional neural networks (DCNNs) are among the most robust deep learning architectures and have been widely applied in many practical applications such as pattern and image recognition. DCNNs handle these use cases by training the network weights on large available datasets and then fine-tuning the weights of the pre-trained DCNN on small datasets. The COVIDX-Net [10] model was developed considering X-ray images with seven different CNN models. The class activation mapping (CAM) and gradient-weighted class activation mapping (Grad-CAM) methods have been proposed by X to provide more insight into model decisions. Heatmap localization is produced to highlight the important regions that are closely associated with the predicted results.
A dual-sampling attention network [11] was proposed to classify COVID-19 and community-acquired pneumonia (CAP) infections. To focus on the lungs, the method leverages a lung mask to suppress the image context of non-lung regions in chest CT, and then refines the attention of the deep learning model through an online mechanism to better focus on the infection regions in the lungs. Chest CT has also been utilized for COVID-19 classification and lesion localization [12]: the lung region is segmented using a pre-trained UNet and then fed to a 3D deep neural network to predict the probability of the disease. COVID-Net [13], a neural network architecture designed for COVID-19 detection, was the first to introduce a lightweight projection-expansion-projection-extension (PEPX) design, which enables enhanced representational capacity while significantly reducing computational complexity.
This study proposes a multi-modal method focused on infected patient screening using throat image analysis followed by inference using chest X-ray. If the patient is noted as infectious under throat-based screening, then a Chest X-ray-based diagnosis can be performed to validate the results. The proposed method utilized Depth-Wise separable convolutions for fine-tuned Mobile Net based architecture for throat infection analysis and fine-tuned Xception Net based model for chest radiography analysis.
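The two-stage flow described above can be sketched as a small piece of control logic. This is only an illustration: the function name, the 0.5 thresholds, and the idea of passing the chest X-ray model as a callable are assumptions made here, not details given in the paper.

```python
from typing import Callable


def two_stage_screening(
    throat_prob: float,
    chest_classifier: Callable[[], float],
    throat_threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    chest_threshold: float = 0.5,   # hypothetical cut-off, not from the paper
) -> dict:
    """Run the chest X-ray stage only for patients flagged by throat screening."""
    result = {
        "throat_flagged": throat_prob >= throat_threshold,
        "chest_prob": None,
        "covid_suspected": False,
    }
    if result["throat_flagged"]:
        # The chest X-ray model is evaluated lazily, only when it is needed.
        chest_prob = chest_classifier()
        result["chest_prob"] = chest_prob
        result["covid_suspected"] = chest_prob >= chest_threshold
    return result
```

A patient whose throat image scores below the threshold is never sent to the second stage, which is what keeps the first stage usable for mass screening.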
3. METHODOLOGY
Efficient COVID-19 diagnosis necessitates a dedicated organization of techniques that can be deployed for mass screening and early prediction. This section is intended to formulate the deep learning architectures implemented for throat-inflammation-based mass-screening and chest x-ray based early prediction.
Concepts behind depth-wise separable convolutions have been showcased with mathematical and schematic illustrations. Fine-tuning of MobileNet for mass-screening and Xception Net for early prediction are elucidated in the subsequent sub-sections. Explanation of the implemented loss function and the detailed mathematical reasons behind the choice of the optimizer have also been presented.
3.1. Depth-Wise Separable Convolution
Depth-Wise Separable Convolutions [14, 15] are a form of factorized convolutions. A standard convolution is factorized into a depth-wise convolution and a 1 × 1 convolution known as a point-wise convolution. In a standard convolution, a new set of outputs is generated by filtering and combining the inputs in a single step. The depth-wise separable convolution divides this process into two layers: one for filtering and one for combining. In the depth-wise convolution, a single filter is applied to each input channel. The outputs of the depth-wise convolution are then combined by the 1 × 1 point-wise convolution. This technique significantly reduces computational complexity and model size. Consider the following notations:
F – Input Feature Map
G – Output Feature Map
Ĝ – Filtered Output Feature Map
K̂ – Depth-wise convolutional kernel
DF – Spatial Width and Height of the square input feature map
M – Number of input channels (input depth)
DG – Spatial Width and Height of a square output feature map
N – Number of output channels (output depth)
DK – Spatial width and height of the square kernel
A standard convolution layer takes as input a feature map F of dimension DF × DF × M and generates as output a feature map G of dimension DF × DF × N. The layer is parameterized by a convolution kernel K of dimension DK × DK × M × N. Considering stride one and padding, the output feature map is computed as shown by Eq 1.
$$G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n} \cdot F_{k+i-1,\,l+j-1,\,m} \tag{1}$$
Thus, the computational cost for standard convolution can be computed, as shown by Eq 2.
$$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F \tag{2}$$
From equation 2, it can be observed that the computational cost depends multiplicatively on the number of input channels, the kernel size, the number of output channels, and the size of the feature map. The proposed architecture addresses these terms by implementing depth-wise separable convolutions to break the interaction between the number of output channels and the size of the kernel. For substantial computational cost reduction, filtering and combination are split into two steps by utilizing factorized convolutions: depth-wise separable convolutions.
Depth-wise convolutions and point-wise convolutions are the constituents of a depth-wise separable convolution. The depth-wise convolution applies a single filter to each input channel. The 1 × 1 point-wise convolution then creates a linear combination of the depth-wise layer's output. Batchnorm and ReLU non-linearities are utilized for both the depth-wise and point-wise convolution layers. The distinction between standard and depth-wise separable convolutions is shown in Fig. (5). With one filter per input channel, the depth-wise convolution is as shown by equation 3, where K̂ is the depth-wise convolution kernel of size DK × DK × M, and the mth filter in K̂ is applied to the mth channel of F to produce the mth channel of the filtered output feature map Ĝ.
$$\hat{G}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m} \cdot F_{k+i-1,\,l+j-1,\,m} \tag{3}$$
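The filtering and combining steps described above can be sketched in NumPy. This is a minimal, unoptimized illustration assuming stride one, zero padding, and an odd kernel size; the function names and the point-wise weight matrix `P` of shape M × N are introduced here for illustration only. The check at the end relies on the fact that a depth-wise filter followed by a point-wise combination equals a standard convolution whose kernel factorizes as K[i, j, m, n] = K̂[i, j, m] · P[m, n].

```python
import numpy as np


def _pad(F, d_k):
    """Zero-pad the spatial axes so that the output keeps the size D_F x D_F."""
    p = d_k // 2
    return np.pad(F, ((p, p), (p, p), (0, 0)))


def standard_conv(F, K):
    """Standard convolution: F is (D_F, D_F, M), K is (D_K, D_K, M, N)."""
    d_f, d_k = F.shape[0], K.shape[0]
    Fp = _pad(F, d_k)
    G = np.zeros((d_f, d_f, K.shape[3]))
    for k in range(d_f):
        for l in range(d_f):
            patch = Fp[k:k + d_k, l:l + d_k, :]           # (D_K, D_K, M)
            G[k, l, :] = np.tensordot(patch, K, axes=3)   # sum over i, j, m
    return G


def depthwise_conv(F, K_hat):
    """Depth-wise filtering: one (D_K, D_K) filter per input channel."""
    d_f, d_k = F.shape[0], K_hat.shape[0]
    Fp = _pad(F, d_k)
    G_hat = np.zeros(F.shape, dtype=float)
    for k in range(d_f):
        for l in range(d_f):
            patch = Fp[k:k + d_k, l:l + d_k, :]
            G_hat[k, l, :] = np.sum(patch * K_hat, axis=(0, 1))  # sum over i, j only
    return G_hat


def pointwise_conv(G_hat, P):
    """1 x 1 convolution: linear combination of channels, P is (M, N)."""
    return G_hat @ P


rng = np.random.default_rng(0)
F = rng.normal(size=(8, 8, 4))        # D_F = 8, M = 4
K_hat = rng.normal(size=(3, 3, 4))    # D_K = 3
P = rng.normal(size=(4, 6))           # N = 6

separable = pointwise_conv(depthwise_conv(F, K_hat), P)
factorized_kernel = K_hat[:, :, :, None] * P[None, None, :, :]
print(np.allclose(separable, standard_conv(F, factorized_kernel)))
```

The separable form stores D_K · D_K · M + M · N weights instead of D_K · D_K · M · N, which is where the reduction in model size comes from.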
Depth-wise convolution is significantly efficient relative to standard convolution. The computational cost of depth-wise convolution is as shown by Eq 4.
$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F \tag{4}$$
Depth-wise convolution only filters the input channels; it does not combine them to create new features. An additional layer that computes a linear combination of the depth-wise convolution outputs through a 1 × 1 convolution is needed to generate these new features.
The combination of depth-wise convolution and point-wise convolution is called Depth-Wise Separable Convolution. Fig. (6) helps visualize the architectural difference between standard and depth-wise separable convolutions considering 3 × 3 convolutions. The cost of a Depth-Wise Separable Convolution is the sum of the costs of the depth-wise and 1 × 1 point-wise convolutions, as shown by Eq 5.
$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F \tag{5}$$
Hence, by expressing convolution as an amalgamation of separate filtering and combination steps, the computation cost can be significantly reduced, as shown by Eq 6.
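$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2} \tag{6}$$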
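As a quick numerical check of this reduction, the two cost expressions can be evaluated for a single layer. The layer sizes below (3 × 3 kernel, 512 input and output channels, 14 × 14 feature map) are illustrative values chosen here, not figures reported in the paper.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


def depthwise_separable_cost(d_k, m, n, d_f):
    """Multiply-adds of the depth-wise step plus the 1 x 1 point-wise step."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


d_k, m, n, d_f = 3, 512, 512, 14  # illustrative sizes, not taken from the paper
ratio = depthwise_separable_cost(d_k, m, n, d_f) / standard_conv_cost(d_k, m, n, d_f)
print(f"cost ratio = {ratio:.4f}, closed form 1/N + 1/D_K^2 = {1 / n + 1 / d_k ** 2:.4f}")
```

For 3 × 3 kernels the ratio is dominated by the 1/D_K² term, so the separable layer needs roughly 8 to 9 times fewer multiply-adds than the standard one.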
Thus, the proposed method implements depth-wise separable convolution for efficient throat inflammation and chest x-ray based COVID-19 diagnosis.
3.2. Throat Inflammation based COVID-19 Early Prediction
Throat inflammation is one of the perceivable symptoms of COVID-19 that can be utilized for efficient mass screening of susceptible patients. This section formulates the deep learning architecture for throat inflammation detection, which can be deployed for accurate early prediction of Coronavirus patients. The utilization of fine-tuning and transfer learning techniques has been proposed considering the limited dataset acquired through web scraping and with the aid of medical professionals.
MobileNet, consisting of 28 layers, utilizes 3 × 3 depth-wise separable convolutions. The original MobileNet has been trained and evaluated over millions of images across 1000 classes of the ImageNet [16] dataset. Dense layers are added to the pre-trained model, and the model is then fine-tuned. Using the pre-trained weights unchanged is not practical since this research focuses on biomedical image-based classification for COVID-19 prediction. Since the initial layers extract low-level features, including edges and shapes, preserving them becomes imperative. As shown in Fig. (7), dense layers have been added for classification, and all the layers except the last 21 layers have been frozen. Experiments on the fine-tuned model indicate promising results and validate the described idea.
3.3. Chest X-Ray based COVID-19 Diagnosis
Once mass screening and early prediction have been performed using throat inflammation-based analysis, an accurate diagnosis can be obtained by applying the proposed architecture for Chest X-Ray based analysis. Since several COVID-19 patients have been diagnosed with pneumonia, radiological examination is considered beneficial.
The Xception architecture, consisting of a linear stack of 36 depth-wise separable convolutional layers with residual connections for feature extraction, has been fine-tuned. As shown in Fig. (8), specific dense layers have been added for accurate classification. The original XceptionNet has been trained and evaluated over millions of images across 1000 classes of the ImageNet dataset. As mentioned previously, the initial layers extract fundamental features; consequently, the first 26 layers are frozen. Experiments showcase commendable results and indicate the reliability of the method for confirming the patient's susceptibility.
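The freezing schemes described for the two backbones can be sketched with the Keras applications API. This is a sketch under stated assumptions rather than the authors' code: the helper name, the input sizes, and the head (a 128-unit dense layer, dropout, and a softmax output) are hypothetical, since the exact dense layers appear only in Figs. (7) and (8), and layers are counted here as Keras enumerates them in the backbone.

```python
import tensorflow as tf


def build_finetune_model(backbone, weights="imagenet", num_classes=2):
    """Attach a small dense head to a partially frozen pre-trained backbone."""
    if backbone == "mobilenet":
        base = tf.keras.applications.MobileNet(
            input_shape=(224, 224, 3), include_top=False,
            weights=weights, pooling="avg")
        # Throat model: freeze everything except the last 21 backbone layers.
        for layer in base.layers[:-21]:
            layer.trainable = False
    elif backbone == "xception":
        base = tf.keras.applications.Xception(
            input_shape=(299, 299, 3), include_top=False,
            weights=weights, pooling="avg")
        # Chest X-ray model: freeze the first 26 backbone layers.
        for layer in base.layers[:26]:
            layer.trainable = False
    else:
        raise ValueError(f"unknown backbone: {backbone}")

    # Hypothetical classification head (sizes are assumptions, see lead-in).
    x = tf.keras.layers.Dense(128, activation="relu")(base.output)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=base.input, outputs=outputs)
```

Calling `build_finetune_model("mobilenet")` downloads the ImageNet weights on first use; passing `weights=None` builds the same graph with random initialization, which is convenient for checking the layer counts.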
3.4. Loss Function and Label Smoothing
The proposed architectures for Throat Inflammation and Chest X-Ray based COVID diagnosis deploy the Categorical Cross-Entropy Loss, also known as the Softmax Loss: Softmax Activation followed by a Cross-Entropy Loss.
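The architecture is trained to output a probability distribution over the classes for each image. Considering the One-Hot Encoding of the classes, only the positive class contributes to the loss, which can be written in terms of the class scores s_j (for C classes, with s_p denoting the score of the positive class) as:
$$CE = -\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right)$$
The derivatives with respect to the score of the positive class s_p and the score of a negative class s_n are shown by Eqs. 7 and 8.
$$\frac{\partial CE}{\partial s_p} = \frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}} - 1 \tag{7}$$
$$\frac{\partial CE}{\partial s_n} = \frac{e^{s_n}}{\sum_{j}^{C} e^{s_j}} \tag{8}$$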
The label smoothing technique replaces the one-hot encoded label with a mixture of that label and the uniform distribution. Label smoothing is beneficial when the loss function is cross-entropy and the model applies the softmax function to the penultimate layer's logit vectors to compute its output probabilities. One-hot encoded labels encourage the largest possible logit gaps to be fed into the softmax function.
Intuitively, large logit gaps combined with the bounded gradient make the model less adaptive and overconfident in its predictions. Smoothed labels encourage small logit gaps, which results in better model calibration and prevents overconfident predictions.
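The effect of label smoothing on the softmax loss can be illustrated with a few lines of plain Python. The sketch assumes the common formulation in which a one-hot label y over K classes becomes (1 − ε)·y + ε/K; the function names and the example scores are chosen here for illustration.

```python
import math


def smooth_labels(one_hot, epsilon):
    """Mix a one-hot label with the uniform distribution over K classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def softmax(scores):
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, scores):
    """Categorical cross-entropy between target probabilities and softmax(scores)."""
    return -sum(t * math.log(p) for t, p in zip(targets, softmax(scores)))


def score_gradient(targets, scores):
    """Gradient of the loss w.r.t. each score: softmax(scores) - targets."""
    return [p - t for p, t in zip(softmax(scores), targets)]


hard = [0.0, 1.0]
soft = smooth_labels(hard, epsilon=0.01)   # approximately [0.005, 0.995]
confident_scores = [0.0, 10.0]             # a large logit gap
print(score_gradient(hard, confident_scores)[1])  # negative: keeps widening the gap
print(score_gradient(soft, confident_scores)[1])  # positive: pulls the gap back
```

With one-hot targets the gradient on the positive score is always negative, so the logit gap can grow without bound; with smoothed targets the sign flips once the predicted probability exceeds the smoothed target, which limits the gap.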
3.5. Optimizer
To ensure quick convergence, such that the loss function reaches a local minimum in fewer epochs, a carefully selected optimizer must be incorporated. The Adam [17] optimizer scales the learning rate by utilizing squared gradients and takes advantage of Momentum by applying a moving average of the gradient.
If the decay hyperparameter, which determines how rapidly the accumulated previous gradients decay, is much larger than the learning rate, then the accumulated previous gradients will dominate the update rule; hence, the gradient at the current iteration will not change the current direction rapidly. On the other hand, if the decay hyperparameter is much smaller than the learning rate, the accumulated gradients act as a smoothing factor for the gradient.
To overcome this, Nesterov Accelerated Gradient (NAG) [18] calculates the gradient w.r.t. the approximate future position of the parameters. NAG thus acts as a correction factor for the Momentum method. Moreover, NAG mitigates the oscillations that arise in the Momentum method at large learning rates. Nesterov-accelerated Adaptive Moment Estimation (Nadam) [19, 20] combines Adam and NAG, such that Nadam can be interpreted as Adam with Nesterov momentum.
Consider the following notations:
wt – Weight at timestamp t
gt – Gradient w.r.t w at timestamp t
α – Learning Rate
m – Estimate for the first moment of gradient
v – Estimate for the second moment of gradient
Such that:
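$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{9}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{10}$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \tag{11}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{12}$$
Here β1 and β2 are the exponential decay rates of the moment estimates, m̂ and v̂ are the bias-corrected estimates, and ε (used below) is a small constant for numerical stability.
Considering Adam as an optimizer:
$$w_{t+1} = w_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon} \, \hat{m}_t \tag{13}$$
Considering Nadam as an optimizer:
$$w_{t+1} = w_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon} \left( \beta_1 \hat{m}_t + \frac{(1 - \beta_1) \, g_t}{1 - \beta_1^t} \right) \tag{14}$$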
The Nesterov acceleration over Adam promotes faster convergence of the loss function to a local minimum as compared to Adam. Hence, the proposed architecture utilizes Nadam as the optimizer for both the Throat Inflammation based mass-screening and early-prediction and the Chest X-Ray based COVID-19 diagnosis.
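The difference between the two update rules can be made concrete with a scalar sketch. It assumes the commonly cited simplified Nadam update, in which the bias-corrected momentum term of Adam is replaced by a look-ahead combination of that term and the current gradient; the function names, the toy objective f(w) = w², and the default ε = 1e-8 are assumptions made here, not details from the paper.

```python
import math


def adam_family_step(w, g, m, v, t, alpha=3e-4, beta1=0.9, beta2=0.999,
                     eps=1e-8, nesterov=False):
    """One scalar update of Adam (nesterov=False) or simplified Nadam (nesterov=True)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    if nesterov:
        # Look-ahead: mix the corrected momentum with the current gradient.
        direction = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
    else:
        direction = m_hat
    w = w - alpha * direction / (math.sqrt(v_hat) + eps)
    return w, m, v


def minimize(nesterov, steps=200, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2w) from w0 and return the final weight."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        w, m, v = adam_family_step(w, 2.0 * w, m, v, t, nesterov=nesterov)
    return w


print("Adam :", minimize(nesterov=False))
print("Nadam:", minimize(nesterov=True))
```

On the very first step from w = 1, the Adam update moves the weight by about α, while this Nadam variant moves it by about 1.9α, because the look-ahead term adds the current gradient on top of the momentum estimate.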
4. EXPERIMENTS AND RESULTS
4.1. Data Preparation
For throat images, the data was collected from the web through extensive web scraping. Around 200 images were identified to be of interest, out of which 146 images had sufficient visual information for classification and were then segregated into infected and normal images by a medical professional [21]. Moreover, image augmentation steps such as width shifting, height shifting, brightness adjustment, and zooming were performed for the deep learning-based method so that the learned model could generalize better across conditions. Augmentation resulted in a total of 300 images, which were further divided into training and validation sets.
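Two of the augmentation steps mentioned above, shifting and brightness adjustment, can be sketched in NumPy as follows. The shift fraction, brightness range, zero fill for exposed pixels, and function names are illustrative assumptions, since the paper does not report the augmentation parameters; zooming is omitted to keep the sketch short.

```python
import numpy as np


def shift(img, dx, dy):
    """Shift an (H, W, C) image by dx columns and dy rows, zero-filling exposed pixels."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def adjust_brightness(img, factor):
    """Scale pixel intensities of a uint8 image and clip to the valid range."""
    return np.clip(img.astype(float) * factor, 0, 255).astype(img.dtype)


def random_augment(img, rng, max_shift_frac=0.1, brightness_range=(0.8, 1.2)):
    """Apply a random width/height shift followed by a random brightness change."""
    h, w = img.shape[:2]
    max_dx, max_dy = int(max_shift_frac * w), int(max_shift_frac * h)
    dx = int(rng.integers(-max_dx, max_dx + 1))
    dy = int(rng.integers(-max_dy, max_dy + 1))
    factor = rng.uniform(*brightness_range)
    return adjust_brightness(shift(img, dx, dy), factor)


rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)  # stand-in for a throat photo
augmented = [random_augment(image, rng) for _ in range(2)]        # two variants per image
```

Generating about two variants per source image is in line with going from 146 curated images to roughly 300, although the paper does not state how many variants were produced per image.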
Two openly available public datasets were combined to train the Chest X-ray model. For COVID-positive cases, 182 images of the posteroanterior (PA) view were collected from IEEE Dataport [22], a growing collection of chest radiography images. Images of normal patients were obtained from Kaggle's Chest X-Ray (Pneumonia) dataset [23].
4.2. Training Details
Model training was performed using the TensorFlow framework. The input images were resized according to the input requirements of the models. The throat infection detection model was trained using the following hyperparameters:
- Number of epochs: 100
- Batch size: 32
- Optimizer: NADAM
- Momentum Parameters: β1 = 0.9 and β2 = 0.999
- Learning Rate: 3e-4
The model for Chest X-Ray analysis was trained using the following hyperparameters:
- Number of epochs: 10
- Batch size: 32
- Label Smoothing: 0.01
- Optimizer: NADAM
- Momentum Parameters: β1 = 0.9 and β2 = 0.999
- Learning Rate: 3e-4
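The hyperparameters listed above map onto the Keras API roughly as follows. This is a configuration sketch, not the authors' training script: the dictionary and helper names are introduced here, and the label smoothing value of 0.0 for the throat model is an assumption, since label smoothing is listed only for the Chest X-Ray model.

```python
import tensorflow as tf

# Values from the hyperparameter lists; throat label smoothing is assumed to be 0.
TRAINING_CONFIG = {
    "throat": {"epochs": 100, "batch_size": 32, "label_smoothing": 0.0},
    "chest_xray": {"epochs": 10, "batch_size": 32, "label_smoothing": 0.01},
}


def make_optimizer_and_loss(task):
    """Return the Nadam optimizer and categorical cross-entropy loss for a task."""
    cfg = TRAINING_CONFIG[task]
    optimizer = tf.keras.optimizers.Nadam(
        learning_rate=3e-4, beta_1=0.9, beta_2=0.999)
    loss = tf.keras.losses.CategoricalCrossentropy(
        label_smoothing=cfg["label_smoothing"])
    return optimizer, loss
```

The returned objects would typically be passed to `model.compile(optimizer=..., loss=..., metrics=["accuracy"])`, with the `epochs` and `batch_size` entries supplied to `model.fit`.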
The convergence of the training and validation loss for the throat infection model is shown in Fig. (9), indicating that the model learns to classify whether the input image shows an infected throat or not.
As showcased in Fig. (9), the training loss and validation loss converge as the number of epochs increases, and there is no significant difference between them; hence, it can be concluded that the model is not overfitting. The training accuracy can be observed in Fig. (10).
Further, the Chest X-Ray classification model for COVID-19 demonstrates promising results, as represented in Figs. (11 and 12).
4.3. Evaluation
For evaluating medical models, choosing appropriate and sufficient evaluation metrics is necessary, as these metrics play an important role in ascertaining practical performance. In this section, various evaluation metrics are described and calculated.
The confusion matrix reports the model's outcomes in terms of true positive, true negative, false positive, and false negative samples. Tables 1 and 2 show the confusion matrices of the models' predictions w.r.t. the ground truth.
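Table 1. Confusion matrix for the throat inflammation model.
Predicted / Actual | Normal | Infected |
---|---|---|
Predicted Normal | 45 | 5 |
Predicted Infected | 4 | 46 |
Table 2. Confusion matrix for the chest radiography model.
Predicted / Actual | Normal | Infected |
---|---|---|
Predicted Normal | 43 | 2 |
Predicted Infected | 1 | 44 |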
These values can be used to calculate different metrics such as Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity, Specificity, False Positive Rate (FPR), False Negative Rate (FNR), F1 Score, Accuracy, and Matthews Correlation Coefficient (MCC). All the metrics mentioned above have been calculated and summarized in Table 3.
Table 3. Evaluation metrics for the throat inflammation and chest radiography models.
Evaluation Metric | Throat Inflammation | Chest Radiography |
---|---|---|
Sensitivity | 0.90 | 0.95 |
Specificity | 0.91 | 0.97 |
PPV | 0.92 | 0.97 |
NPV | 0.90 | 0.95 |
FPR | 0.08 | 0.02 |
FNR | 0.09 | 0.04 |
Accuracy | 0.91 | 0.96 |
F1 Score | 0.91 | 0.96 |
MCC | 0.82 | 0.93 |
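The metrics in Table 3 can be recomputed from the confusion matrices with the short sketch below. It assumes that "Infected" is the positive class, that the columns of Tables 1 and 2 are the ground truth and the rows are the predictions, and that the first matrix (100 samples) belongs to the throat model and the second (90 samples) to the chest model, a reading that is consistent with the reported accuracies. Under these assumptions the computed values agree with Table 3 when truncated to two decimals.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "FPR": 1 - specificity,
        "FNR": 1 - sensitivity,
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "F1 Score": 2 * ppv * sensitivity / (ppv + sensitivity),
        "MCC": mcc,
    }


throat = binary_metrics(tp=46, tn=45, fp=4, fn=5)   # counts from Table 1
chest = binary_metrics(tp=44, tn=43, fp=1, fn=2)    # counts from Table 2
for name in throat:
    print(f"{name:12s} {throat[name]:.4f} {chest[name]:.4f}")
```

For a screening method, sensitivity and FNR are the values to watch most closely, since a false negative corresponds to an infected person who is not quarantined.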
CONCLUSION
A novel multi-modal method for mass screening and early diagnosis of potential coronavirus patients has been proposed. The proposed method identifies the symptoms of throat inflammation by leveraging a fine-tuned Mobile Net based architecture. The results are then validated by performing Chest X-Ray based COVID diagnosis with a fine-tuned Xception Net architecture. Depth-Wise separable convolutions have been deployed to reduce computational complexity. The NADAM optimizer has been leveraged to promote faster convergence. Evaluation of the proposed method indicates promising results and supports the clinical reliability of the idea.
The future direction of this work could be working on a larger dataset in close collaboration with the medical fraternity.
ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.
HUMAN AND ANIMAL RIGHTS
No animals/humans were used for the studies that are the basis of this research.
CONSENT FOR PUBLICATION
Not applicable.
AVAILABILITY OF DATA AND MATERIALS
The data that support the findings of this study are available within the article.
FUNDING
None.
CONFLICT OF INTEREST
The authors declare no conflict of interest, financial or otherwise.
ACKNOWLEDGEMENTS
Declared none.