Non-invasive Estimation of Haemoglobin Level Using PCA and Artificial Neural Networks
Abstract
Objective:
Haemoglobin (Hb) measurement is generally performed by the traditional “fingerstick” test, i.e., by invasively drawing blood from the body. Although the conventional laboratory measurement is accurate, it has limitations such as time delay, inconvenience to the patient, exposure to biohazards and the lack of real-time monitoring in critical situations. Non-invasive haemoglobin measurement (SpHb) has gained enormous attention among researchers and can support earlier diagnosis of polycythaemia, anaemia, various cardiovascular diseases, etc. Currently, the photoplethysmograph (PPG) signal is used for measuring oxygen saturation and for monitoring the depth of anaesthesia, heart rate and respiration. Through detailed statistical analysis, however, the PPG signal can provide further information about various blood components.
Investigation / Methodology:
In this paper, an approach for non-invasive measurement of Hb using PPG, Principal Component Analysis (PCA) and a neural network is proposed. A transmissive-type PPG sensor is developed and interfaced with a Crowduino board for the acquisition of PPG. Principal Components (PCs) are extracted from the acquired PPG signal, and SpHb is predicted following the extraction of features from the PCs. The analysis covers SpHb prediction using a single PC, two PCs and finally all three PCs. The predicted SpHb is evaluated against Hb_{lab} in terms of R-value, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error.
Conclusion:
An approach for non-invasive measurement of Hb using principal components obtained from the PPG signal is discussed. The SpHb value is compared with the Hb_{lab} values. The correlation R-value between SpHb and Hb_{lab} is 0.77 when three principal components are used. The Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) between SpHb and Hb_{lab} are 0.3, 0.44 and 0.6633, respectively, when SpHb is measured with three principal components. It is evident from the result analysis that SpHb shows the most promising results when all three principal components are used. However, one limitation of the work is that the population setting chosen does not include paediatric patients, acutely ill patients, the pregnant population or surgical patients. With detailed analysis over a wide range of population settings, Hb prediction using PPG is a promising approach for non-invasive measurement.
1. INTRODUCTION
Haemoglobin (Hb) is a complex protein molecule in red blood cells that is responsible for carrying oxygen from the lungs to the body's tissues and returning carbon dioxide from the tissues back to the lungs. A sufficient haemoglobin level must be maintained to ensure adequate tissue oxygenation, and the level is measured to screen for and help diagnose conditions that affect Red Blood Cells (RBCs) and to assess the severity of anaemia or polycythaemia. Haemoglobin measurement is one of the most frequently performed laboratory tests. The test is performed during a general health examination or when a person has signs and symptoms of a condition affecting red blood cells, such as anaemia or polycythaemia [1-3]. It is also performed several times or on a regular basis when someone is diagnosed with ongoing bleeding problems or chronic anaemia. A haemoglobin test is prescribed to determine the effectiveness of treatment for patients undergoing treatment for cancer, and it is one of the mandatory steps in making decisions during blood transfusions.
Haemoglobin measurement is generally performed by the traditional “fingerstick” test, i.e., by invasively drawing blood from the body. The health professional cleans the finger and then pricks its tip with a tiny needle (or lancet) to collect the blood. Collecting a sample of blood is only temporarily uncomfortable and can feel like a quick pinprick. Although the conventional laboratory measurement is accurate, it has limitations such as time delay, inconvenience to the patient, exposure to biohazards and the lack of real-time monitoring in critical situations.
Recently, non-invasive haemoglobin monitoring technology has gained popularity [4-9]. Several methods for non-invasive measurement of haemoglobin have been adopted or are under research, including the multi-wavelength photometric measurement method [10-12], Diffuse Optical Spectroscopy (DOS) [13], the optical method based on the palpebral conjunctiva [14, 15], the optoacoustic technique [16] and Bioelectrical Impedance Analysis (BIA) [17, 18].
The PPG signal has varied applications, including the estimation of heart rate, blood pressure, cardiovascular parameters and various other physiological parameters [19-23]. In this paper, the photoplethysmograph (PPG) signal is used together with Principal Component Analysis and neural networks for non-invasive estimation of haemoglobin.
2. MATERIALS AND METHODS
In this paper, an attempt has been made for non-invasive estimation of haemoglobin level by adopting Principal Component Analysis (PCA) [24] in combination with neural networks.
2.1. Sensor Development
A sensor with three LEDs on one side and a photodetector on the other side is developed for the acquisition of the PPG signal. LEDs with wavelengths of 670 nm, 808 nm and 905 nm are chosen because blood absorption is highly dominated by haemoglobin at these wavelengths [25, 26]. The sensor uses the principle of transmission photoplethysmography (PPG). The PPG output is fed to an ADC channel of the Arduino board, which converts it into digital counts for further processing and then transfers the data to a computer through a serial interface. Here, a Crowduino board, which is a clone of the Arduino Duemilanove, is used. Optical densities are calculated from the PPG of the subjects and are given to the PCA, which outputs three principal components. We have analysed the PCA in conjunction with neural networks for three cases: the neural network model is first trained and tested with only one principal component, and then with two and with three principal components. All three cases are analysed in terms of accuracy.
A prediction model is developed using PCA and neural networks, with the PPG readings given as input to the PCA and the PCA output given as input to the neural network. The invasive Hb measurements (Hb_{lab}) serve as targets for the neural network.
2.2. Database Description
For this research, a primary database is obtained from 30 outpatients of SreeAbhirami Hospitals, Coimbatore. Ethical clearance for undertaking the study in the hospital was obtained, and informed consent was obtained from the subjects before the acquisition of the PPG signal. The characteristics of the patients are given in Table 1. The blood Hb readings obtained through a laboratory test on a venous blood sample are used as reference values for validation of the proposed method. The readings are collected over a 1-minute window.
Subject | Age (years) | Gender | Height (cm) | Weight (kg) | Laboratory Hb (g/dL) |
---|---|---|---|---|---|
1 | 42 | M | 185 | 99 | 15.5 |
2 | 43 | M | 175 | 76 | 14.4 |
3 | 23 | F | 149 | 57 | 11.6 |
4 | 15 | M | 172 | 69 | 12 |
5 | 63 | F | 184 | 67 | 12.5 |
6 | 23 | F | 176 | 55 | 13 |
7 | 46 | M | 163 | 78 | 14.8 |
8 | 45 | F | 155 | 89 | 12.3 |
9 | 54 | M | 166 | 75 | 15.5 |
10 | 35 | M | 168 | 62 | 16 |
11 | 26 | F | 148 | 75 | 11.3 |
12 | 57 | F | 156 | 71 | 12.6 |
13 | 43 | M | 174 | 77 | 16.1 |
14 | 41 | F | 177 | 64 | 10.5 |
15 | 48 | F | 158 | 72 | 11.6 |
16 | 24 | M | 172 | 83 | 15.8 |
17 | 51 | F | 168 | 53 | 11.9 |
18 | 56 | M | 175 | 76 | 17 |
19 | 41 | M | 163 | 84 | 16.6 |
20 | 53 | F | 159 | 63 | 12.8 |
21 | 29 | F | 161 | 67 | 13.6 |
22 | 19 | M | 183 | 95 | 15.3 |
23 | 40 | M | 175 | 82 | 16 |
24 | 37 | M | 173 | 83 | 16.9 |
25 | 58 | F | 163 | 73 | 12.7 |
26 | 48 | F | 158 | 79 | 11.5 |
27 | 36 | M | 175 | 85 | 16.5 |
28 | 48 | M | 169 | 78 | 16.4 |
29 | 27 | F | 153 | 85 | 11.6 |
30 | 34 | F | 159 | 67 | 12.2 |
2.3. Pre-Processing
The acquired PPG signal is first pre-processed to remove noise and baseline wander. Baseline wander is removed with a moving average algorithm, followed by wavelet denoising.
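The baseline-removal step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the centred window and the window length are assumptions, and the wavelet denoising stage is not shown. In practice the window would be chosen longer than one cardiac cycle so that the pulse itself is not subtracted.

```python
def moving_average(signal, window):
    """Centred moving average; the window shrinks near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        segment = signal[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def remove_baseline(signal, window):
    """Subtract the slowly varying baseline estimated by the moving average."""
    baseline = moving_average(signal, window)
    return [s - b for s, b in zip(signal, baseline)]


# Hypothetical example: a slow linear drift with no pulse component.
drift = [0.5 * i for i in range(50)]
detrended = remove_baseline(drift, window=11)
```

Away from the edges, a linear drift is removed exactly, because the mean of a symmetric window around a sample of a straight line equals that sample.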
2.4. Principal Component Analysis
As NIR spectroscopic data are highly collinear, the data from two adjacent wavelengths have high correlation coefficients. Principal Component Analysis (PCA) is well suited to handling collinearity of the kind present in spectroscopic data. PCA is a mathematical technique used to find patterns in a large data set. It transforms the correlated variables into a smaller number of uncorrelated variables called principal components, which reduces the dimension of the data set and identifies new underlying variables.
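Let X be the input random vector with p variables

$$X = (X_1, X_2, \ldots, X_p)^{T} \tag{1}$$

The variance-covariance matrix of X is

$$\operatorname{var}(X) = \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{pmatrix} \tag{2}$$

Let us consider the linear combinations

$$Y_1 = e_{11}X_1 + e_{12}X_2 + \cdots + e_{1p}X_p \tag{3}$$

$$Y_2 = e_{21}X_1 + e_{22}X_2 + \cdots + e_{2p}X_p \tag{4}$$

$$Y_p = e_{p1}X_1 + e_{p2}X_2 + \cdots + e_{pp}X_p \tag{5}$$

Each of these can be viewed as a linear regression equation that predicts Y_{i} from X_{1}, X_{2}, ..., X_{p}. There is no intercept, but e_{i1}, e_{i2}, ..., e_{ip} can be viewed as regression coefficients.
Y_{i} is a function of the input data X, which is random. Therefore, it has a variance equal to

$$\operatorname{var}(Y_i) = \sum_{k=1}^{p}\sum_{l=1}^{p} e_{ik}e_{il}\sigma_{kl} = e_i^{T}\Sigma e_i \tag{6}$$

Similarly, the covariance of Y_{i} and Y_{j} is given as

$$\operatorname{cov}(Y_i, Y_j) = \sum_{k=1}^{p}\sum_{l=1}^{p} e_{ik}e_{jl}\sigma_{kl} = e_i^{T}\Sigma e_j \tag{7}$$

Here, the coefficients e_{ij} are collected into the vector

$$e_i = (e_{i1}, e_{i2}, \ldots, e_{ip})^{T} \tag{8}$$

In PCA, the first principal component is the linear combination of the x-variables that has maximum variance among all linear combinations, so it accounts for as much variation in the data as possible. The coefficients e_{11}, e_{12}, ..., e_{1p} are chosen to maximise this variance, subject to the constraint that the sum of the squared coefficients is equal to one.
Select e_{11}, e_{12}, ..., e_{1p} that maximize

$$\operatorname{var}(Y_1) = \sum_{k=1}^{p}\sum_{l=1}^{p} e_{1k}e_{1l}\sigma_{kl} = e_1^{T}\Sigma e_1 \tag{9}$$

subject to the constraint that

$$e_1^{T}e_1 = \sum_{j=1}^{p} e_{1j}^{2} = 1 \tag{10}$$

All subsequent principal components have this same property: they are linear combinations that account for as much of the remaining variation as possible, and they are not correlated with the other principal components.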
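The constrained maximisation that defines the first principal component has a standard closed-form solution, sketched here as a short worked derivation. The notation is an assumption of this note: \Sigma denotes the covariance matrix of X, e_1 the coefficient vector of the first component, and \lambda a Lagrange multiplier introduced only for this derivation.

```latex
\begin{aligned}
L(e_1, \lambda) &= e_1^{T}\Sigma e_1 - \lambda\,\bigl(e_1^{T}e_1 - 1\bigr), \\
\frac{\partial L}{\partial e_1} &= 2\Sigma e_1 - 2\lambda e_1 = 0
  \;\Longrightarrow\; \Sigma e_1 = \lambda e_1, \\
\operatorname{var}(Y_1) &= e_1^{T}\Sigma e_1 = \lambda\, e_1^{T}e_1 = \lambda .
\end{aligned}
```

The stationarity condition says that e_1 must be an eigenvector of \Sigma, and the variance attained equals the corresponding eigenvalue, so the maximum is reached at the eigenvector with the largest eigenvalue. This is why PCA is computed in practice from the eigenvectors and eigenvalues of the covariance matrix.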
3. APPLYING PCA ON PPG DATA
The training set consists of PPG readings from three different LEDs. The procedure for acquiring principal components from PPG data is summarised in the flowchart (Fig. 1).
The training set is used to build the PCA model. PCA has been carried out and the component scores are obtained through the following steps.
- A data set matrix is constructed representing the optical densities calculated from PPG readings of subjects with the developed sensor.
- Each sample in the data matrix consists of 3 PPG readings from 3 different LEDs for each subject.
- The covariance matrix is calculated for the input data matrix.
- Eigenvectors and eigenvalues are obtained from the covariance matrix. The eigenvectors that correspond to the k largest eigenvalues are chosen, where k is the number of dimensions of the new feature subspace.
- A projection matrix is constructed from the selected k eigenvectors.
- The original dataset X is transformed via the projection matrix into a k-dimensional feature data matrix Y.
The principal components in the data matrix Y are then used as the training vectors for the artificial neural network.
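The steps listed above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `pca_scores`, the random demonstration data and the explicit mean-centring are choices made here (the covariance computation centres the data implicitly in any case).

```python
import numpy as np


def pca_scores(X, k):
    """Project the rows of X onto the k leading principal components.

    X is an (n_subjects, n_features) array; for this sensor n_features
    would be 3, one optical density per LED wavelength.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)               # remove the per-feature mean
    cov = np.cov(centred, rowvar=False)        # (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    W = eigvecs[:, order]                      # projection matrix
    Y = centred @ W                            # (n_subjects, k) scores
    return Y, eigvals[order], W


# Hypothetical data standing in for the optical densities of 30 subjects.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
Y_demo, variances, W_demo = pca_scores(X_demo, k=3)
```

The columns of `Y_demo` are mutually uncorrelated, and their variances equal the selected eigenvalues, which matches the property described for the principal components.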
4. PCA BASED NEURAL NETWORKS
Here, the principal components obtained from the analysis are used to train the neural network. The training set consists of the three principal components obtained after performing PCA on the PPG data. A three-layer neural network is used. The hidden layer has 10 neurons and the output layer has a single neuron. In the hidden layer, the weighted sum of the inputs is passed through the sigmoid activation function (fs). The output neuron applies a linear activation function to the weighted sum of the hidden-layer outputs to give the final output of the network.
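A network with this shape (inputs, 10 sigmoid hidden neurons, one linear output) can be sketched in NumPy as below. The training algorithm is not stated in the text, so plain batch gradient descent on the mean squared error is an assumption here, as are the function names, the weight initialisation, the learning rate and the number of epochs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_inputs, n_hidden=10, seed=0):
    """Small random weights; seed and scale are illustrative choices."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Sigmoid hidden layer followed by a single linear output neuron."""
    hidden = sigmoid(X @ params["W1"] + params["b1"])
    output = hidden @ params["W2"] + params["b2"]
    return hidden, output.ravel()


def train(params, X, y, lr=0.01, epochs=500):
    """Batch gradient descent on the MSE (an assumed optimiser)."""
    n = len(y)
    for _ in range(epochs):
        hidden, pred = forward(params, X)
        err = (pred - y).reshape(-1, 1)                      # (n, 1)
        # Back-propagate through the linear output, then the sigmoid layer.
        back = (err @ params["W2"].T) * hidden * (1.0 - hidden)
        grad_W2 = (hidden.T @ err) * (2.0 / n)
        grad_b2 = err.sum(axis=0) * (2.0 / n)
        grad_W1 = (X.T @ back) * (2.0 / n)
        grad_b1 = back.sum(axis=0) * (2.0 / n)
        params["W1"] -= lr * grad_W1
        params["b1"] -= lr * grad_b1
        params["W2"] -= lr * grad_W2
        params["b2"] -= lr * grad_b2
    return params
```

Here `X` would hold the principal-component scores (one row per subject) and `y` the laboratory Hb values; passing fewer columns of `X` gives the one- and two-component variants of the same network.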
By applying PCA to each PPG data sample obtained from the patients, three principal components are obtained for each sample. These principal components are given as inputs to the neural network in different combinations. The invasive laboratory Hb values are given as targets to train the network, which is then used to predict Hb values.
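The different input combinations (single components, pairs, and all three) can be enumerated as in the sketch below. The helper names and the list-of-rows data layout are assumptions made for illustration.

```python
from itertools import combinations

pc_names = ("PC1", "PC2", "PC3")


def pc_subsets(names):
    """Return every non-empty subset of components, smallest first."""
    subsets = []
    for size in range(1, len(names) + 1):
        subsets.extend(combinations(names, size))
    return subsets


def select_columns(rows, names, subset):
    """Keep only the columns of `rows` that belong to `subset`."""
    idx = [names.index(n) for n in subset]
    return [[row[i] for i in idx] for row in rows]


for subset in pc_subsets(pc_names):
    print("+".join(subset))
```

This yields seven input sets: three single components, three pairs and the full set of three, each of which can be passed to the network in turn.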
4.1. Regression Analysis
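In the following section, SpHb is calculated using the principal components, and its efficiency is analysed using the R-value from IBM SPSS software, the mean bias (Mean Absolute Error, MAE), the Mean Square Error (MSE) and the Root Mean Square Error (RMSE). In the following equations, y_{j} denotes the Hb_{lab} value, \hat{y}_{j} denotes the predicted SpHb value and n is the number of samples.

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \left| y_j - \hat{y}_j \right| \tag{11}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^{2} \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^{2}} \tag{13}$$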
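The evaluation measures can be computed as in the following sketch, which uses the usual definitions of MAE, MSE, RMSE and the Pearson correlation coefficient. The function names and the three sample values are hypothetical; the paper itself obtains the R-value from SPSS.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between the two series."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical values in g/dL, for illustration only.
hb_lab = [12.0, 14.0, 16.0]
sphb = [12.5, 13.5, 16.5]
print(mae(hb_lab, sphb), mse(hb_lab, sphb), rmse(hb_lab, sphb), pearson_r(hb_lab, sphb))
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE is less sensitive to a few large errors.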
4.2. Regression Analysis with One Principal Component
Three principal components, PC1, PC2 and PC3, are obtained. Initially, training and testing are carried out using only one principal component. SPSS^{®} software is used for analysing the results. The correlation coefficient ‘R’ is calculated between the estimated non-invasive haemoglobin (SpHb) level and the laboratory Hb. R-values of 0.55, 0.56 and 0.53 are obtained for PC1, PC2 and PC3, respectively, in the regression analysis, signifying moderate correlation between SpHb and Hb_{lab}. The mean bias between Hb_{lab} and SpHb is 3.45 g/dL, 3.1 g/dL and 4.2 g/dL for PC1, PC2 and PC3, respectively. The Mean Square Error (MSE) between Hb_{lab} and SpHb is 1.83, 1.6 and 2.04 for PC1, PC2 and PC3, respectively. The Root Mean Square Error (RMSE) between Hb_{lab} and SpHb is 1.3527, 1.2649 and 1.4282 for PC1, PC2 and PC3, respectively.
Principal Component used for SpHb | Correlation R value | Mean Bias / Mean Absolute Error (g/dL) | Mean Square Error | Root Mean Square Error |
---|---|---|---|---|
PC1 | 0.55 | 3.45 | 1.83 | 1.3527 |
PC2 | 0.56 | 3.1 | 1.6 | 1.2649 |
PC3 | 0.53 | 4.2 | 2.04 | 1.4282 |
PC_{12} | 0.64 | 2.53 | 1.48 | 1.2165 |
PC_{23} | 0.61 | 2.9 | 1.7 | 1.3038 |
PC_{13} | 0.59 | 3.3 | 1.81 | 1.3453 |
PC_{1,2,3} | 0.77 | 0.3 | 0.44 | 0.6633 |
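As a quick consistency check on the table above, each reported RMSE should equal the square root of the corresponding MSE. The sketch below copies the MSE and RMSE columns from the table; the variable and function names are illustrative.

```python
import math

# (components, MSE, RMSE) copied from the table above.
table2 = [
    ("PC1", 1.83, 1.3527),
    ("PC2", 1.6, 1.2649),
    ("PC3", 2.04, 1.4282),
    ("PC12", 1.48, 1.2165),
    ("PC23", 1.7, 1.3038),
    ("PC13", 1.81, 1.3453),
    ("PC1,2,3", 0.44, 0.6633),
]


def rmse_gap(mse_value, rmse_value):
    """Absolute difference between sqrt(MSE) and the reported RMSE."""
    return abs(math.sqrt(mse_value) - rmse_value)


for label, mse_value, rmse_value in table2:
    print(f"{label:8s} sqrt(MSE) = {math.sqrt(mse_value):.4f}  "
          f"reported RMSE = {rmse_value:.4f}")
```

Every row agrees to within 0.0001, so the RMSE column is consistent with the MSE column.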
4.3. Regression Analysis with Two Principal Components
The training and testing are carried out using two principal components. Three combinations of two principal components are used: PC_{12} denotes the results obtained using PC1 and PC2, PC_{13} denotes the results obtained using PC1 and PC3, and PC_{23} denotes the results obtained using PC2 and PC3.
R-values of 0.64, 0.61 and 0.59 are obtained for PC_{12}, PC_{23} and PC_{13}, respectively, in the regression analysis, signifying moderate correlation between SpHb and Hb_{lab}. The mean bias between Hb_{lab} and SpHb is 2.53 g/dL, 2.9 g/dL and 3.3 g/dL for PC_{12}, PC_{23} and PC_{13}, respectively. The Mean Square Error between Hb_{lab} and SpHb is 1.48, 1.7 and 1.81 for PC_{12}, PC_{23} and PC_{13}, respectively. The Root Mean Square Error between Hb_{lab} and SpHb is 1.2165, 1.3038 and 1.3453 for PC_{12}, PC_{23} and PC_{13}, respectively.
4.4. Regression Analysis with Three Principal Components
For training and testing, all three PCs are used, and an R-value of 0.77 is obtained in the regression analysis, signifying high correlation between SpHb and Hb_{lab}. The mean bias, MSE and RMSE between Hb_{lab} and SpHb are 0.3 g/dL, 0.44 and 0.6633, respectively.
DISCUSSION AND CONCLUSION
In this work, the non-invasive estimation of Hb using principal component analysis and neural networks is discussed. The PCA has been performed on the PPG signal to obtain principal components. These principal components are given as input to neural networks. The design is tested with different combinations of principal components.
It is observed from Table 2 that the bias between SpHb and Hb_{lab} is higher when SpHb is estimated using one principal component than when it is estimated using two or three principal components. It is also observed that the correlation between SpHb and Hb_{lab} is better when SpHb is estimated using two or all three PCs. The accuracy achieved increases with the number of principal components used as inputs, because information in the input data is lost when fewer principal components are used. However, hardware utilization is lowest with fewer principal components.
Finally, the accuracy obtained with one PC is 63.1%, which is on the lower side, whereas the regression analyses carried out using two and three principal components obtained from PCA exhibit higher accuracies of 84.7% and 95.7%, respectively. Future work will involve further clinical studies, optimization of the sensor and evaluation under different population settings. One limitation of the work is that the population setting chosen does not include paediatric patients, acutely ill patients, the pregnant population or surgical patients. With detailed analysis over a wide range of population settings, Hb prediction using PPG is a promising approach for non-invasive measurement.
ETHICAL APPROVAL AND CONSENT TO PARTICIPATE
Institutional ethical clearance was obtained from Karpagam Academy of Higher Education and Sree Abirami Hospital, Coimbatore, India for performing human trials.
HUMAN AND ANIMAL RIGHTS
No animals were used in this study. The reported experiments were performed in accordance with the Medical Council of India, Government of India and WHO norms.
CONSENT FOR PUBLICATION
Informed Consent was obtained from the subjects before the acquisition of PPG signal.
AVAILABILITY OF DATA AND MATERIALS
Not applicable.
FUNDING
None.
CONFLICT OF INTEREST
The authors declare no conflict of interest, financial or otherwise.
ACKNOWLEDGEMENTS
The authors wish to thank the nursing and medical staff at the SreeAbhirami Hospitals, Coimbatore, for their help and kind support, without which the study could not have been done.