Advanced Enhancement Techniques for Breast Cancer Classification in Mammographic Images

Mohamed, Sara EL.; Mohamed, Wael A.; Abdelhalim, Mohamed B.; Ahmed, Khaled EL.

RESEARCH ARTICLE

Sara EL. Mohamed¹, Wael A. Mohamed¹, Mohamed B. Abdelhalim², Khaled EL. Ahmed¹,*

The Open Biomedical Engineering Journal • 01 Dec 2022 • RESEARCH ARTICLE • DOI: 10.2174/18741207-v16-e2209200

Background:

Breast cancer is one of the most significant health problems in the world. Early diagnosis of breast cancer is very important for treatment. Image enhancement techniques have been used to improve the captured images for quick and accurate diagnosis. These techniques include median filtering, edge enhancement, dilation, erosion, and contrast-limited adaptive histogram equalization. Although these techniques have been used in many studies, their results have not reached optimum values based on image properties and the methods used for feature extraction and classification.

Methods:

In this study, enhancement techniques were applied in different sequences to obtain the best image enhancement. They were applied to 319 images collected from the Mammographic Image Analysis Society (MIAS) database. The Gabor filter and local binary pattern were used as feature extraction methods together with support vector machine (SVM), linear discriminant analysis (LDA), and k-nearest neighbor (KNN) classifiers.

Results:

The experimental work indicates that by merging the features of the Gabor filter and local binary pattern, the results were 97.8%, 100%, and 94.6% for normal/abnormal and 85.1%, 88.7%, and 81.9% for benign/malignant using the SVM, LDA, and KNN classifiers, respectively.

Conclusion:

The best results were obtained by combining the features of the two tested strategies and using LDA as a classifier.

Keywords: Breast cancer, Gabor filter, Local binary pattern, LDA, Benign, Malignant.

1. INTRODUCTION

Breast cancer is a serious disease that causes a large number of deaths in women [1]. Many techniques are used to detect breast cancer, including mammography, ultrasound, magnetic resonance imaging (MRI), thermography, and electrical impedance tomography. Mammography offers high specificity and sensitivity together with better resolution and higher accuracy in detecting abnormalities deep in the breast tissue, although it uses ionizing radiation and is less sensitive in radiographically dense breasts [2]. Ultrasound, in contrast, has high diagnostic utility in women with dense breasts [3]; it uses no ionizing radiation and is a safe technique, but it cannot capture an image of the entire breast. MRI is a very accurate test with approximately 100% efficiency and can detect the intraductal spread of cancer, but it has poor specificity and is very expensive compared with the other techniques [4, 5]. Thermography is non-invasive, non-radioactive, and promising for dense breasts [6]; however, it is easily affected by temperature and images large breasts poorly. Electrical impedance tomography is non-invasive, non-radiative, and risk-free, works well with dense breasts, and is reasonably priced [7-9], but it has poor resolution [10]. Sahiner et al. used mammography and a convolutional neural network (CNN) to classify images as normal/abnormal [11]. Nega et al. used linear discriminant analysis (LDA) as a classifier [12]. An accuracy of 92% for normal/abnormal and 80% for benign/malignant was achieved using the support vector machine (SVM) classifier with the discrete wavelet transform (DWT) and discrete shearlet transform (DST) [13]. Using the SVM classifier and wavelet decomposition, Campanini et al. achieved 80% accuracy at 1.1 fps/I [14], and using the SVM classifier, Ke et al. achieved 85.11% accuracy at 1.44 fps/I [15].
A fully automatic CAD system has been evaluated on magnetic resonance imaging (MRI) using subtracted mean intensity projection images [16-19]. A semiautomatic segmentation algorithm achieved accurate and consistent breast lesion segmentation in the study by Ritter et al. [20]. This differs from the ultrasound-based study by Eltoukhy et al. [21, 22], which used curvelet and wavelet transforms with a nearest neighbor classifier and achieved 94.07% accuracy, with the wavelet transform reaching 90.07% and the curvelet transform 94.28% accuracy for abnormal cases. Using a Euclidean distance classifier with the curvelet transform for feature extraction, Eltoukhy et al. achieved 98.59% accuracy [22]. Using the local discrete cosine transform (LDCT) and the curvelet transform with the wrapping technique, Gardezi et al. achieved 77.3% accuracy [23]. A support vector machine with 1238 coefficients and 150 features achieved an accuracy of 95.84% for normal/abnormal and 96.56% for benign/malignant lesions in the study conducted by Eltoukhy et al. [24, 25]. In addition, a marker-controlled watershed transformation algorithm achieved 84.848% accuracy in the study by Shareef [26].

Image preprocessing is used to improve the image features and prepare for further processing by eliminating unrelated and surplus parts from the background of the mammogram images [27]. Preprocessing involves several steps to make the image ready to use, such as median filtering, edge enhancement, dilation, erosion, and contrast-limited adaptive histogram equalization. The median filter is a nonlinear filter that efficiently eliminates salt-and-pepper noise while tending to maintain the sharpness of image edges. The mean (average) filter is the simplest linear filter; it assigns equal weights (Wk) to all neighborhood pixels, with Wk = 1/(NM) for an N × M neighborhood. It suppresses noise in an image and removes Gaussian noise with reasonable effect, although it smooths and blurs the image [28]. Dilation and erosion affect the shape, structure, and form of objects. Dilation adds pixels at the boundaries of a region, fills holes in the image, and can connect disjoint pixels [29]. Erosion does the opposite: it shrinks boundaries and increases the size of holes. Contrast-limited adaptive histogram equalization (CLAHE) was originally applied to the enhancement of low-contrast medical images [30-32]. CLAHE differs from ordinary adaptive histogram equalization (AHE) in its contrast limiting: it introduces a clipping limit to overcome the noise amplification problem, clipping the histogram at a predefined value before computing the cumulative distribution function (CDF). Regarding feature extraction techniques, many studies have used the Gabor filter, wavelet transform, and local binary pattern (LBP). The Gabor filter provides the highest response at the points and edges where texture changes.
Owing to these characteristics, algorithms based on Gabor filters have been successfully applied in computer vision applications [33], such as texture extraction [34, 35]. The general form of the 2D Gabor filter family (2D because mammographic images are two-dimensional) is a Gaussian kernel modulated by an oriented complex sinusoidal wave [36]. LBP is an effective method for extracting textural features. The LBP operator converts the image into an array or an image of integer labels that describe the small-scale appearance of the image [37].
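The clipping step that distinguishes CLAHE from ordinary AHE can be illustrated for a single tile. The following is a minimal sketch, not the authors' implementation: the function name `clipped_equalization_map`, the choice of expressing the clip limit as a fraction of the tile's pixel count, and the synthetic tile are all assumptions made for illustration. A full CLAHE also interpolates between neighboring tiles, which is omitted here.

```python
import numpy as np

def clipped_equalization_map(tile, clip_limit=0.01, n_bins=256):
    """Build a contrast-limited equalization lookup table for one 8-bit tile."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0, n_bins))
    hist = hist.astype(np.float64)
    # Assumption: the clip limit is a fraction of the number of pixels in the tile.
    limit = max(clip_limit * tile.size, 1.0)
    # Mass above the limit is cut off ...
    excess = np.sum(np.maximum(hist - limit, 0.0))
    hist = np.minimum(hist, limit)
    # ... and spread evenly over all bins, so the total count is preserved.
    hist += excess / n_bins
    # The CDF of the clipped histogram is the gray-level mapping.
    cdf = np.cumsum(hist) / hist.sum()
    return np.round(cdf * (n_bins - 1)).astype(np.uint8)

rng = np.random.default_rng(0)
tile = rng.integers(90, 110, size=(64, 64), dtype=np.uint8)  # synthetic low-contrast tile
lut = clipped_equalization_map(tile)
enhanced = lut[tile]
```

Because the histogram is clipped before the CDF is computed, the slope of the mapping is bounded, which is what limits noise amplification in nearly uniform regions.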

Support vector machine (SVM), linear discriminant analysis (LDA), and k-nearest neighbor (KNN) classifiers were used in our research. SVM is a machine learning technique that separates two classes by finding a class boundary hyperplane that maximizes the margin of the given training data. The training samples closest to the class boundary are known as support vectors, and the margin is the space between the support vectors and the class boundary hyperplane. The SVM is based on the idea of decision planes that define decision boundaries; a decision plane separates sets of items with different class memberships. The SVM is a valuable procedure for data classification. A classification task typically involves training and testing data comprising data instances [38].

Linear discriminant analysis (LDA) is a frequently used procedure for data classification and dimensionality reduction. LDA handles situations in which within-class frequencies are unequal, and its performance has been analysed using randomly generated data. This approach maximizes the ratio of between-class variance to within-class variance in any specific dataset, thereby ensuring maximal separability [39]. LDA often delivers robust, reliable, and interpretable results in a simple manner. When faced with real-world classification problems, LDA is often the first benchmarking technique applied before other more complicated and adaptable techniques are utilized [40].
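The variance-ratio criterion described above can be made concrete for two classes. The following is a minimal sketch under stated assumptions: the function names `fisher_direction` and `fisher_ratio` and the synthetic Gaussian data are hypothetical, the classes have equal size, and this is the textbook two-class Fisher discriminant rather than the classifier configuration used in the study.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector w that maximizes between-class over within-class variance."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of the two classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Closed-form solution of the two-class problem: w is proportional to Sw^-1 (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def fisher_ratio(X0, X1, w):
    """Squared distance of projected means divided by summed projected variances."""
    p0, p1 = X0 @ w, X1 @ w
    return (p1.mean() - p0.mean()) ** 2 / (p0.var() + p1.var())

# Synthetic, correlated two-class data (equal class sizes, illustration only).
rng = np.random.default_rng(1)
cov = [[3.0, 2.5], [2.5, 3.0]]
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(X0, X1)
best = fisher_ratio(X0, X1, w)
along_x = fisher_ratio(X0, X1, np.array([1.0, 0.0]))
```

With correlated features, the ratio along `w` is at least as large as the ratio along the raw x-axis, even though the class means differ only in x.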

The k-nearest neighbor (KNN) classifier is a commonly used pattern classification procedure owing to its simplicity and effectiveness [41-43]. Furthermore, KNN, a flexible multivariate statistical technique, uses the standard Euclidean distance to compare samples [44, 45]. KNN assigns a class based on the k nearest training samples in the feature space. Given a new sample, it selects the k nearest samples from the labeled training data and assigns the class that is most represented among them. The Euclidean distance similarity metric was applied to select neighborhoods. Our study aimed to differentiate between normal and abnormal mammographic breast images and to accurately diagnose these images.

2. MATERIALS AND METHODS

2.1. Data

The data were gathered from the Mammographic Image Analysis Society (MIAS) database, which categorizes breast tissues as normal, benign, or malignant. Although breast tissues may also be classified as fatty, fatty glandular, or dense glandular, the collected images were diagnosed using image processing algorithms. Each image is 1024 × 1024 pixels. The distribution of the images is shown in Table 1, considering the radius of the abnormality as 197 pixels.

Table 1.

Numbers of images (normal, benign, malignant).

Total Images	Normal	Benign	Malignant
319	209	60	50

2.2. Image Preprocessing

Image preprocessing techniques are regarded as one of the most significant steps for improving image quality by reducing noise or other undesired regions.

Image segmentation was used to crop the images to the abnormal regions for easier detection and diagnosis of the regions of interest (ROIs). The ROIs were selected manually as circles with a radius of 197 pixels.
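A circular ROI of this kind can be cropped with a few lines of array code. The following is a minimal sketch, assuming the center of the abnormality is already known (for example from manual annotation); the function name `extract_circular_roi`, the (row, column) coordinate convention, and the zero fill outside the circle are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def extract_circular_roi(image, center_row, center_col, radius=197):
    """Crop the bounding box of a circle and zero out pixels outside the circle."""
    rows, cols = image.shape
    # Clamp the bounding box to the image so ROIs near the border stay valid.
    r0, r1 = max(center_row - radius, 0), min(center_row + radius + 1, rows)
    c0, c1 = max(center_col - radius, 0), min(center_col + radius + 1, cols)
    patch = image[r0:r1, c0:c1].copy()
    # Coordinate grids in the original image frame.
    rr, cc = np.ogrid[r0:r1, c0:c1]
    outside = (rr - center_row) ** 2 + (cc - center_col) ** 2 > radius ** 2
    patch[outside] = 0
    return patch

# Placeholder 1024 x 1024 grayscale image with a hypothetical center at (512, 512).
image = np.full((1024, 1024), 128, dtype=np.uint8)
roi = extract_circular_roi(image, 512, 512)
```

For a center far from the border, the result is a 395 × 395 patch (2 × 197 + 1 pixels per side) with the corners masked out.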

Four different preprocessing sequences were used, each with its own filtration order. As indicated in Table 2, the median filter, average filter, dilation, erosion, and adaptive histogram are the five filters applied in each sequence. The key variation between the sequences is the order in which the filters are applied. For example, in sequence four, the adaptive histogram is applied first, followed by dilation, erosion, the median filter, and the average filter.

Table 2.

Sequences of preprocessing techniques.

Sequence	Step 1	Step 2	Step 3	Step 4	Step 5
Seq1	Median	Average	Dilation	Erosion	Adaptive histogram
Seq2	Adaptive histogram	Dilation	Median	Average	Erosion
Seq3	Adaptive histogram	Average	Erosion	Dilation	Median
Seq4	Adaptive histogram	Dilation	Erosion	Median	Average
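The four orderings in Table 2 can be expressed as data and applied with one loop. The following is a minimal sketch under several assumptions that are not stated in the text: a 3 × 3 window for every neighborhood filter, edge replication at the borders, and plain global histogram equalization used as a simple stand-in for the adaptive histogram step. All function and dictionary names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, k=3):
    """Every k x k neighborhood of img, with edges replicated (assumed window size)."""
    padded = np.pad(img, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k))

def median(img):
    return np.median(_windows(img), axis=(-2, -1))

def average(img):
    return _windows(img).mean(axis=(-2, -1))

def dilation(img):
    # Grayscale dilation with a flat square structuring element: local maximum.
    return _windows(img).max(axis=(-2, -1))

def erosion(img):
    # Grayscale erosion with a flat square structuring element: local minimum.
    return _windows(img).min(axis=(-2, -1))

def equalize(img, n_bins=256):
    """Global histogram equalization, a stand-in for the adaptive histogram step."""
    levels = np.clip(np.rint(img), 0, n_bins - 1).astype(np.intp)
    cdf = np.cumsum(np.bincount(levels.ravel(), minlength=n_bins)) / levels.size
    return cdf[levels] * (n_bins - 1)

FILTERS = {"median": median, "average": average, "dilation": dilation,
           "erosion": erosion, "adaptive_histogram": equalize}

# Filter orders copied from Table 2.
SEQUENCES = {
    "Seq1": ["median", "average", "dilation", "erosion", "adaptive_histogram"],
    "Seq2": ["adaptive_histogram", "dilation", "median", "average", "erosion"],
    "Seq3": ["adaptive_histogram", "average", "erosion", "dilation", "median"],
    "Seq4": ["adaptive_histogram", "dilation", "erosion", "median", "average"],
}

def apply_sequence(img, steps):
    out = np.asarray(img, dtype=np.float64)
    for name in steps:
        out = FILTERS[name](out)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(np.float64)  # synthetic test image
processed = {name: apply_sequence(image, steps) for name, steps in SEQUENCES.items()}
```

Keeping the sequences as lists of names makes it straightforward to add or reorder steps and to score every ordering with the same quality metrics.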

The extracted and filtered images were examined and compared using the mean square error (MSE) and structural similarity index (SSIM) to determine the best applied scenario to make the image clearer and noise-free.

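The mean square error is the most common image quality metric. A higher MSE value indicates lower image quality. MSE is defined as follows:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl[X(i,j) - Y(i,j)\bigr]^{2} \qquad (1)$$

where X and Y are the original and processed images, each of size M × N pixels.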

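SSIM is also used to measure the similarity between two images in order to assess how the quality of the generated image differs from that of the original image. Using a moving window, SSIM compares the local arrangement of pixel intensities through three components: luminance (brightness), contrast, and structure. SSIM calculates the similarity between two images, X and Y, as expressed by the following equation:

$$\mathrm{SSIM}(X,Y) = \frac{(2\mu_X\mu_Y + C_1)(2\sigma_{XY} + C_2)}{(\mu_X^{2} + \mu_Y^{2} + C_1)(\sigma_X^{2} + \sigma_Y^{2} + C_2)} \qquad (2)$$

where μ_X and μ_Y are the window means, σ_X² and σ_Y² the window variances, σ_XY the covariance of X and Y, and C_1 and C_2 small constants that stabilize the division.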
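Both metrics are short to compute. The following is a minimal sketch, assuming 8-bit images: `mse` follows the usual pixel-wise definition, while `global_ssim` evaluates the standard SSIM expression once over the whole image instead of averaging over a moving window, which is a deliberate simplification. The constants C1 = (0.01 L)² and C2 = (0.03 L)² with L = 255 are the commonly used defaults, not values reported in this study.

```python
import numpy as np

def mse(x, y):
    """Mean of the squared pixel-wise differences between two images."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return np.mean((x - y) ** 2)

def global_ssim(x, y, data_range=255.0):
    """SSIM evaluated on a single window covering the whole image (simplified)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Synthetic example: an image and a noisy copy of it.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
noisy = np.clip(original + rng.normal(0.0, 20.0, original.shape), 0, 255)

error = mse(original, noisy)
similarity = global_ssim(original, noisy)
```

An identical pair gives an MSE of 0 and an SSIM of 1; any distortion raises the MSE and lowers the SSIM.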

According to the MSE and SSIM results, sequence 2 (adaptive histogram, dilation, median, average, and erosion) obtained the highest score, as shown in Table 3. The original and preprocessed images are illustrated in Fig. (1), whereas Fig. (2) shows the ROI extracted from the processed image.

Table 3.

Results of MSE and SSIM.

Sequence No.	Seq1	Seq2	Seq3	Seq4
MSE	54.0915	56.0948	52.6007	55.0121
SSIM	0.5165	0.5187	0.5198	0.5191

Fig. (1). Applying sequence-2 of preprocessing techniques.

Fig. (2). Extracted ROI from the processed image.

3. EXPERIMENTAL

3.1. Feature Extraction

To select the most effective features in the collected images, the Gabor filter and local binary pattern were employed as feature extraction techniques, both separately and with their features combined. Fig. (3) shows the flowchart of the completed work.

3.1.1. Gabor Filter

The Gabor filter is a linear filter used to extract information from images, such as texture (mean, standard deviation, skewness, variance, mean absolute, and maximum energy). Fig. (1) shows the ROI of the mammography image before and after using the Gabor filter (eq. 4) at points and edges where the texture changes. Algorithms based on Gabor filters have been effectively employed in breast cancer images to extract significant features and data to aid the classification process using these characteristics (Fig. 4). A Gaussian kernel modulated by an oriented complex sinusoidal wave represents the generic form g(x, y) of a 2D Gabor filter family, as shown in equations 3-6:

(3)

(4)

where δ_x and δ_y are the scaling parameters, W is the central frequency of the complex sinusoid, and θ ∈ [0, π] is the orientation of the normal to the parallel stripes of the Gabor function.

(5)

Fig. (3). Flowchart of the performed work.

(6)

where m is the total number of orientations and n is the total number of frequencies.
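A Gabor filter bank of this kind can be sketched with NumPy alone. The code below uses one common parameterization (a normalized Gaussian envelope with scales δ_x and δ_y, multiplied by a complex sinusoid of frequency W along the direction θ); it is an assumption that this matches the exact form used in the study. The kernel size, the frequency values, the evenly spaced orientations kπ/m, the FFT-based circular convolution, and the precise definitions of "mean absolute" and "maximum energy" are all illustrative choices.

```python
import numpy as np

def gabor_kernel(delta_x, delta_y, W, theta, size=31):
    """Complex 2D Gabor kernel: Gaussian envelope times an oriented sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate the coordinates so the sinusoid runs along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr ** 2 / delta_x ** 2 + yr ** 2 / delta_y ** 2))
    envelope /= 2.0 * np.pi * delta_x * delta_y
    return envelope * np.exp(2j * np.pi * W * xr)

def gabor_bank(m=4, frequencies=(0.05, 0.1, 0.2), delta=4.0):
    """m orientations times n frequencies (assumed values, for illustration)."""
    return [gabor_kernel(delta, delta, W, k * np.pi / m)
            for W in frequencies for k in range(m)]

def gabor_features(image, kernel):
    """Statistics of the magnitude response (circular convolution via FFT)."""
    spectrum = np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape)
    mag = np.abs(np.fft.ifft2(spectrum))
    mean, std = mag.mean(), mag.std()
    skew = np.mean((mag - mean) ** 3) / std ** 3 if std > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "skewness": skew,
        "variance": mag.var(),
        "mean_absolute": np.mean(np.abs(mag - mean)),  # assumed: mean absolute deviation
        "max_energy": np.max(mag ** 2),                # assumed: peak squared magnitude
    }

# Synthetic texture: vertical stripes with spatial frequency 0.1 cycles/pixel.
cols = np.arange(100)
stripes = np.tile(np.cos(2 * np.pi * 0.1 * cols), (100, 1))

matched = gabor_features(stripes, gabor_kernel(4.0, 4.0, 0.1, 0.0))
crossed = gabor_features(stripes, gabor_kernel(4.0, 4.0, 0.1, np.pi / 2))
```

The kernel whose frequency and orientation match the stripes responds far more strongly than the one rotated by 90 degrees, which is the orientation selectivity that makes the responses useful as texture features.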

3.1.2. Local Binary Pattern

An LBP is a robust texture descriptor. The features were extracted based on a threshold. This method has proven to be a powerful tool for extracting texture features from images, such as the mean intensity value, contrast, correlation, and entropy. The mammogram image after preprocessing and after applying the local binary pattern is shown in Fig. (5).

Fig. (5). Applying the local binary pattern.

The local binary pattern (LBP), regarded as a texture descriptor, was used to calculate the mean intensity value, contrast, correlation, and entropy of the studied image. Texture was defined for each pixel using its local structure. The binary code was extracted from the intensity differences between neighboring pixels, with the intensity of the central pixel used as the threshold for the surrounding pixels.

The general form of a local binary pattern (LBP) is represented by equation 7, as follows:

$$\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^{p}, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (7)$$

where g_p is the gray value of the p-th neighbor, g_c is the gray value of the central pixel, P is the total number of neighbors involved, and R is the radius of the neighborhood.

To evaluate the accuracy of the classifiers, the Gabor filter and local binary pattern features were merged into a new group of features. Ten features were created by this combination: mean, standard deviation, skewness, variance, mean absolute, maximum energy, mean intensity, contrast, correlation, and entropy. To determine the best features, three groups of features were supplied to the three classifiers.
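The thresholding of neighbors against the central pixel can be sketched for the simplest case of P = 8 neighbors at radius R = 1. This is a minimal illustration, not the authors' code: the function names, the neighbor ordering (starting at the right-hand neighbor and proceeding counter-clockwise), and the use of the code histogram's entropy as an example feature are assumptions.

```python
import numpy as np

# Neighbor offsets (row, column) for p = 0..7, starting at the right-hand
# neighbor and moving counter-clockwise (an assumed but common ordering).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def lbp_8_1(image):
    """LBP codes for all interior pixels using 8 neighbors at radius 1."""
    img = np.asarray(image, dtype=np.int32)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for p, (dr, dc) in enumerate(OFFSETS):
        neighbor = img[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        # s(g_p - g_c) is 1 when the neighbor is at least as bright as the center.
        codes += (neighbor >= center).astype(np.int32) << p
    return codes.astype(np.uint8)

def lbp_entropy(codes):
    """Shannon entropy (in bits) of the LBP code histogram."""
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    prob = hist[hist > 0] / hist.sum()
    return float(-(prob * np.log2(prob)).sum())

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
code = lbp_8_1(patch)  # right, top, and bottom neighbors are >= 6: 1 + 4 + 64 = 69
```

Merging feature groups then amounts to concatenating the per-image feature vectors, for example with `np.concatenate([gabor_vector, lbp_vector])`, where both vector names are placeholders.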

3.2. Classification

After collecting three groups of features, 1) Gabor filter features, 2) local binary pattern features, and 3) merged features, the images were classified using three techniques: support vector machine (SVM), linear discriminant analysis (LDA), and k-nearest neighbor (KNN) classifiers. For KNN, the adjusted parameters were the cosine distance metric, equal distance weights, and 10 neighbors. The SVM used a linear kernel function together with the multiclass method, whereas LDA used a full covariance structure. These parameters were assigned to implement the classification process.
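The KNN configuration described above (cosine distance, equal weights, 10 neighbors) is simple enough to sketch directly. The following is a minimal NumPy illustration under stated assumptions: the function name `knn_predict` is hypothetical, class labels are assumed to be non-negative integers, and the two-dimensional training data are synthetic rather than features from the study.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=10):
    """Unweighted majority vote of the k nearest neighbors under cosine distance."""
    # Normalizing rows to unit length turns the dot product into cosine similarity.
    a = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    b = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    distance = 1.0 - b @ a.T                      # shape: (n_test, n_train)
    nearest = np.argsort(distance, axis=1)[:, :k]
    votes = y_train[nearest]                      # labels of the k nearest samples
    # Equal weights: every neighbor contributes one vote.
    return np.array([np.bincount(row).argmax() for row in votes])

# Synthetic feature vectors: two classes pointing in different directions.
rng = np.random.default_rng(0)
class0 = np.array([1.0, 0.1]) + 0.05 * rng.standard_normal((20, 2))
class1 = np.array([0.1, 1.0]) + 0.05 * rng.standard_normal((20, 2))
X_train = np.vstack([class0, class1])
y_train = np.array([0] * 20 + [1] * 20)

X_test = np.array([[2.0, 0.1], [0.2, 3.0]])
predicted = knn_predict(X_train, y_train, X_test, k=10)
```

Because cosine distance ignores vector length, the test points are assigned by direction alone, even though they are much longer than the training vectors.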

4. RESULTS

In this paper, 319 images were obtained from the Mini-MIAS database (Mammographic Image Analysis Society). The images were divided into 209 normal and 110 abnormal images for the mass/non-mass classification. For benign/malignant lesions, the 110 abnormal images were divided into 60 benign and 50 malignant lesions. The images were grayscale, with a size of 1024 × 1024 pixels. The ROI was manually extracted, with a radius of 197 pixels. The images were processed with a sequence of filters (adaptive histogram equalization, dilation, median, average, and erosion), chosen as the sequence that gave the best MSE and SSIM scores. Three groups of features were used: 1) features from the Gabor filter, 2) features from LBP, and 3) features from merging (GF+LBP). Three classifiers, SVM, LDA, and KNN, were used to classify the images as either normal/abnormal or benign/malignant. The combination of LDA as a classifier and GF+LBP as a group of features achieved the highest results, with 100% differentiation between normal and abnormal images, as illustrated in Fig. (6).

The experimental results indicated that when using the Gabor filter, the accuracies were 95.7%, 98.9%, and 95.7% for normal/abnormal, and 85.1%, 85.1%, and 82.9% for benign/malignant using SVM, LDA, and KNN as classifiers, respectively. Using the local binary pattern for feature extraction, the accuracies were 96.8%, 98.9%, and 96.8% for normal/abnormal and 85.1%, 85.1%, and 82.9% for benign/malignant, using SVM, LDA, and KNN as classifiers, respectively. By merging the features of the Gabor filter and local binary pattern, the accuracies were 97.8%, 100%, and 94.6% for normal/abnormal and 85.1%, 88.7%, and 81.9% for benign/malignant using the SVM, LDA, and KNN classifiers, respectively.

As shown in Fig. (7), the accuracy for abnormal cases (benign and malignant) was 88.7% when LDA was applied together with the combined features. Furthermore, as shown in Table 4, a comparison of the proposed technique with previous work indicates that the proposed technique achieves higher accuracy, particularly in distinguishing between normal and abnormal cases.

Fig. (6). Accuracy of applying SVM, LDA, and KNN for features extracted from GF, LBP, and GF+LBP for normal and abnormal cases.

Fig. (7). Accuracy of applying SVM, LDA, and KNN for features extracted from GF, LBP, and GF+LBP for abnormal cases (Benign and Malignant).

Table 4.

A comparison among the previous studies and the proposed technique.

Paper	Database	Feature Extraction	Classifier	Accuracy
Breast cancer detection with Gabor features from digital mammograms (2010) [46]	DDSM	Gabor filter	KNN	TP=90% at FPI=1.21 TP=93% at FPI =1.19
A comparison of different Gabor feature extraction approaches for mass classification in mammography (2015) [47].	MIAS, DDSM	Bank of Gabor filter	KNN	TP=90% at FPI=1.21 TP=93% at FPI=1.19
Breast tumor detection and classification in mammograms: Gabor wavelet vs. statistical features (2018) [48]	DDSM	Gabor, wavelet, statistical	SVM, KNN	79.50%
Breast tissue classification using Gabor filter, PCA, and support vector machine (2012) [49]	MIAS	Gabor filter	SVM	84.37%
Optimized Gabor features for mass classification in mammography (2016) [50]	MIAS, DDSM	Gabor features	SVM	98.8% (normal/masses) 93.95% (benign/malignant)
Classification of breast cancer using local binary pattern and Gabor filter (2017) [51]	MIAS	Local binary pattern, Gabor filter	SVM	Benign 89.28% Normal 79.61% Malignant 70.37%
Computer-aided model for breast cancer detection in mammograms (2016) [52]	MIAS	Texture features, intensity features	LDA	93% normal/abnormal 79% benign/malignant
Breast tissue classification using local binary pattern variants: a comparative study (2018) [53]	MIAS	LBP	SVM, KNN, Bayesian network, Random Forest	SVM 66% KNN 69.2% Bayesian network 70.1% Random forest 67.3%
A study of different texture features based on the local operator for benign-malignant mass classification (2016) [54]	DDSM	LBP	Fisher linear discriminant analysis (FLDA)	92.20%
LBP features for breast cancer detection [55]	DDSM [56], MIAS [57]	LBP	SVM	84%
Our proposal	MIAS	Gabor filter, LBP, GF+LBP	SVM, LDA, KNN	Normal/abnormal (N/Ab) and benign/malignant (B/M): Gabor filter: SVM 95.7% and 85.1%, LDA 98.9% and 85.1%, KNN 95.7% and 82.9%; LBP: SVM 96.8% and 85.1%, LDA 98.9% and 85.1%, KNN 96.8% and 82.9%; GF+LBP: SVM 97.8% and 85.1%, LDA 100% and 88.7%, KNN 94.6% and 81.9%

Digital Database for Screening Mammography (DDSM)
Mammographic Image Analysis Society (MIAS)

5. DISCUSSION

Based on the experimental results, KNN was the weakest classifier, both for differentiating between normal and abnormal images and between benign and malignant images. The SVM classifier gave equal results on abnormal images (benign/malignant) for all three feature groups. The texture descriptor extracted from the LBP and the maximum response at points and edges extracted from the Gabor filter correlated with the observed differences in the results between normal and abnormal images. The LDA classifier obtained the same accuracy with features extracted from either GF or LBP.

CONCLUSION

This paper has introduced a combined system that uses the best sequence of preprocessing enhancement techniques after manually segmenting ROIs extracted from the MIAS database. Three distinct classifiers were used to classify the features obtained from the Gabor filter (GF) and the local binary pattern (LBP). The LDA classifier achieved substantial improvement by integrating the features, achieving 100% accuracy for normal/abnormal images and 88.7% accuracy for benign/malignant images. The advanced technique combines these methods and determines the appropriate order of image enhancement techniques based on the image database used.

LIST OF ABBREVIATIONS


MIAS	= Mammographic Image Analysis Society
SVM	= Support Vector Machine
LDA	= Linear Discriminant Analysis
CDF	= Cumulative Distribution Function

ETHICAL STATEMENT

The database that supports the results of this research is available online and cited appropriately. We were concerned only with data analysis and methodology and not with any clinical testing. Mammographic Image Analysis Society (MIAS) database has been used for this study. This study has been approved by the medical ethics committee.

CONSENT FOR PUBLICATION

Not applicable.

FUNDING

None.

CONFLICT OF INTEREST

The authors declare no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENTS

Declared none.

REFERENCES

[1] K. Jagadeesh, K.P. Jamunalaksmi, P. Muthuvidhya, S.M. Harris, and V. Ganga, "Mammogram based automatic computer aided detection of masses in medical images", J Telecomm Study, vol. 4, pp. 1-7.

[2] P.A. Carney, D.L. Miglioretti, B.C. Yankaskas, and K. Kerlikowske, "Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography", Ann. Intern. Med., vol. 138, no. 3, pp. 168-175.

© 2022 Mohamed et al.