International Journal of Trend in Scientific Research and Development (IJTSRD) 
Volume 5 Issue 1, November-December 2020 Available Online: e-ISSN: 2456 - 6470 

Utilizing Viola Jones with Haar Cascade Along with 
Neural Networks for Face Detection and Recognition 

Karan Arora1, Sarthak Arora2

1Department of Computer Science, Chitkara University, Rajpura, Punjab, India 

2Department of IT, Maharaja Agrasen Institute of Technology, IPU, New Delhi, India 


The Viola-Jones object detection framework, introduced in 2001 by Dr. Paul
Viola and Dr. Michael Jones, can be trained to detect a variety of object
classes, but it is primarily used for the problem of face detection. In most
video recording or surveillance systems it has become impossible for human
beings to go through large image datasets and analyze them for potential
results. Nowadays accurate facial recognition has a great impact on our
ecosystem, be it face unlock or face recognition in cameras for auto adjust.
Our proposed methodology is implemented in two stages: the first stage uses
one of the widely accepted methods to detect faces, i.e. Viola-Jones, which
utilizes Haar classifiers, and the second stage recognises the face using
Principal Component Analysis (PCA) and a Feed Forward Neural Network. The
BioID Face Database is used as the training database. Tests are conducted on
webcam video and image snapshots.

Keywords: haar-features, Viola Jones, Image Analysis, face detection, feature
extraction, face edge detection


Because of its wide application in computer vision and image
processing, face detection plays a vital role in human-computer
interaction. Recent advancements in video processing, image
compression and high-frame-rate rendering have enabled diverse
domains to use face detection and recognition techniques. They
have also made it possible to use the latest technology in daily
operations such as fast face unlock. Correctly recognising a
human face is a tough task because a face exhibits many varying
attributes such as expression, age and change in hairstyle.
Although the technology has matured, several aspects of image
processing remain challenging, such as detecting blurry faces
and human-animal confusion. The difficulty arises because
applying multiple filter layers or editing the image generally
makes the face incomprehensible.

Nowadays face recognition is used in various domains and
has multiple applications such as security systems, credit
card verification and identifying criminals in airports, railway
stations, etc. Although various methods to detect and recognize
a human face have been researched, developing a suitable
model for a big database is still a challenging task. That is
why face recognition is regarded as a high-level computer
vision challenge, and multiple methods can be developed to
achieve accurate results.

A few methods known for face recognition are group-based
tree neural networks, artificial neural networks (ANN) and
principal component analysis (PCA).

How to cite this paper: Karan Arora | Sarthak Arora "Utilizing Viola Jones
with Haar Cascade Along with Neural Networks for Face Detection and
Recognition" Published in International Journal of Trend in Scientific
Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1,
December 2020, pp.284-291, URL: /ijtsrd35848.pdf

Unique Paper ID: IJTSRD35848

Copyright © 2020 by author(s) and International Journal of Trend in
Scientific Research and Development Journal. This is an Open Access article
distributed under the terms of the Creative Commons Attribution License
(CC BY 4.0) (http: //

The proposed methodology is executed in two phases. As a
face can be characterized by special facial features, the
first step is to extract those features. They are then
quantized, making it easier to recognize a face by referring to
those features. For detection we use the Viola-Jones
algorithm, which works on Haar cascades, with an AdaBoost
classifier as a modifier. The next step is to recognise the
face, for which we use principal component analysis (PCA)
along with artificial neural networks (ANN). The aim of the
paper is to use this methodology to detect and recognise a
face from the database and then on the test set and webcam outputs.


A strategy that enhances the recognition rate as compared to
PCA was introduced by Muhammad Murtaza Khan et al. [8], in
which sub-holistic PCA outperformed PCA in all test
scenarios, with a 90% recognition rate registered for the ORL
database.
One method for face recognition based on preprocessing the
face images was introduced by Patrik Kamencay et al. [4];
it uses the Belief Propagation segmentation algorithm.
The algorithm showed a positive effect on face recognition,
with a recognition rate of 84% for the ESSEX
database. The use of linear and nonlinear techniques for
feature extraction in face recognition was proposed by Hala
M. Ebied et al. [5]. In that paper the data are mapped into a
high-dimensional feature space with nonlinear methods, represented by
Kernel-PCA as an extension of PCA; for the classification step a
K-nearest neighbor classifier with Euclidean distance is used.
The SIFT-PCA method was proposed by Patrik Kamencay et
al. [6], who examined the impact of a graph-based
segmentation algorithm on the recognition rate. SIFT-related
segmentation algorithms are used for preprocessing the
face images. The results show a positive effect of
segmentation in combination with SIFT-PCA on face recognition.
The NP-hard problem of searching for the best subset of the
available PCA features for face recognition is addressed in the
methodology proposed by Rammohan Mallipeddi et al. [7].


Their method uses a differential evolution algorithm
called FS-DE. A feature set is obtained by maximizing the class
separation in the training data, which further provides
the basis of an ensemble for face recognition. A study of
modified constructive training algorithms for the Multi Layer
Perceptron, applied to face recognition, is proposed by
Hayet Boughrara et al. [8]. That paper presents methods to
increase the number of output neurons simultaneously
with the increasing number of input patterns.
Perceived Facial Image is applied for feature extraction.

We propose a robust methodology that is independent of facial variations such as size, texture, feature position and facial
expression, using Viola-Jones, principal component analysis and neural networks. The flowchart of the methodology is shown in
Figure 1.

Figure 1: Flowchart of face recognition (Adding image -> Extending the contrast -> Face detection using Viola-Jones ->
Feature set extraction using PCA -> Face recognition using neural networks -> Face labelling)

3.1. Data 

The standard data used is the BioID Face Database. The dataset consists of 1521 gray scale images with a resolution of 384×286
pixels. The frontal-view images in the database show the faces of 23 different persons. The test set contains images with a
variety of face sizes, lighting conditions and backgrounds representing real-life scenarios, as shown in Fig 2.

Figure 2: Example set - Bio ID Face Database 

We considered an image database that is readily available in either gray scale or color. Contrast stretching is performed on the
current image, so that bright pixels are made brighter and dark pixels are made darker.
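
The contrast-stretching step can be illustrated with a minimal sketch. The paper does not state which stretching rule was used, so the percentile-based linear rescaling below, the function name `stretch_contrast` and its parameters are assumptions made only for illustration.

```python
import numpy as np

def stretch_contrast(image, low_pct=2.0, high_pct=98.0):
    """Linearly rescale intensities so the chosen percentiles map to 0 and 255.

    Assumption: a percentile-based linear stretch; the paper does not
    specify the exact rule it used.
    """
    img = np.asarray(image, dtype=np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return img.astype(np.uint8)
    out = (img - lo) / (hi - lo) * 255.0
    # Round before casting so values are not truncated downwards.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Tiny low-contrast example: intensities between 100 and 130.
sample = np.array([[100, 110], [120, 130]], dtype=np.uint8)
stretched = stretch_contrast(sample, 0, 100)
print(stretched)
```

After stretching, the darkest pixel of the sample becomes 0 and the brightest becomes 255, which is the "dark darker, bright brighter" effect described above.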


Right after contrast stretching, the Viola-Jones algorithm is used to detect the face in the image. We chose the Viola-Jones
detector because of its accuracy in detecting faces and its ability to run in real time. The detector works best with frontal
images of faces and can handle 45° face rotation around both the horizontal and the vertical axis. Three main concepts allow it
to run in real time: the integral image, AdaBoost and the cascade structure. The integral image is a cost-effective way of
generating the sum of pixel intensities in a specified rectangle of an image; its main use case is the rapid computation of
Haar-like features. Once the integral image has been built, the sum over a rectangular area of the original image requires only
four look-ups, whatever the size of the rectangle. AdaBoost constructs a strong classifier as a linear combination of weak
classifiers. Viola-Jones uses Haar features; the Haar features used in the algorithm are shown in Fig 3.
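
The integral image idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the zero-padded layout and the names `integral_image` and `rect_sum` are assumptions.

```python
import numpy as np

def integral_image(image):
    """Return a zero-padded integral image: ii[y, x] = sum of image[:y, :x]."""
    img = np.asarray(image, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four look-ups in the integral image."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

image = np.arange(16).reshape(4, 4)
ii = integral_image(image)
# The 2x2 block starting at row 1, column 1 holds the values 5, 6, 9 and 10.
print(rect_sum(ii, 1, 1, 2, 2))
```

The cost of `rect_sum` does not depend on the rectangle size, which is why the integral image makes it cheap to evaluate many Haar-like features per window.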

Fig 3: Representing features with Haar Features 

The Haar features shown above can be of various heights and widths. To calculate the value of a Haar feature applied to the
face, the sum of the pixels under the white area and the sum of the pixels under the black area are calculated, and one is
subtracted from the other to get a single value. If the value in a region is high, the region is taken to be part of the face
and is identified as eyes, nose, cheek, etc. Approximately 160000+ Haar features are calculated over each image. In a real-time
application, summing up the pixels over the entire image and then subtracting them to get a single value is not efficient; the
amount of work can be reduced by using the AdaBoost classifier, whose major function is to remove redundant features. An
integral image is used instead of summing up all the pixels, as shown in figure 4.
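
The value of a single Haar feature can be sketched as follows. The example assumes a two-rectangle feature whose left half is "white" and whose right half is "black", with the value defined as white minus black; the paper does not fix the sign convention, and the helper names are hypothetical.

```python
import numpy as np

def integral_image(image):
    """Zero-padded integral image: ii[y, x] = sum of image[:y, :x]."""
    img = np.asarray(image, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Rectangle sum from four integral-image look-ups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

def two_rect_haar_feature(ii, top, left, height, width):
    """Two-rectangle feature: left half white, right half black (assumed layout).

    Returns white sum minus black sum as a single value.
    """
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black

# A patch with a bright left half and a dark right half gives a large value.
patch = np.zeros((4, 4), dtype=np.uint8)
patch[:, :2] = 200
patch[:, 2:] = 50
value = two_rect_haar_feature(integral_image(patch), 0, 0, 4, 4)

# A uniform patch gives zero, since both halves sum to the same amount.
flat_value = two_rect_haar_feature(integral_image(np.full((4, 4), 90)), 0, 0, 4, 4)
print(value, flat_value)
```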

Fig 4: Integral Image 

In the integral image, each new pixel value is obtained by adding all the pixels above it and to its left; the sum of all pixel
values inside a patch is then obtained from the integral-image values at the corners of the patch. AdaBoost determines which
features are relevant and which are irrelevant. After identifying them, AdaBoost assigns a weight to each feature and
constructs a strong classifier as a linear combination of many weak classifiers.

Weak classifier = 1, if a feature is identified (e.g. nose)
Weak classifier = 0, if no feature is identified (e.g. no nose in image)
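
How a strong classifier is built as a weighted combination of weak classifiers can be sketched with a small discrete AdaBoost over threshold stumps. This is a generic illustration under stated assumptions, not the Viola-Jones training procedure itself: the feature values are assumed to be precomputed, labels are 1 (face) and 0 (non-face), and the function names are hypothetical.

```python
import numpy as np

def _stump_predict(X, j, thr, polarity):
    """Weak classifier: +1 if polarity * feature j >= polarity * thr, else -1."""
    return np.where(polarity * X[:, j] >= polarity * thr, 1, -1)

def train_adaboost(X, y, n_rounds=3):
    """Discrete AdaBoost. X: (n_samples, n_features); y: labels in {0, 1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # sample weights
    signs = np.where(y == 1, 1, -1)    # labels mapped to {-1, +1}
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = _stump_predict(X, j, thr, polarity)
                    err = w[pred != signs].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity, pred)
        err, j, thr, polarity, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this weak classifier
        stumps.append((alpha, j, thr, polarity))
        w *= np.exp(-alpha * signs * pred)       # emphasise misclassified samples
        w /= w.sum()
    return stumps

def predict_adaboost(stumps, X):
    """Strong classifier: sign of the weighted sum of weak classifier votes."""
    score = np.zeros(X.shape[0])
    for alpha, j, thr, polarity in stumps:
        score += alpha * _stump_predict(X, j, thr, polarity)
    return (score >= 0).astype(int)

# Toy data: feature 0 separates the classes, feature 1 is irrelevant.
X = np.array([[0.1, 3.0], [0.2, 1.0], [0.8, 2.0], [0.9, 4.0]])
y = np.array([0, 0, 1, 1])
stumps = train_adaboost(X, y)
predictions = predict_adaboost(stumps, X)
print(predictions)
```

In this toy run the first selected stump uses feature 0, which mirrors the role described above: relevant features receive weight and irrelevant ones are ignored.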

Nearly 2500 features are still calculated; the number of computations can be further reduced by cascading. A set of features is
kept in one classifier, another set in the next classifier, and so on in a cascading format. Using this method, one can decide
faster whether a window is a face or not, and the window is rejected as soon as one classifier fails to provide the necessary
output to the next stage. The detected face is then cropped and resized to 100x100, which is the standard resolution. The next
step is to identify the detected face using Principal Component Analysis (PCA) and an Artificial Neural Network (ANN).
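
The early-rejection behaviour of the cascade can be sketched as below. The stage functions and thresholds are made-up placeholders used only to show the control flow; a real cascade would use boosted classifiers over Haar features at each stage.

```python
def run_cascade(window_features, stages):
    """Evaluate stages in order and reject at the first stage that fails.

    stages: list of (stage_fn, threshold) pairs (hypothetical structure).
    Returns (is_face, number_of_stages_evaluated).
    """
    for index, (stage_fn, threshold) in enumerate(stages, start=1):
        if stage_fn(window_features) < threshold:
            return False, index   # rejected early, later stages are skipped
    return True, len(stages)

# Placeholder stages: each later stage looks at more features than the previous one.
stages = [
    (lambda f: f[0], 0.5),
    (lambda f: f[0] + f[1], 1.2),
    (lambda f: sum(f), 2.0),
]

rejected_early = run_cascade([0.2, 0.9, 0.9], stages)
rejected_second = run_cascade([0.6, 0.3, 0.9], stages)
accepted = run_cascade([0.9, 0.9, 0.9], stages)
print(rejected_early, rejected_second, accepted)
```

Most windows in an image are not faces, so rejecting them after one or two cheap stages is what keeps the overall cost low.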

To extract human face features, we use PCA. Fig 5 depicts the PCA operational flow. 

Fig 5: PCA flow chart (boxes: Feature Analysis, Eigen Values, Eigen Vectors, Covariance matrix)

PCA is used to extract features from the cropped and resized image. It transforms higher-dimensional data into
lower-dimensional data and is used as a tool in exploratory data analysis and in predictive analysis. A set of facial images of
size M x M in the training set is converted by PCA into lower-dimensional face images. PCA converts a set of N correlated
variables into a set of k uncorrelated variables called principal components. The number of principal components is less than
or equal to the number of original variables, i.e. k ≤ N. For an application such as face recognition the definition becomes:
PCA is a mathematical procedure used to convert a set of N correlated face images into a set of k uncorrelated face images
called Eigenfaces.

Before calculating the principal components, the dimension of the original images has to be reduced in order to reduce the
number of calculations. Since the later principal components show more noise and less direction, only the first few principal
components (say k) are selected, and the remaining components can be neglected because they contain more noise.

The training set of M images is represented by the best Eigenfaces, those with the largest eigenvalues, which account for the
most variance within the set of facial images. Once the Eigenfaces have been found, each image in the training set can be
represented as a linear combination of Eigenfaces, i.e. as a vector of weights. For recognition, the features of the input
image are compared with the features of the standard database.
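
The Eigenface computation described above can be sketched with NumPy. The sketch uses random arrays in place of real 100x100 face crops and relies on the common trick of eigen-decomposing the small (images x images) matrix instead of the full pixel covariance; that trick, the function names and the choice k = 3 are assumptions, not details given in the paper.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) array. Returns the mean face and k eigenfaces (rows)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Eigen-decompose the small (n_images x n_images) matrix rather than the
    # (n_pixels x n_pixels) covariance matrix; the non-zero eigenvalues agree.
    small_cov = centered @ centered.T
    eigvals, eigvecs = np.linalg.eigh(small_cov)
    order = np.argsort(eigvals)[::-1][:k]          # largest eigenvalues first
    eigenfaces = (centered.T @ eigvecs[:, order]).T
    eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    return mean_face, eigenfaces

def project(face, mean_face, eigenfaces):
    """Face descriptor: weights of the face in the eigenface basis."""
    return eigenfaces @ (face - mean_face)

# Stand-in data: 6 random "images" of 100x100 pixels, flattened.
rng = np.random.default_rng(0)
faces = rng.random((6, 100 * 100))
mean_face, eigenfaces = fit_eigenfaces(faces, k=3)
descriptor = project(faces[0], mean_face, eigenfaces)
print(eigenfaces.shape, descriptor.shape)
```

Each face is thus reduced from 10000 pixel values to a short descriptor vector, which is what the neural networks in the next stage take as input.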

Figure 6: Example of an Artificial Neural Network (input layer, hidden layer, output layer)

The number of neurons in the input layer is equal to the number of Eigenfaces; the network is a feed-forward back-propagation
network.

Refer to Figure 7. For a single cell represented as f(x), the output can be calculated as output = input1 + input2. The
function f(x) is a neutral function because it does not amplify the incoming inputs or add any extra value to them; it only
sums them. A mathematical function such as tanh can be used in place of this function.
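
A single cell of this kind can be sketched as follows. The optional weights and the function name `neuron_output` are assumptions added for illustration; with unit weights and no activation the cell simply sums its inputs, as described above.

```python
import math

def neuron_output(inputs, weights=None, activation=None):
    """Weighted sum of the inputs, optionally passed through an activation function."""
    if weights is None:
        weights = [1.0] * len(inputs)   # unit weights: plain summation
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total) if activation else total

plain_sum = neuron_output([0.5, 0.25])                       # output = input1 + input2
squashed = neuron_output([0.5, 0.25], activation=math.tanh)  # same sum passed through tanh
print(plain_sum, squashed)
```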

Figure 7: Single neuron cell (Input 1 and Input 2 are summed by f to give the output, which is passed on to other nodes)

Layered feed-forward artificial neural networks make use of the back propagation algorithm, in which the neurons send their
signals in the forward direction and the errors are propagated backwards. Back propagation keeps reducing this error until the
ANN has learned the training data. Through back propagation the neural network learns and determines the connection weights
between the input, hidden and output cells. Random weights are initially assigned to the network and are then adjusted to make
the error minimal.

Error in network = Desired output - Calculated output 

The back propagation technique is used to minimize the error, using a formula which consists of weights, inputs, outputs, error
and learning rate (α).

$$W_{new} = W_{old} - \alpha \left( \frac{\partial \, Error}{\partial W_{old}} \right)$$

Figure 8: Updating New Weights in Back propagation. 
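
The weight update can be sketched for the simplest possible case, a single linear neuron. The squared-error objective E = 0.5 * (desired - output)^2, the learning rate of 0.1 and the function name are assumptions chosen for illustration; the paper's networks have hidden layers and would apply the same rule layer by layer.

```python
def update_weights(weights, inputs, desired, learning_rate=0.1):
    """One gradient-descent step for a linear neuron.

    Assumes E = 0.5 * (desired - output)^2, so dE/dw_i = -(desired - output) * x_i
    and the update w_new = w_old - learning_rate * dE/dw_i adds
    learning_rate * error * x_i to each weight.
    """
    output = sum(w * x for w, x in zip(weights, inputs))
    error = desired - output            # desired output - calculated output
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error

weights = [0.0, 0.0]
history = []
for _ in range(50):
    weights, error = update_weights(weights, [1.0, 2.0], desired=1.0)
    history.append(abs(error))

print(history[0], history[-1])
```

With these inputs the error halves on every step, so repeated updates drive it towards zero, which is the behaviour the back propagation description relies on.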

Training of the Neural Networks: 

One ANN is used for each individual in the database; since there are twenty-three persons in the database, twenty-three
networks are created. Face descriptors are used as input for training the ANNs. The face descriptors belonging to an individual
are used as positive examples for that individual's network, so that its output will be 1, and the descriptors of all other
individuals are used as negative examples, so that its output will be 0. The trained networks are then used for recognition.

Neural Networks Simulation: 

The face descriptors of the test image, calculated on the Eigenfaces, are used as input to all networks, which are then
simulated. The outputs of the networks are compared; if the maximum output is higher than a previously defined level, the test
image is taken to belong to the person whose network produced that maximum output.


In the Face Tagging stage, the recognition system uses the result from the simulation to tag the image with the name of the
person. The data is in binary form, so this block is also responsible for converting the result into a value and matching it to
a person’s name in the name list. However, if the interpreted value is not one of the values listed in the roster, the name
returned is automatically set to “Unknown”.
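
The tagging logic can be sketched as picking the network with the highest output and falling back to "Unknown". The threshold value of 0.5, the function name and the placeholder names in the list are assumptions; the paper does not state the level it used.

```python
def label_face(network_outputs, names, threshold=0.5):
    """Return the name whose network gave the highest output, or "Unknown".

    network_outputs[i] is the output of the network trained for names[i].
    The threshold is an assumed value for illustration.
    """
    best_index = max(range(len(network_outputs)), key=network_outputs.__getitem__)
    if network_outputs[best_index] < threshold:
        return "Unknown"
    return names[best_index]

names = ["person_a", "person_b", "person_c"]   # placeholder roster
recognised = label_face([0.10, 0.92, 0.30], names)
unrecognised = label_face([0.10, 0.20, 0.30], names)
print(recognised, unrecognised)
```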

Consider the test image shown in Figure 9; it is preprocessed for identification.

Figure 9: Test Image 

Figure 9 is the test image used for the analysis in this paper. After applying the Viola-Jones algorithm to the image in
Figure 9, the identified face shown in Figure 10 is obtained (bounding box on the identified face). It is then resized to
100x100 pixels, the Haar features are calculated and all the related features are extracted.

Figure 10: Face recognised by Viola-Jones algorithm in Red boundary 

As shown by the bounding boxes in Figure 11, the main features of the face are identified by the Viola-Jones algorithm and are
used for deciding the nodes corresponding to the identified parts of the face.

Figure 11: Facial features identified by Viola-Jones algorithm (Boundary box) 

The features extracted by the Viola-Jones algorithm are represented as nodes. These nodes are joined to form a shape, making
sure that all nodes are well connected, and the connecting lines are labelled with reference numbers as shown in Figure 12.

Figure 12: Facial Feature Calculation 

Figure 12 shows how the feature measurements used to identify the person are calculated. Each measurement is tabulated after
the features are calculated from various angles. The person in the image is identified correctly based on this calculation.
The tabulated results for the various features taken into account are shown in Table 1.

Table 1: Calculation of various features (different angles vs. face features of images)

INDEX | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | Feature6 | Feature7
      | 1912     | 362      | 560      | 546      | 666      | 700      | 12
5     | 1120     | 419      | 696      | 612      | 720      | 639      | 654
8     |          |          |          |          |          |          |
12    | 1997     | 364      | 517      | 512      | 660      | 576      | 584



Table 2: Results

Techniques / Authors                                    | Accuracy
Neural Networks                                         |
Principal Component Analysis                            |
Kamencay [2] a) Segmented                               |
Kamencay [2] b) Non Segmented                           |
Fernandez [3] (Artificial Neural Nets and Viola-Jones)  | 88.64%
Mohammad Da'san [4] (Viola-Jones, Neural-Networks)      | 90.31%
Proposed Method                                         | 94%



We compared the accuracy of our proposal with existing models as shown in Table 2. The accuracy of the proposed method
turned out to be 94%; thus the proposed method is more accurate in recognising a person in an image than the other
existing methods.


The paper presents an efficient approach, i.e. a fusion of
preprocessing, Viola-Jones, PCA and neural networks, for
face detection and recognition. The accuracy is compared
with that of other existing models that perform the same
operations, and the performance of the proposed model is
observed to be superior. Face detection plays an important
role in a plethora of applications, most of which require a
high rate of accuracy in recognising people; hence the
proposed method can be considered in light of its results
relative to other existing methods.


[1] Maria De Marsico, Michele Nappi, Daniel Riccio and
Harry Wechsler, “Robust Face Recognition for
Uncontrolled Pose and Illumination Changes”, IEEE
Transactions on Systems, Man and Cybernetics,
vol. 43, no. 1, Jan 2013.

[2] Kamencay, P.; Jelsovka, D.; Zachariasova, M., “The
impact of segmentation on face recognition using the
principal component analysis (PCA)”, IEEE
International Conference in Signal Processing
Algorithms, Architectures, Arrangements, and
Applications, pp. 1-4, Sept. 2011.

[3] Ma. Christina D. Fernandez, Kristina Joyce E. Gob,
Aubrey Rose M. Leonidas, Ron Jason J. Ravara, Argel
A. Bandala and Elmer P. Dadios, “Simultaneous Face
Detection and Recognition using Viola-Jones
Algorithm and Artificial Neural Networks for Identity
Verification”, pp. 672-676, 2014 IEEE Region 10
Symposium, 2014.

[4] Mohammad Da’san, Amin Alqudah and Olivier Debeir,
“Face Detection using Viola and Jones Method and
Neural Networks”, IEEE International Conference on
Information and Communication Technology
Research, pp. 40-43, 2015.

[5] Anil K. Jain, “Face Recognition: Some Challenges in
Forensics”, IEEE International Conference On
Automatic Face and Gesture Recognition, pp 726-

[6] Ming Zhang and John Fulcher, “Face Perspective
Understanding Using Artificial Neural Network Group
Based Tree”, IEEE International Conference On Image
Processing, Vol. 3, pp. 475-478, 1996.

[7] Hazem M. El-Bakry and Mohy A. Abo Elsoud, “Human
Face Recognition Using Neural Networks”, 16th
national radio science conference, Ain Shams
University, Feb. 23-25, 1999.

[8] Muhammad Khan, Jocelyn Chanussot, Laurent Condat,
Annick Montanvert, “Indusion: Fusion of Multispectral
and Panchromatic Images Using Induction Scaling
Technique”, IEEE Geoscience and Remote Sensing
Letters, IEEE - Institute of Electrical and Electronics
Engineers, 2008, 5 (1), pp. 98-102.
doi: 10.1109/LGRS.2007.909934. hal-00348845
