What is Anti-Spoofing?
In AI, Facial Recognition (FR) is the task of identifying or verifying a person from a digital image or video. FR works by comparing selected facial features from a given image with faces stored in a database. The application of FR described in this blog is anti-spoofing of face images. Spoof detection is used in applications where people are verified by presenting their face to a camera.
Face spoofing attacks may be carried out by holding a printed photo of a person in front of a camera, or by displaying the person's image on a screen such as a mobile phone. More sophisticated techniques include video replay attacks and even 3D mask attacks. These attacks are typically analyzed in terms of descriptors and classifiers.
The following is an overview of these descriptors:
- Texture descriptors: A face displayed on a mobile screen in front of a camera produces texture patterns that do not exist in real faces; texture descriptors capture these patterns. Examples include Local Binary Patterns (LBP), Histograms of Oriented Gradients (HOG), and deep neural networks (DNN).
- Motion descriptors: These detect and describe intra-face variations such as eye blinking, facial expressions, mouth movement, and head rotation. They help evaluate the consistency of the user's interaction with the environment (liveness).
- Frequency descriptors: These capture context-based artifacts introduced by the spoofing medium, which often show up in the image's frequency domain.
- User interaction (other): Cues obtained by prompting the user, e.g., challenge-response requests such as asking them to blink.
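As an illustration of the texture descriptors above, here is a minimal NumPy sketch of the basic 3×3 Local Binary Pattern (the function names are our own, for illustration; a real pipeline would likely use `skimage.feature.local_binary_pattern`):

```python
import numpy as np

def lbp_3x3(gray):
    # Basic 8-neighbour LBP code for each interior pixel of a grayscale image.
    c = gray[1:-1, 1:-1]
    # Neighbour offsets in clockwise order starting at the top-left.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        neigh = gray[1 + dy:gray.shape[0] - 1 + dy, 1 + dx:gray.shape[1] - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        code |= (neigh >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(gray, bins=256):
    # A normalized histogram of LBP codes is the usual texture feature vector.
    hist, _ = np.histogram(lbp_3x3(gray), bins=bins, range=(0, bins))
    return hist / hist.sum()
```

The histogram of these codes, rather than the raw code map, is what is typically fed to a classifier.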
These descriptors help detect a spoofing medium if one is present. They can then be used to classify a given image as spoofed or not, framing anti-spoofing as a machine learning problem.
Alternatively, we can use deep learning techniques for anti-spoofing. In this approach, we provide a large number of example images, both original and spoofed. The network learns discriminative features from these images, and spoofed images can thereby be detected. This, again, can be treated as a binary classification problem.
1. Collection of Dataset-
We started by dividing our dataset into two sets of folders. The dataset was thus categorized into True and False classes: True for images that are spoofed, and False for images that are original (not spoofed). For the False images, we used the ‘Northeastern SMILE Lab — Recognizing Faces in the Wild’ dataset on Kaggle, along with some other open-source images. For the True images, we created a dataset by displaying images inside a mobile frame and then capturing them with a camera. This set was further augmented to form the complete dataset, which consisted of about a thousand images per class.
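The augmentation step can be sketched with plain NumPy (a real pipeline would more likely use a library such as Keras' `ImageDataGenerator`; the random flip and brightness jitter here are assumed transforms, not the post's exact ones):

```python
import numpy as np

def augment(img, rng=None):
    # Simple augmentation: random horizontal flip plus brightness jitter.
    rng = rng if rng is not None else np.random.default_rng()
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = np.fliplr(out)               # a mirrored face is still a plausible face
    out = out * rng.uniform(0.8, 1.2)      # simulate lighting variation
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying this a few times per source image is one way to grow a small capture set toward the thousand-image scale described above.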
2. Split the dataset-
We divided the complete dataset into the following categories:
True class datasets:
Train: 500 images
Validation: 200 images
Test: 200 images
False class datasets:
Train: 500 images
Test: 200 images
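The split above can be reproduced with a deterministic shuffle-and-slice; the helper name and seed are illustrative assumptions:

```python
import random

def split_dataset(paths, n_train=500, n_val=200, n_test=200, seed=42):
    # Shuffle deterministically, then slice into train/validation/test subsets.
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:n_train + n_val + n_test]
    return train, val, test
```

For the False class, which has no validation subset above, `n_val=0` gives the same train/test split.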
3. Pre-processing-
Each image is resized and normalized before being fed to the network.
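Since VGG16 expects 224×224 RGB inputs, a minimal pre-processing sketch using Pillow might look like this (the exact steps in the original pipeline may differ):

```python
import numpy as np
from PIL import Image

IMG_SIZE = (224, 224)  # VGG16's expected input resolution

def preprocess(path):
    # Load the image, force RGB, resize to the network input, scale to [0, 1].
    img = Image.open(path).convert("RGB").resize(IMG_SIZE)
    return np.asarray(img, dtype=np.float32) / 255.0
```

Stacking the resulting arrays with `np.stack` produces the batched tensors the model trains on.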
4. Training on VGG16 Architecture-
VGG16 is a convolutional neural network (CNN) architecture and is considered one of the best-performing vision architectures to date. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images, across the 1000 classes of the ILSVRC benchmark. It was submitted to ILSVRC-2014. The following is the architecture of VGG16:
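In code, a transfer-learning setup on this architecture for our binary task might look like the following Keras sketch (the classification head's layer sizes are assumptions, not the post's exact configuration):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def build_model(weights="imagenet"):
    # Frozen VGG16 convolutional base plus a small binary-classification head.
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # True (spoofed) vs False (real)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the base means only the head is trained, which suits a dataset of roughly a thousand images per class.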
5. Checking the Performance-
The performance was checked using the validation set. Accuracy on the validation set ranged from 0.97 to 1, and loss from 0 to 0.10. We then saved this model as an h5 file.
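Saving and restoring a trained model as an HDF5 (.h5) file is a one-liner in Keras; the tiny stand-in model and file name below are illustrative (the real model is the VGG16-based classifier):

```python
from tensorflow import keras

# A tiny stand-in model, just to demonstrate the h5 save/load round trip.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.save("anti_spoof_demo.h5")                  # HDF5 when the path ends in .h5
restored = keras.models.load_model("anti_spoof_demo.h5")
```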
6. Evaluating the model-
The model performed well on the test set, giving an F1 score of 0.995.
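For reference, the F1 score is the harmonic mean of precision and recall on the positive (spoofed) class; a minimal implementation (the original evaluation likely used a library such as scikit-learn):

```python
def f1_score(y_true, y_pred):
    # Precision and recall computed for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean; 0.0 when both precision and recall are 0.
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

An F1 of 0.995 on a balanced 400-image test set therefore means only a handful of misclassifications.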