Recently I was experimenting with PCA and applied it to Eigenfaces, so I thought of sharing this. The objective of this article is not to describe the details of PCA (so I have intentionally kept the algebra and other mathematics out) but its application to face compression and recognition. To set the context, though, let me touch upon some aspects of PCA first, and then I will map those concepts to my recent attempt at Eigenfaces.
Principal Component Analysis:
Principal Component Analysis (PCA) is a statistical procedure used for dimensionality reduction through orthogonal projection onto new dimensions; equivalently, we can say that it converts a set of observations of correlated variables into a set of values of linearly uncorrelated variables. These new dimensions are called principal components, and they capture the maximum variance of the data points in the original space. Below is the function which describes the projection onto the new dimensions:
Y = WᵀX, and it turns out that W is a matrix whose columns are eigenvectors of the covariance matrix of the dataset in the original space.
The number of resulting eigenvectors is equal to the number of original dimensions. So if this is the case, how are we reducing the number of dimensions? Well, this is where the beauty of PCA lies: it turns out that a large proportion of the variance of the original space can be explained by just a few principal components. So we sort the components by their variance and keep only the top few. And it turns out that the variance explained by each component is nothing but the associated eigenvalue. So the following holds true:
Cv = λv (where C is the covariance matrix, v is an eigenvector and λ is the corresponding eigenvalue)
So, in summary, below are the high-level steps of PCA (let's say 'm' is the total number of dimensions in the original space):
- Calculate the m-dimensional mean vector and subtract it from every observation (this centres the data and makes the later calculations a bit easier)
- Compute the m x m covariance matrix
- Compute the eigenvectors of the covariance matrix and their corresponding eigenvalues
- Based on the eigenvalues (and the acceptable level of explained variance), select the eigenvectors which explain the most variance (let's say 'd' eigenvectors). This eigenvector matrix W is m x d, where every column is an eigenvector
- Transform the dataset from the original space into the new space using the equation above (Y = WᵀX)
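The steps above can be sketched in a few lines of NumPy. The article does not specify an implementation, so this is only an illustrative sketch on synthetic data; all names and sizes here are mine:

```python
import numpy as np

# Illustrative PCA sketch. Rows of X are dimensions, columns are
# observations, matching the Y = W^T X convention used above.
rng = np.random.default_rng(0)
m, n = 5, 100                        # m original dimensions, n observations
X = rng.normal(size=(m, n))

# Step 1: m-dimensional mean vector, subtracted from every observation
mean = X.mean(axis=1, keepdims=True)
R = X - mean

# Step 2: m x m covariance matrix
C = (R @ R.T) / (n - 1)

# Step 3: eigen-decomposition (eigh, since C is symmetric)
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: sort by eigenvalue (descending) and keep the top d eigenvectors
order = np.argsort(eigvals)[::-1]
d = 2
W = eigvecs[:, order[:d]]            # m x d, each column an eigenvector

# Step 5: project the (mean-centred) data onto the new space: Y = W^T (X - mean)
Y = W.T @ R
```

Note that the projection uses the centred data; the columns of W are orthonormal, so no further scaling is needed.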
The process described above can be applied to images too, and the resulting principal components are known as Eigenfaces. I have used the Yale Face Database for my trials (http://vision.ucsd.edu/content/yale-face-database). This database contains 320 x 243 images.
So this means we are talking about a point in 77760-dimensional space for each image here (320 x 243 = 77760; that is how an image is represented in matrix form).
Below are the images that were used as training set:
Many web resources advise cropping these images further, as PCA is sensitive to scale and lighting; for this exercise, however, the images were used without cropping. The training set was read into a 77760 x 16 matrix (let's say X), where each column represents one image. This is the dataset in the original space.
Average of this whole matrix was calculated which produced the below average image:
Each image differs from this average by some vector, so subtracting the average image from every column of the data matrix gives a difference matrix (let's say R). This completes Step 1 from the summary in the PCA section above.
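As a sketch, assuming NumPy and using random pixel values in place of the actual Yale images, this step looks like the following (the 243 x 320 reshape assumes row-major flattening, which is an assumption of mine):

```python
import numpy as np

# Sketch of building the data matrix and the difference matrix R.
# Random values stand in for the Yale images; in practice each column
# would be one flattened 320 x 243 image.
rng = np.random.default_rng(1)
n_pixels, n_images = 77760, 16
X = rng.integers(0, 256, size=(n_pixels, n_images)).astype(np.float64)

# Average image (Step 1)
mean_face = X.mean(axis=1, keepdims=True)    # 77760 x 1

# Each image minus the average face
R = X - mean_face                            # 77760 x 16

# To display the average face, reshape it back to image dimensions
avg_image = mean_face.reshape(243, 320)
```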
Now the covariance matrix for the above training set will be C = RRᵀ, whose size is 77760 x 77760. Any computation on a matrix this large is a herculean task, but there is a shortcut. From linear algebra, if R is an M x N matrix with M > N, then RRᵀ has at most N non-zero eigenvalues (at most N − 1 here, since the columns of R sum to zero after mean subtraction). So we can do an eigenvalue decomposition of the much smaller N x N matrix RᵀR instead:
Let v be an eigenvector of RᵀR, so that RᵀRv = λv. Now if we multiply both sides of this equation by R on the left, we get:
R(RᵀRv) = R(λv) ==> RRᵀ(Rv) = λ(Rv)……so Rv is an eigenvector of RRᵀ, with the same eigenvalue λ.
Let's apply the above process to our case, where RᵀR becomes a 16 x 16 matrix. This is Step 2 from the summary in the PCA section above.
Now we compute the eigenvectors v and eigenvalues of this 16 x 16 matrix. Remember that we actually need the eigenvectors of the covariance matrix, which is 77760 x 77760 in size, so we compute the product Rv, which gives a 77760 x 16 matrix. Each of its columns, after normalization, can be rendered as an image; these images are known as Eigenfaces. This is Step 3 of the PCA summary section. Below are the 16 eigenfaces created in this case:
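A sketch of the shortcut and the eigenface construction, again with NumPy and a random stand-in for R; the last lines verify the eigenvector identity derived above without ever forming the 77760 x 77760 matrix:

```python
import numpy as np

# Random stand-in for the 77760 x 16 difference matrix R
rng = np.random.default_rng(2)
R = rng.normal(size=(77760, 16))

# Decompose the small 16 x 16 matrix R^T R instead of the huge R R^T
small = R.T @ R
eigvals, v = np.linalg.eigh(small)

# Map each small eigenvector back to the big space and normalize the
# columns: these columns are the eigenfaces
U = R @ v                                    # 77760 x 16
U /= np.linalg.norm(U, axis=0, keepdims=True)

# Sanity check: column i of U is an eigenvector of R R^T with eigenvalue
# eigvals[i], computed as R @ (R^T @ u) so the big matrix never materializes
i = 15
check = R @ (R.T @ U[:, i])                  # (R R^T) u_i, should equal eigvals[i] * u_i
```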
Each of these eigenfaces is a component, and below is a plot of the % of variance explained by each eigenface (calculated as the ratio of its eigenvalue to the sum of all eigenvalues):
A subset of these eigenfaces can be selected depending on the minimum acceptable % of variance that must be explained. This is Step 4.
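This selection step can be sketched as follows, with made-up eigenvalues and an illustrative 90% threshold (the article does not fix a particular threshold):

```python
import numpy as np

# Stand-in eigenvalues, sorted in descending order as PCA requires
rng = np.random.default_rng(3)
eigvals = np.sort(rng.uniform(0, 10, size=16))[::-1]

# Fraction of variance explained by each eigenface, and its running total
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Smallest number of eigenfaces whose cumulative share reaches 90%
d = int(np.searchsorted(cumulative, 0.90) + 1)
```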
Now these eigenfaces can be used to reconstruct existing faces or match new faces. We project a new face (after subtracting the average face) onto the eigenfaces to obtain a set of weights. These weights are then compared with the weights of all faces in the database, and the closest neighbour is identified and analysed.
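The matching step can be sketched like this, with synthetic stand-ins for the eigenfaces and training faces (the orthonormal "eigenfaces" here come from a QR factorization purely for illustration, and the probe is a noisy copy of a known face):

```python
import numpy as np

rng = np.random.default_rng(4)
n_pixels, n_train, d = 77760, 16, 8
U = np.linalg.qr(rng.normal(size=(n_pixels, d)))[0]   # stand-in eigenfaces
train = rng.normal(size=(n_pixels, n_train))          # stand-in face database
mean_face = train.mean(axis=1, keepdims=True)

# Weights of every training face in eigenface space: d x n_train
train_weights = U.T @ (train - mean_face)

# Probe face: a slightly noisy copy of training face 3 stands in for a new photo
probe = train[:, 3] + 0.01 * rng.normal(size=n_pixels)
w = U.T @ (probe[:, None] - mean_face)

# Nearest neighbour by Euclidean distance in weight space
dists = np.linalg.norm(train_weights - w, axis=0)
match = int(np.argmin(dists))
```

Because the comparison happens in the small d-dimensional weight space rather than the 77760-dimensional pixel space, matching is cheap even for a large database.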
Below is one of the faces reconstructed with different numbers of eigenfaces (in increasing order):
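Reconstruction itself is just the reverse projection: keep d weights, map them back through the eigenfaces, and add the average face. A sketch with synthetic data, showing that the reconstruction error can only shrink as more eigenfaces are kept:

```python
import numpy as np

rng = np.random.default_rng(5)
n_pixels, n_faces = 77760, 16
U = np.linalg.qr(rng.normal(size=(n_pixels, n_faces)))[0]  # stand-in eigenfaces
mean_face = rng.normal(size=(n_pixels, 1))
face = rng.normal(size=(n_pixels, 1))                      # stand-in face

errors = []
for d in (1, 4, 8, 16):
    w = U[:, :d].T @ (face - mean_face)      # weights on the first d eigenfaces
    recon = mean_face + U[:, :d] @ w         # back-projection into pixel space
    errors.append(np.linalg.norm(face - recon))

# errors is non-increasing: each extra eigenface captures more of the face
```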
Face recognition through PCA has its own limitations. One of them is that it is sensitive to lighting and scale: since PCA does not perform classification, it cannot tell whether variance comes from the identity of the face or from such external factors.