Author Archives: Hung Leo

Generating eigenfaces with Mahout SVD to recognize faces

Author: adapted from the Chimpler blog

In this tutorial, we are going to describe how to generate and use eigenfaces to recognize people's faces.
Eigenfaces are a set of eigenvectors derived from the covariance matrix of the probability distribution over the high-dimensional vector space of possible human faces. They can be used to identify a face in a picture against a face database very quickly. In this post, we won't go into much detail on the mathematical aspects, but if you are interested in those, have a look at the excellent post Face Recognition using Eigenfaces and Distance Classifiers: A Tutorial from the Onionesque Reality blog.


To follow this tutorial, you need the following software installed on your machine:

  • Java >= 1.6
  • Hadoop
  • Mahout
  • Maven

You can find installation instructions for these in a previous post, Playing with the Mahout recommendation engine on a Hadoop cluster.

Compiling the code

All the source code, the training sets and the testing sets are in the GitHub repository at

You can fetch the files from this repository by typing:

$ git clone

This repository is structured as follows:

Once you have fetched the project, you can compile it using Maven:

$ mvn clean package assembly:single

This creates a jar file in the target directory containing all the dependencies and the classes compiled from the src/main/java directory.

Preparing the data set

You can download the Yale face database by going to this page:

Unzip the file:

$ unzip

Now we are going to split these images into two sets: a training set and a testing set:

$ mkdir training-set
$ mv yalefaces/* training-set/

For the testing set, we remove the sad facial expression images from the training set and move them to the testing set:

$ mkdir testing-set
$ mv training-set/*.sad testing-set/

We also add two non-face images (a hamburger and a cat) and the face of one person not in the training set (Bruce Lee) to the testing set:

$ cp [MAHOUT EIGENFACE EXAMPLE DIRECTORY]/images/yalefaces-test/* testing-set

Training the model

The training is implemented in the class GenerateCovarianceMatrix, which generates the covariance matrix.

The arguments of this class are:

  • image width: used to scale down the image so that the computation does not take too much memory
  • image height
  • training directory: directory containing the training face images
  • output directory

$ java -cp target/mahout-eigenface-example-1.0-jar-with-dependencies.jar com.chimpler.example.eigenface.GenerateCovarianceMatrix 80 60 [TRAINING_SET_DIRECTORY] output
This program:

  1. reads all n image files from the training directory
  2. converts each image to greyscale and scales it down
  3. creates a matrix M with each column representing an image. Each column has a length of w × h, and each of its elements represents a shade of grey with a value between 0 (black) and 255 (white)
  4. computes the mean image and writes it to output/mean-image.gif. It is computed by averaging each pixel across all the images
  5. computes the diff matrix DM by subtracting the mean image from M
  6. computes the covariance matrix transpose(DM) × DM. This gives a matrix of size n × n
  7. writes the diff matrix DM to output/diffmatrix.seq
  8. writes the covariance matrix to output/covariance.seq
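The numbered steps above can be sketched in a few lines. This is a toy illustration of steps 3-8 (not the Chimpler code itself), using plain Python lists and storing one tiny "image" per row for simplicity:

```python
# Toy sketch: mean image, diff matrix DM, and covariance DM^T x DM.
# Real images are 80x60; here n = 3 "images" of 4 pixels each.

def mean_image(images):
    """Average each pixel position across all images (step 4)."""
    n = len(images)
    return [sum(img[p] for img in images) / n for p in range(len(images[0]))]

def diff_matrix(images, mean):
    """Subtract the mean image from every image (step 5)."""
    return [[img[p] - mean[p] for p in range(len(mean))] for img in images]

def covariance(dm):
    """DM^T x DM: entry (i, j) is the dot product of images i and j (step 6)."""
    n = len(dm)
    return [[sum(dm[i][p] * dm[j][p] for p in range(len(dm[i])))
             for j in range(n)] for i in range(n)]

images = [[10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60]]
mean = mean_image(images)       # [20.0, 30.0, 40.0, 50.0]
dm = diff_matrix(images, mean)  # rows: all -10.0, all 0.0, all 10.0
cov = covariance(dm)            # 3 x 3 matrix, not 4 x 4
```

Note that the covariance matrix is n × n (number of images squared), not pixels squared, which is what keeps the SVD tractable.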

Now we need to compute the eigenvectors of the covariance matrix. This can be done using the Mahout Singular Value Decomposition (SVD).

To use it, first copy the file covariance.seq to HDFS:

$ hadoop fs -put output/covariance.seq covariance.seq

Then run the Mahout SVD:

$ mahout svd --input covariance.seq --numRows 150 --numCols 150 --rank 50 --output output

We set --numRows and --numCols to the size of the covariance matrix (150 × 150) and the rank to 50 (we usually set it to about one third of the number of images).

The computed eigenvectors might contain extra eigenvectors with invalid eigenvalues. To fix this, we can run mahout cleansvd:

$ mahout cleansvd -ci covariance.seq -ei output -o output2

We can now copy the clean eigenvectors to the local filesystem:

$ hadoop fs -get output2/cleanEigenvectors output/cleanEigenvectors

Then execute the Java class ComputeEigenFaces to create the eigenfaces.

To run the program:

$ java -cp target/mahout-eigenface-example-1.0-jar-with-dependencies.jar com.chimpler.example.eigenface.ComputeEigenFaces output/cleanEigenvectors output/diffmatrix.seq output/mean-image.gif 80 60 [TRAINING_SET_DIRECTORY] output

This creates the eigenfaces matrix in output/eigenfaces.seq and images representing those eigenfaces in the output directory:

It also tries to reconstruct the faces of the training set using the eigenfaces. To do that, it computes the weight of each eigenface by taking the scalar product of the image pixel column with each eigenface column and then normalizing it. It then sums each pixel of the eigenfaces, weighted by those weights. You can think of this process as superposing the eigenface layers and giving each a different transparency value (which can be negative) to try to reconstruct the original image.
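As a rough sketch of the projection and superposition just described (illustrative only, not the example's Java code), assuming the eigenfaces are unit-norm vectors:

```python
# Project a mean-subtracted image onto each eigenface to get its weights,
# then rebuild the image as the mean plus the weighted eigenface layers.

def weights(diff_pixels, eigenfaces):
    """Dot product of the mean-subtracted image with each eigenface."""
    return [sum(d * e for d, e in zip(diff_pixels, face)) for face in eigenfaces]

def reconstruct(mean, eigenfaces, w):
    """Mean image plus each eigenface layer scaled by its weight."""
    return [m + sum(w[k] * eigenfaces[k][p] for k in range(len(w)))
            for p, m in enumerate(mean)]

# Toy example: two orthonormal "eigenfaces" over a 2-pixel image.
mean = [100.0, 100.0]
eigenfaces = [[1.0, 0.0], [0.0, 1.0]]
diff = [5.0, -3.0]                           # original image minus the mean
w = weights(diff, eigenfaces)                # [5.0, -3.0]
restored = reconstruct(mean, eigenfaces, w)  # [105.0, 97.0]
```

In this toy case the reconstruction is exact because the two eigenfaces span the whole 2-pixel space; with 50 eigenfaces for 4800 pixels the reconstruction is only approximate, which is what the distances below measure.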

After reconstructing each image, it computes the distance between the original image and the reconstructed image (using the Euclidean distance between the pixels):

Reconstructed Image distance for subject01.centerlight: 37.395691
Reconstructed Image distance for subject01.glasses: 32.350212
Reconstructed Image distance for subject01.happy: 27.559056
Reconstructed Image distance for subject01.leftlight: 28.008936
Reconstructed Image distance for subject01.noglasses: 47.047757
Reconstructed Image distance for subject01.normal: 32.627928
Reconstructed Image distance for subject01.rightlight: 25.465009
Reconstructed Image distance for subject01.sleepy: 23.635308
Reconstructed Image distance for subject01.surprised: 45.947206
Reconstructed Image distance for subject01.wink: 32.132286
Min distance = 14.470855648264822
Max distance = 47.047756576566904

These distances are quite small, which means that our eigenfaces can represent faces efficiently.
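The per-image distances above are Euclidean distances between the original and reconstructed pixel vectors. A minimal sketch — note that whether the example code also normalizes by the number of pixels is an assumption left out here:

```python
import math

def reconstruction_distance(original, reconstructed):
    """Euclidean distance between two images given as flat pixel lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))

original = [10.0, 20.0, 30.0]
reconstructed = [13.0, 16.0, 30.0]
dist = reconstruction_distance(original, reconstructed)  # 5.0
```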

Testing the Model

Now that we have trained our model, we are going to test it.
In the testing set, we have some of the same people as in the training set but with a different facial expression. We also have two images which are not faces (a hamburger and a cat) and one image of a new person (Bruce Lee).

The class ComputeDistance tests whether the images in the testing directory can be recognized as a face and finds the most similar image in the training set.

To run the program:

$ java -cp target/mahout-eigenface-example-1.0-jar-with-dependencies.jar com.chimpler.example.eigenface.ComputeDistance output/eigenfaces.seq output/mean-image.gif output/weights.seq  68 68 [TRAINING_SET_DIRECTORY] [TESTING_SET_DIRECTORY] output

For each image of the testing set, it computes the weights that need to be applied to each eigenface to reconstruct the image, and generates the reconstructed image in the output directory:


As expected, the images of people from the training set are reconstructed well, but the cat and hamburger images are not. The reconstructed face of Bruce Lee is not recognizable, but we can see that it is still a face. The program also computes the distance between the original image and the reconstructed image. For each test image, it also tries to find the most similar image in the training set by comparing the eigenface weights using the Euclidean distance:

Reconstructed Image distance for brucelee.gif: 51.404904
Image brucelee.gif is most similar to subject03.surprised: 447.574353
Reconstructed Image distance for cat.gif: 65.154281
Image cat.gif is most similar to subject05.centerlight: 638.072675
Reconstructed Image distance for hamburger.gif: 52.313601
Image hamburger.gif is most similar to subject01.rightlight: 684.214467
Reconstructed Image distance for subject01.sad: 32.473280
Image subject01.sad is most similar to subject01.sleepy: 101.895815
Reconstructed Image distance for subject02.sad: 22.418869
Image subject02.sad is most similar to subject02.noglasses: 104.859642
Reconstructed Image distance for subject03.sad: 35.468822
Image subject03.sad is most similar to subject03.noglasses: 120.972063
Reconstructed Image distance for subject04.sad: 30.370102
Image subject04.sad is most similar to subject04.normal: 0.000000

Those results confirm the visual interpretations we made previously: for the hamburger and the cat, the distance between the reconstructed image and the original image is quite high, and so is the weight distance to the images from the training set. The image of Bruce Lee is reconstructed fairly well, but its weight distance falls between those of the non-face images and the known faces.

For the other faces, this distance is quite small and the program successfully associates each of them with a face of the same person from the training set.

Using the weight distance, we can define two thresholds:

  • T1: weight distance above which an image is not considered a face
  • T2: weight distance below which an image is considered a face from the training set

So if the weight distance is above T1, the image does not represent a face. Between T2 and T1, it represents an unknown face. And below T2, it represents a face from the training set. Those thresholds are chosen heuristically.
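A minimal sketch of this two-threshold decision rule; the values of T1 and T2 below are illustrative guesses read off the distances reported above, not values from the original code:

```python
# Two-threshold classification on the eigenface weight distance.
# T1 and T2 are illustrative; in practice you pick them by inspecting
# distances computed on your own training/testing data.

def classify(weight_distance, t1=600.0, t2=200.0):
    """Above T1: not a face; between T2 and T1: unknown face; below T2: known face."""
    if weight_distance > t1:
        return "not a face"
    if weight_distance > t2:
        return "unknown face"
    return "face from the training set"

# Applying it to the weight distances reported earlier in the post:
hamburger = classify(684.2)   # "not a face"
brucelee = classify(447.6)    # "unknown face"
subject01 = classify(101.9)   # "face from the training set"
```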


In this post, we showed how to generate eigenfaces from a training set and then use those eigenfaces to recognize people's faces. We also introduced some metrics to determine whether an image represents a face and whether it is similar to a face from the training set.

If you try this tutorial with other images, make sure that:

  • the faces are in the same position in the image
  • the faces have the same scale/rotation angle
  • the faces have the same brightness/contrast

Some techniques have been developed to relax those constraints; you can find several papers about them on the web.

Posted on 21/08/2014 in Mahout


Installing the Scim Vietnamese input method on CentOS 5.x

1. Install scim and the required libraries.

Open a terminal and type:

  yum -y install scim scim-libs scim-tables scim-bridge

2. Download the Scim input method for Vietnamese

Download it here:

3. Install

Change into the directory you just downloaded the archive to, then type:

tar -zxf scim-tables-vietnamese-ext*.tar.gz
cd scim-tables-vietnamese-ext/
make install

4. Configure

Go to System => Preferences => Input Method, select custom input method, then select scim.

Log out for scim to take effect. Then add the Telex input method in Scim to be able to type with Telex.

Posted on 18/02/2013 in Linux


SharePoint 2010 Powershell Feature Cmdlets

In this installment it's time to look at the various cmdlets that have to do with Features. Of course you can use the UI to do this, but it's much, much easier in PowerShell and, dare I say, more fun.

Now keep in mind that this only relates to FARM-level features; I will cover sandboxed solutions and features next!

Listing features on Farm, Site Collection and Site

The main cmdlet used within PowerShell to list features is Get-SPFeature. To show all the features on the farm, listed by display name and sorted, use this:

Get-SPFeature | Sort -Property DisplayName

To show all the features on the Farm grouped by scope in a table use:

Get-SPFeature | Sort -Property DisplayName, Scope | FT -GroupBy Scope DisplayName

To see all features for a Web Application:

Get-SPFeature -WebApplication http://webapplication

To see all features for a Site Collection:

Get-SPFeature -Site http://sitecollection

To see all features for a Site:

Get-SPFeature -Web http://siteurl

Remember for some more information relating to the features you can use:

Get-SPFeature -Web http://siteurl | Format-List

To see all the members that a feature definition has use:

Get-SPFeature -Web http://siteurl | Get-Member

Enabling and Disabling Features

Disabling and enabling features is pretty easy, once again using the Disable-SPFeature and Enable-SPFeature cmdlets, but there is a trick. You need the name of the feature folder that contains the actual feature, not what is displayed in the UI, so be careful:

Enable-SPFeature -Identity "Hold" -URL http://url

You can apply this to any Site and Site Collection scoped features.
Obviously, to disable a feature just use the same syntax with the Disable-SPFeature cmdlet:

Disable-SPFeature -Identity "Hold" -URL http://url

Remember though that -Identity is the DisplayName property of the feature, not the text displayed in the UI, which is actually retrieved from a resource file.
For example, the Document Sets feature looks like below in the SharePoint interface:
But to actually enable it you have to use the following cmdlet:

Enable-SPFeature -Identity DocumentSet -URL http://url

Installing and Uninstalling Features

Once again this is pretty straightforward and really involves only two cmdlets: Install-SPFeature and Uninstall-SPFeature.
To install a feature you need to specify the name of the folder that contains your feature:

Install-SPFeature "FeatureFolderName"

To uninstall, simply use the Uninstall-SPFeature cmdlet with the same parameter:

Uninstall-SPFeature "FeatureFolderName"

Posted on 03/05/2012 in SharePoint


How to: Create a Windows Communication Foundation Client

See the following link:

How to: Create a Windows Communication Foundation Client

Note: SvcUtil.exe can be found at C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin.

Posted on 24/04/2012 in WCF


Generic (template) programming in C#

This article was collected from:
C# has strong support for generic programming. If you know how to use generics, you can save quite a lot of development time and get highly reusable code, while keeping the code clear with almost no loss of performance. In C#, you can write generic classes, structs, and functions.
1. Generic classes.
You are probably already familiar with the declaration
List<string> nameList = new List<string>();
This declares a list of strings, where the class List<T> is written generically; you can just as easily declare a list of objects of any type. Clearly, the generic List<T> is far more reusable than a non-generic List class.
Applying the same approach, we can define other generic classes. As an example, here is a Couple<T, E> class that I use a lot when writing code:

    public class Couple<T, E>
    {
        public T elementA;
        public E elementB;

        public Couple(T inA, E inB)
        {
            elementA = inA;
            elementB = inB;
        }
    }

This class is useful when you want a temporary object made of just two elements. Normally you would have to define a new class matching the types of those two elements, but with Couple<T, E> that is no longer necessary.
For example, if you want an object holding a string and an integer, you declare it as follows:

Couple<string, int> couple = new Couple<string, int>("Age", 29);

Then couple.elementA has type string with the value "Age", and couple.elementB has type int with the value 29.
With the same class, you can also create a list of string/int pairs:

List<Couple<string, int>> listCouple = new List<Couple<string, int>>();

listCouple[5].elementA then has type string and returns the string value of the Couple at index 5 in that list.
When a function you write needs to return two objects, using Couple<T, E> is also a good, clean, and still very OOP approach. :D
Note that you can add properties to the Couple<T, E> class if you find it necessary (when writing ASP.NET pages, for instance), just as you normally would:

    public class Couple<T, E>
    {
        public T elementA;
        public E elementB;

        public Couple(T inA, E inB)
        {
            elementA = inA;
            elementB = inB;
        }

        public T ElementA
        {
            get { return elementA; }
            set { elementA = value; }
        }
    }

Similarly, you can declare the generic classes Triple<T, E, F> (a triple) and Quad<T, E, F, G> (a quadruple):

    public class Triple<T, E, F>
    {
        public T elementA;
        public E elementB;
        public F elementC;

        public Triple(T inA, E inB, F inC)
        {
            elementA = inA;
            elementB = inB;
            elementC = inC;
        }
    }

    public class Quad<T, E, F, G>
    {
        public T elementA;
        public E elementB;
        public F elementC;
        public G elementD;

        public Quad(T inA, E inB, F inC, G inD)
        {
            elementA = inA;
            elementB = inB;
            elementC = inC;
            elementD = inD;
        }
    }

You can reuse these three classes in your C# projects; they will save you a lot of time. Note that you have to define them yourself, as they are not built into .NET.

2. Generic structs.
In general, there is no difference between writing a generic struct and a generic class. The following example is the Couple<T, E> struct:

    public struct Couple<T, E>
    {
        public T elementA;
        public E elementB;

        public Couple(T inA, E inB)
        {
            elementA = inA;
            elementB = inB;
        }

        public T ElementA
        {
            get { return elementA; }
            set { elementA = value; }
        }
    }

In general, if you are not too concerned about performance, you can almost always use a class instead of a struct in C#. I will cover the differences, and the pros and cons of using structs in C#, in another post.

3. Generic functions.
C# also supports generic functions. Here is an example:

    public string toString<T>(List<Couple<string, T>> list)
    {
        string result = "";
        foreach (Couple<string, T> pair in list)
        {
            string tmp = pair.elementA + " : " + pair.elementB.ToString();
            result += tmp + '\n';
        }
        return result;
    }

This example is a function that returns a string describing the contents of a list of pairs of a string and some other type (string, int, …) that is substituted at each concrete call. It is perhaps not the best example of a generic function, but thanks to generic functions I was able to write a very general BinarySearch that can be reused across many projects, saving a lot of time. I will present that function in a later post on sorting and searching.

This article also references:

Posted on 24/04/2012 in DotNet


WCF service failed to start: AddressAccessDeniedException

One of the major changes in Windows Vista / Windows 7 security is that most people are no longer going to be running with Administrator privileges by default like they were doing on earlier platforms. This impacts your ability to run HTTP web services because listening at a particular HTTP address is a restricted operation. By default, every HTTP path is reserved for use by the system administrator. Your services will fail to start with an AddressAccessDeniedException if you aren’t running the service from an elevated account. On Windows Vista/Windows 7, httpcfg.exe is no longer included and instead there’s a new command set available through netsh.exe.

I’m going to walk through delegating part of the HTTP namespace to get a web service working that wants to listen at http://localhost:8000/. Since I’m not running as the Administrator when debugging in Visual Studio, the service fails to start when I run it.

HTTP could not register URL http://+:8000/. Your process does not have access rights to
this namespace (see for details).

The plus sign in the URL just means that there's a wildcard being applied to the hostname. I'll have another article that talks about wildcards in more detail. To fix this problem, I first need to start a command prompt using "Run as administrator" so that I have elevated privileges. Then, I can use netsh.exe to give some of the Administrator's HTTP namespace to my user account. You can look at the existing HTTP namespace delegations by using "netsh http show urlacl". There should be several namespaces set up by default, including, as an example, the default one that WCF uses for temporary addresses.

Reserved URL : http://+:80/Temporary_Listen_Addresses/
User: \Everyone
Listen: Yes
Delegate: No
SDDL: D:(A;;GX;;;WD)

Now, I open a command prompt with administrator privileges and run the following command to assign some of the HTTP namespace to my user account:

netsh http add urlacl url=http://+:8000/ user=MYMACHINE\UserName

You can get the syntax for all of these commands by running "netsh http" without any arguments. Note that I've matched the URL in this command to the URL that appeared in the error message. The wildcarding is important for getting the right reservation, and you'll continue to be denied access if your reservation covers less than your service's attempted registration. Going back to Visual Studio, my service now starts up and runs as expected.

Posted on 23/04/2012 in WCF


Parser Error: Direct Dependencies and The Limit Has Been Exceeded

Parser Error
Description: An error occurred during the parsing of a resource required to service this request. Please review the following specific parse error details and modify your source file appropriately.

Parser Error Message: The page '/sites/blah/_catalogs/masterpage/blah.master' allows a limit of 11 direct dependencies, and that limit has been exceeded.

While deploying a user control on the master page in one of our projects, I used to get the above error. This kind of exception is generally thrown while rendering a page. The reason is that the number of controls on the page exceeds the limit specified in the web.config file.

This problem can be fixed in two different ways.
Solution 1: Modify the control dependency limits in the web.config file.
Solution 2: Optimize the usage of controls on the master page; this can be done either by deleting duplicates or by merging two or more controls into one.
Solution 2 is complex, whereas solution 1 is a simple modification, as shown below.

Search for the following tag in the web.config file and change the DirectFileDependencies value:

<SafeMode MaxControls="200" CallStack="true" DirectFileDependencies="20" TotalFileDependencies="50" AllowPageLevelTrace="false">

By default, DirectFileDependencies is set to 10; you can change it to any limit you require.

Posted on 12/04/2012 in SharePoint