How to store multiple features for a face and find the distance?

Question

I am working on a project based on facial recognition and verification. I am using a Siamese network to get a 128-dimensional vector (embedding) for each face.

I store the encodings/embeddings of each person's face in a database and then match the incoming face's encoding against the previously stored encodings to recognize the person.

To make the system robust, I have to store more than one encoding of the same person. So far I have used only a single encoding vector per person and matched it with:

From the face_recognition library (to get the distance):

face_recognition.compare_faces( stored_list_of_encodings, checking_image_encodings )

That doesn't work all the time because I compare against only a single encoding. To make the system good enough for most cases, I want to store at least 3 encodings of the same person and then compare them with the new data.

Now the question: how do I store multiple embeddings of the same person and then compare the distance?

I am using face_recognition as the library and a Siamese network for feature extraction.

Answer 1

Score: 1

There are multiple approaches to this. I have worked on face recognition quite extensively, and there are a few things that I tried. You can do some of the following.

Create a KNN classifier

The way to do this is to create a DB of sorts, where each feature has a person's name associated with it (in this case a feature represents one face image of a person). Then, at comparison time, you compute the distance between your query feature and each stored representation and keep the N smallest distances. You then look at which class each of those N neighbors belongs to and take the most frequent label as your target class. In my experience, though, this isn't extremely robust (this depends entirely on the type of test data; mine involved a lot of in-the-wild images, so it wasn't robust enough).
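The voting step described above could be sketched roughly as follows. This is only an illustration: the function name knn_identify, the toy 4-dimensional vectors (standing in for real 128-dimensional embeddings), the names, and the choice of Euclidean distance are all assumptions, not something taken from the answer.

```python
from collections import Counter

import numpy as np


def knn_identify(query, embeddings, labels, k=3):
    """Return the majority label among the k stored embeddings closest to query."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Euclidean distance from the query to every stored embedding.
    distances = np.linalg.norm(embeddings - np.asarray(query, dtype=float), axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): one row per stored face image, one name per row.
stored = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.0, 1.1, 1.0],
    [1.0, 0.9, 1.0, 1.1],
]
names = ["alice", "alice", "alice", "bob", "bob", "bob"]

print(knn_identify([0.05, 0.05, 0.0, 0.0], stored, names, k=3))
```

Note that this sketch always returns some label, even for a face that matches nobody well, so in practice you would probably also want to check the distances themselves.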

Averaged Representations

Another approach that I used was to average the representations for each person. If I had 5 images, I would take the mean or the median of the 5 representations extracted from those images. In my experience the median worked better than the mean. You then have one averaged representation associated with each person; you compute the distance from the query to each averaged representation, and the one with the smallest distance is your target class.
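A minimal sketch of the averaging idea, assuming the embeddings are kept in a dict keyed by person name and compared with Euclidean distance. The helper names build_templates and identify and the toy 4-dimensional vectors are made up for illustration.

```python
import numpy as np


def build_templates(embeddings_by_person, use_median=True):
    """Collapse each person's embeddings into one representative vector."""
    reduce = np.median if use_median else np.mean
    return {
        name: reduce(np.asarray(vectors, dtype=float), axis=0)
        for name, vectors in embeddings_by_person.items()
    }


def identify(query, templates):
    """Return (name, distance) for the template closest to query."""
    query = np.asarray(query, dtype=float)
    distances = {
        name: float(np.linalg.norm(template - query))
        for name, template in templates.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Toy data (hypothetical): 4-dimensional stand-ins for 128-dimensional embeddings.
embeddings_by_person = {
    "alice": [[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0], [0.0, 0.1, 0.0, 0.1]],
    "bob": [[1.0, 1.0, 1.0, 1.0], [0.9, 1.0, 1.1, 1.0], [1.0, 0.9, 1.0, 1.1]],
}
templates = build_templates(embeddings_by_person)
name, distance = identify([0.9, 1.0, 1.0, 1.0], templates)
print(name, round(distance, 3))
```

Returning the distance along with the name leaves room for the caller to reject a match that is too far away.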

Cluster Representations

Another approach is to group the representations into clusters using DBSCAN, and then at runtime assign the query representation to a cluster and take the majority class in that cluster as the label.
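One possible way to do this with scikit-learn is sketched below. sklearn's DBSCAN has no predict method, so this sketch assigns the query to the cluster of its nearest core sample, provided that sample lies within eps; that assignment rule, the eps and min_samples values, the helper names, and the toy data are all assumptions rather than the answerer's actual setup.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import DBSCAN


def fit_clusters(embeddings, eps=0.5, min_samples=2):
    """Cluster the stored embeddings with DBSCAN."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit(
        np.asarray(embeddings, dtype=float)
    )


def identify(query, model, names):
    """Assign query to the cluster of its nearest core sample, then majority-vote."""
    if len(model.components_) == 0:
        return None  # DBSCAN found no clusters at all
    distances = np.linalg.norm(
        model.components_ - np.asarray(query, dtype=float), axis=1
    )
    nearest = int(np.argmin(distances))
    if distances[nearest] > model.eps:
        return None  # too far from every cluster: treat as unknown
    cluster = model.labels_[model.core_sample_indices_[nearest]]
    members = [names[i] for i in np.flatnonzero(model.labels_ == cluster)]
    return Counter(members).most_common(1)[0][0]


# Toy data (hypothetical): 4-dimensional stand-ins for 128-dimensional embeddings.
stored = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.0, 1.1, 1.0],
    [1.0, 0.9, 1.0, 1.1],
]
names = ["alice", "alice", "alice", "bob", "bob", "bob"]

model = fit_clusters(stored)
print(identify([0.05, 0.05, 0.0, 0.0], model, names))
print(identify([5.0, 5.0, 5.0, 5.0], model, names))
```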

In my experience the averaged representation is the best, but you do end up needing multiple images, at least 5 I think. In my case I needed at least 5 since I was catering to multiple angles and so on.

NOTE: SVM is a bad approach. You limit your DB size, and every time you need to add a new person to the DB you would need to train a new SVM for the extra class that has just appeared.

Also, for storage purposes you could always store the embeddings in a JSON file.
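As a rough illustration of JSON storage, the sketch below keeps a list of embeddings per person name. The file layout, the function names save_embeddings and load_embeddings, and the toy vectors are assumptions made for the example.

```python
import json
import os
import tempfile


def save_embeddings(path, embeddings_by_person):
    """Write {name: [embedding, ...]} to a JSON file."""
    # Convert every value to a plain float so NumPy arrays also serialize.
    serializable = {
        name: [[float(x) for x in vector] for vector in vectors]
        for name, vectors in embeddings_by_person.items()
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(serializable, f)


def load_embeddings(path):
    """Read the {name: [embedding, ...]} mapping back from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Toy data (hypothetical): several embeddings stored under each name.
embeddings_by_person = {
    "alice": [[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0]],
    "bob": [[1.0, 1.0, 1.0, 1.0]],
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "faces.json")
    save_embeddings(path, embeddings_by_person)
    restored = load_embeddings(path)

print(sorted(restored), len(restored["alice"]))
```

Adding a new image of a known person then amounts to appending one more vector to that person's list and saving again.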

Answer 2

Score: 0

Have you considered using an SVM classifier to classify the faces? The input to the SVM classifier would be the vector of size 128. You can collect a few of the vectors belonging to a single person's face (3 in your case) and fit them to the SVM as one class. You can then do the same for the other faces (classes).

Then, when predicting a face, simply feed in the new vector and run

svm.predict([..])

I had a similar use case for my project, but I was using FaceNet as the feature extractor instead. It works perfectly.
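With scikit-learn, the fit and predict steps might look roughly like this. The linear kernel, the toy 4-dimensional vectors (in place of 128-dimensional embeddings), and the names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (hypothetical): three embeddings per person, one label per embedding.
X = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.0, 1.1, 1.0],
    [1.0, 0.9, 1.0, 1.1],
])
y = ["alice", "alice", "alice", "bob", "bob", "bob"]

# Each person is one class; all of that person's vectors share the label.
svm = SVC(kernel="linear")
svm.fit(X, y)

# predict expects a 2-D input: one row per face to classify.
prediction = svm.predict([[0.05, 0.05, 0.0, 0.0]])[0]
print(prediction)
```

Keep in mind that enrolling a new person means calling fit again on the full set of vectors, which is the drawback raised in Answer 1.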

Answer 3

Score: 0

You can store all the face embeddings in a database or data structure that supports nearest-neighbor queries. Then, for any given face you want to match, get its nearest-neighbor embeddings from the database. With the k nearest neighbors and their distances to the query item, you can decide which person the new face belongs to (if it belongs to a known person at all).

You can take a look at the Approximate Nearest Neighbor Benchmark for available options.

Just remember that they are called approximate, so you won't get exact results, but they are the best option if you are dealing with a large number of entities. If that's not the case for you, you can just use the brute-force nearest-neighbor solutions already provided in sklearn to get exact matches.
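For the exact, brute-force case, a sketch using sklearn's NearestNeighbors might look like the following. The max_distance cutoff of 0.5, the identify helper, and the toy data are assumptions for illustration; a real threshold would have to be tuned on your own embeddings.

```python
from collections import Counter

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (hypothetical): 4-dimensional stand-ins for 128-dimensional embeddings.
stored = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.0, 1.1, 1.0],
    [1.0, 0.9, 1.0, 1.1],
])
names = ["alice", "alice", "alice", "bob", "bob", "bob"]

# Exact (brute-force) index over all stored embeddings.
index = NearestNeighbors(n_neighbors=3, algorithm="brute", metric="euclidean")
index.fit(stored)


def identify(query, max_distance=0.5):
    """Vote among the k nearest neighbors; return None for an unknown face."""
    distances, indices = index.kneighbors(np.asarray([query], dtype=float))
    # Keep only neighbors close enough to count as a plausible match.
    close = [
        names[i]
        for d, i in zip(distances[0], indices[0])
        if d <= max_distance
    ]
    if not close:
        return None
    return Counter(close).most_common(1)[0][0]


print(identify([0.05, 0.05, 0.0, 0.0]))
print(identify([5.0, 5.0, 5.0, 5.0]))
```

The distance cutoff is what lets the lookup answer "nobody" when the face does not belong to a known person.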

huangapple
  • Posted on January 7, 2020, 01:00:55
  • Please keep this link when reposting: https://go.coder-hub.com/59616113.html