What's the best solution for making millions of small binaries available for testing consistently?
Question
We're developing a biometric matching solution for a verification system. As you may know, one of the main issues with biometric data is that it is unstructured binary data, and every incoming biometric minutiae record must be matched against the whole minutiae database.
Hence, we're looking for a fast and appropriate solution that eliminates the binary retrieval (I/O) latency of the physical hard disk and decreases overhead by keeping all the binary records available for new matching requests.
Currently, our solution is to use an in-memory database like Redis with a caching mechanism. The problem with this solution is that the required memory (RAM) becomes very large when the number of biometric minutiae binaries is high. We're looking for a solution that makes all the binaries highly available to our matching application.
Note that each biometric minutiae record is usually under 5 KB, and we have millions of such records.
Answer 1
Score: 0
You can use a combination of an in-memory database and a disk-based database to store millions of minutiae.
You can store all the minutiae in any disk-based database, such as MySQL or PostgreSQL.
The minutiae data would then be spread across three different datastores:
- Application cache (Local cache)
- In-memory DB (Memcached, Redis, etc.)
- Disk-based DB (MySQL, MongoDB, etc.)
Let's say you're using Redis and MySQL in your setup.
Your code should first look for the minutiae in the application cache. If they are not found there, it should check Redis; if they are found in Redis, fetch them and store them in the local cache with an expiry.
If the data is not available in Redis either, search the MySQL database and fetch it from there. If it is found, store the same data in Redis with an expiry.
With expiry, you avoid holding all the objects in memory at the same time.
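The lookup order described above can be sketched roughly as follows. This is only a minimal illustration, not a production implementation: `TTLCache` and `TieredStore` are hypothetical names, the shared tier is any object with `get`/`set` (here a second `TTLCache` standing in for Redis), and `load_from_db` is a hypothetical callable standing in for the MySQL query. The sketch also fills the local cache after a database hit, which is an assumption beyond what is described above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # drop the expired entry
            return None
        return value

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


class TieredStore:
    """Looks up a record in the local cache, then the shared cache, then the DB."""

    def __init__(self, local, shared, load_from_db: Callable[[str], Optional[bytes]]):
        self.local = local                # application (local) cache
        self.shared = shared              # stand-in for Redis: needs get/set
        self.load_from_db = load_from_db  # stand-in for the MySQL lookup

    def get(self, key: str) -> Optional[bytes]:
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.shared.get(key)
        if value is None:
            value = self.load_from_db(key)
            if value is None:
                return None  # not stored anywhere
            self.shared.set(key, value)  # cache the DB result in the shared tier
        self.local.set(key, value)  # assumption: also keep a local copy
        return value


if __name__ == "__main__":
    fake_db = {"minutiae:1": b"\x01\x02\x03"}  # hypothetical record
    store = TieredStore(TTLCache(ttl=60), TTLCache(ttl=600), fake_db.get)
    print(store.get("minutiae:1"))
```

With a real deployment, the shared tier would be a thin wrapper around a Redis client and the loader would run a query against MySQL; both are left out here so the sketch stays self-contained.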
Now suppose you don't want to use expiry because you always need all the minutiae. In that case, you can either increase the size of your Redis instance or use Redis Cluster. Alternatively, an IMDG (in-memory data grid) such as Hazelcast or Apache Ignite can be used to store all the minutiae. If you'd rather avoid such a complex setup, consider an in-memory database such as SAP HANA or MemSQL.
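Before deciding how far to scale the Redis instance, it may help to estimate how much RAM the raw records need. The sketch below is a back-of-the-envelope calculation only: it assumes the 5 KB upper bound from the question, the record counts are hypothetical, and `overhead_factor` is a hypothetical knob for per-key and allocator overhead that you would have to measure on your own data.

```python
def estimate_ram_gib(record_count: int,
                     record_bytes: int = 5 * 1024,
                     overhead_factor: float = 1.0) -> float:
    """Estimate the RAM needed to hold every record in memory, in GiB.

    record_bytes defaults to the 5 KB upper bound from the question.
    overhead_factor is an assumed multiplier; 1.0 means raw payload only.
    """
    total_bytes = record_count * record_bytes * overhead_factor
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical record counts, since the question only says "millions".
    for count in (1_000_000, 10_000_000, 100_000_000):
        print(f"{count:>11,} records: about {estimate_ram_gib(count):.1f} GiB raw")
```

At 5 KB per record, one million records come to a little under 5 GiB of raw payload, so whether a single larger instance is enough depends mostly on how many millions of records there are.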