Which is the optimized way to query using aerospike client?
Question
I have a set (set1)
Bins :
bin1 (PK = key1)
bin2 (PK = key1)
bin3 (PK = key2)
bin4 (PK = key2)
Which is the more optimized way (in terms of query time, CPU usage, and failure cases for 1 client call vs. 2 client calls) to query the data with the Aerospike client, out of the two approaches below:
Approach 1: Make 1 get call using the Aerospike client with bins = [bin1, bin2, bin3, bin4] and keys = [key1, key2]
Approach 2: Make 2 Aerospike client get calls. The first call will have bins = [bin1, bin2] and keys = [key1], and the second call will have bins = [bin3, bin4] and keys = [key2]
I find Approach 2 cleaner, since in Approach 1 we will try to get the record for every combination (e.g. bin1 with key2 as the primary key), which is extra computation, and the primary key set can be large. The disadvantage of Approach 2, however, is that it requires two Aerospike client calls.
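(For concreteness, here is a minimal sketch of what the two approaches could look like with the Aerospike Java client. The host, port, and "test" namespace are placeholder assumptions; only the set and bin names come from the question above.)

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.BatchPolicy;
import com.aerospike.client.policy.Policy;

public class TwoApproaches {
    public static void main(String[] args) {
        // Placeholder host/port and "test" namespace; adjust for your cluster.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key1 = new Key("test", "set1", "key1");
        Key key2 = new Key("test", "set1", "key2");

        // Approach 1: one batch get for both keys, selecting all four bins.
        Record[] batchRecords = client.get(new BatchPolicy(),
                new Key[] { key1, key2 }, "bin1", "bin2", "bin3", "bin4");

        // Approach 2: two independent single-record gets, each selecting only
        // the bins stored on that record.
        Record rec1 = client.get(new Policy(), key1, "bin1", "bin2");
        Record rec2 = client.get(new Policy(), key2, "bin3", "bin4");

        System.out.println(batchRecords.length + " batch results; rec1=" + rec1 + ", rec2=" + rec2);
        client.close();
    }
}
```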
Answer 1
Score: 1
A. Batch reads vs. multiple single reads
This is kind of a false choice. Yes, you could make a batch call for [key1, key2] (1); in that case you shouldn't specify bin1, bin2, bin3, bin4, just get the full records without selecting bins. Or you could make two independent get() calls, one for key1, one for key2 (2).
However, there's no reason you need to read key1, wait for the result, then read key2. You can read them with a synchronous get(key1) in one thread, and a synchronous get(key2) in another thread. The Java client can handle multi-threaded use. Alternatively, you can async get(key1) and immediately async get(key2).
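(As a rough sketch of that pattern, not something from the original answer: two synchronous get() calls issued from a small thread pool, since a single AerospikeClient instance is safe to share across threads. Host, port, and the "test" namespace are placeholders.)

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;

public class ParallelSingleGets {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port/namespace; adjust for your cluster.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Each synchronous get() runs in its own thread, so neither read waits
        // for the other; null uses the client's default read policy.
        Callable<Record> readKey1 = () ->
                client.get(null, new Key("test", "set1", "key1"), "bin1", "bin2");
        Callable<Record> readKey2 = () ->
                client.get(null, new Key("test", "set1", "key2"), "bin3", "bin4");

        Future<Record> f1 = pool.submit(readKey1);
        Future<Record> f2 = pool.submit(readKey2);

        Record rec1 = f1.get();  // blocks until the first read completes
        Record rec2 = f2.get();  // blocks until the second read completes
        System.out.println("rec1 = " + rec1 + ", rec2 = " + rec2);

        pool.shutdown();
        client.close();
    }
}
```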
Batch reads (such as in (1)) are not as efficient as single reads when the number of records is small relative to the number of nodes in the cluster. The records are evenly distributed, so if you have a 4-node cluster and you make a batch request with 4 keys, you end up with parallel sub-batches of roughly 1 record per node. The overhead associated with batch reads isn't worth it when that's the case. See more about batch index in the docs and the knowledge base article FAQ - batch-index tuning parameters. The FAQ - Differences between getting single record versus batch should answer your question.
B. The number of records in an Aerospike database doesn't impact read performance!
You are worried that "the primary key set can be large". That is not a problem at all for Aerospike. In fact, one of the best things about Aerospike is that getting a single record from a database with 1 million records or one with 1 trillion records is pretty much the same big-O computational cost.
Each record has a 64 byte metadata entry in the primary index. The primary index is spread evenly across the nodes of the cluster, because data distribution in Aerospike is extremely even. Each node stores an even share of the partitions, out of 4096 logical partitions for each namespace in the cluster. The partitions are represented as a collection of red-black binary trees (sprigs) with a hash table leading to the correct sprig.
To find any record the client hashes its key into a 20-byte digest. Using 12 bits of the digest the client finds the partition ID, looks it up in the partition map it holds locally, and finds the correct node. Reading the record is now a single hop to the correct node. On that node, a service thread picks up the call from a channel of the network card and looks it up in the correct partition (again, finding the partition ID from the digest is a simple O(1) operation). It hops directly to the correct sprig (also O(1)) and then does a simple O(log n) binary tree lookup for the record's metadata. Now the service thread knows exactly where to find the record in storage, with a single read IO. I explained this read flow in more detail here (though in version 4.7 transaction queues and threads were removed; the service thread does all the work).
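(A small illustrative sketch of the first step: the Java client's Key object already exposes the 20-byte RIPEMD-160 digest, and 12 bits of it select one of the 4096 partitions. The exact bit/byte layout below is my assumption for illustration; the real client computes the partition ID internally.)

```java
import com.aerospike.client.Key;

public class DigestToPartition {
    // Aerospike uses 4096 logical partitions per namespace.
    static final int PARTITIONS = 4096;

    public static void main(String[] args) {
        // The client hashes (set name + user key) into a 20-byte digest.
        Key key = new Key("test", "set1", "key1");
        byte[] digest = key.digest;  // public field on Key, 20 bytes

        // Illustration only: take the low 12 bits of the digest (interpreted
        // little-endian here, which is an assumption) to select one of the
        // 4096 partitions.
        int partitionId = ((digest[0] & 0xFF) | ((digest[1] & 0xFF) << 8)) % PARTITIONS;

        System.out.println("digest bytes = " + digest.length
                + ", partition id = " + partitionId);
    }
}
```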
Another point is that the time spent looking up record metadata in the index is orders of magnitude less than getting the record from storage.
So, the number of records in the cluster doesn't change how long it takes to read a random record, from a data set of any size.
I wrote an article Aerospike Modeling: User Profile Store that shows how this fact is leveraged to make sub-millisecond reads at millions of transactions-per-second from a petabyte scale data store.