Which is the more optimized way to query using the Aerospike client?
I have a set (set1):
bin1 (PK = key1)
bin2 (PK = key1)
bin3 (PK = key2)
bin4 (PK = key2)
Which of the two approaches below is the more optimized way to query the data with the Aerospike client (in terms of query time, CPU usage, and failure cases for 1 client call vs. 2 client calls)?
Approach 1: Make 1 get call using the Aerospike client with bins = [bin1, bin2, bin3, bin4] and keys = [key1, key2]
Approach 2: Make 2 Aerospike client get calls. The first call has bins = [bin1, bin2] and keys = [key1]; the second call has bins = [bin3, bin4] and keys = [key2]
I find Approach 2 cleaner, since in Approach 1 we would try to get the record for all combinations (e.g., bin1 with key2 as the primary key), which is extra computation, and the primary key set can be large. But the disadvantage of Approach 2 is that it needs two Aerospike client calls.
A. Batch reads vs. multiple single reads
This is kind of a false choice. Yes, you could make a batch call for [key1, key2] (Approach 1), and you shouldn't specify bin1, bin2, bin3, bin4; just get the full records without selecting bins. Or you could make two independent get() calls, one for key1 and one for key2 (Approach 2).
However, there's no reason you need to read key1, wait for the result, then read key2. You can read them with a synchronous get(key1) in one thread, and a synchronous get(key2) in another thread. The Java client can handle multi-threaded use. Alternatively, you can async get(key1) and immediately async get(key2).
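The point about not serializing the two reads can be illustrated with a small sketch. The text refers to the Java client; this stand-in uses Python threads, and `FAKE_STORE` and `get()` are hypothetical placeholders for a cluster and a blocking single-record read (the sleep mimics one network round trip). It says nothing about any particular client's thread safety; it only shows that two independent reads issued concurrently cost roughly one round trip of wall-clock time rather than two.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the cluster: primary key -> bins. Not a real client.
FAKE_STORE = {
    "key1": {"bin1": 1, "bin2": 2},
    "key2": {"bin3": 3, "bin4": 4},
}


def get(key, latency_s=0.1):
    """Hypothetical blocking single-record read; the sleep mimics one network round trip."""
    time.sleep(latency_s)
    return FAKE_STORE[key]


def read_concurrently(keys):
    """Issue one independent get() per key on its own thread instead of
    waiting for each read before starting the next; results keep key order."""
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        return list(pool.map(get, keys))


start = time.perf_counter()
records = read_concurrently(["key1", "key2"])
elapsed = time.perf_counter() - start
print(records, f"{elapsed:.2f}s")
```

With a real client you would swap `get()` for that client's own read call; the async variant mentioned above achieves the same overlap without dedicating a thread per request.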
Batch reads (such as in Approach 1) are not as efficient as single reads when the number of records is no larger than the number of nodes in the cluster. The records are evenly distributed, so if you have a 4-node cluster and you make a batch request with 4 keys, you end up with parallel sub-batches of roughly 1 record per node. The overhead associated with batch reads isn't worth it in that case. See more about batch index in the docs and the knowledge base FAQ - batch-index tuning parameters. The FAQ - Differences between getting single record versus batch should answer your question.
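The fan-out described above can be sketched as grouping a batch's keys by owning node. This is an illustration under stated assumptions, not client internals: SHA-1 stands in for Aerospike's RIPEMD-160 digest (both are 20 bytes), the partition ID is taken from 12 bits of the digest as the answer describes, and the partition-to-node assignment `partition % n_nodes` is a hypothetical round-robin map rather than a real cluster's partition map.

```python
import hashlib
from collections import defaultdict

N_PARTITIONS = 4096  # logical partitions per namespace (from the text)


def partition_id(key: str) -> int:
    # Stand-in digest: SHA-1 (20 bytes) instead of Aerospike's RIPEMD-160.
    digest = hashlib.sha1(key.encode()).digest()
    # 12 bits of the digest select one of the 4096 partitions.
    return int.from_bytes(digest[:2], "little") & 0x0FFF


def split_into_sub_batches(keys, n_nodes):
    """Group a batch's keys by owning node: one parallel sub-batch per node.
    Hypothetical partition map: partition p is owned by node p % n_nodes."""
    sub_batches = defaultdict(list)
    for key in keys:
        sub_batches[partition_id(key) % n_nodes].append(key)
    return dict(sub_batches)


small = split_into_sub_batches(["key1", "key2", "key3", "key4"], 4)
print({node: len(ks) for node, ks in small.items()})
```

With 4 keys on 4 nodes each sub-batch carries only a record or two, so you pay the batch machinery's overhead for what is effectively a handful of single reads; with thousands of keys the per-node sub-batches become large enough for batching to pay off.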
B. The number of records in an Aerospike database doesn't impact read performance!
You are worried that "the primary key set can be large". That is not a problem at all for Aerospike. In fact, one of the best things about Aerospike is that getting a single record from a database with 1 million records or one with 1 trillion records is pretty much the same big-O computational cost.
Each record has a 64-byte metadata entry in the primary index. The primary index is spread evenly across the nodes of the cluster, because data distribution in Aerospike is extremely even. Each node stores an even share of the 4096 logical partitions that each namespace has in the cluster. The partitions are represented as a collection of red-black binary trees (sprigs) with a hash table leading to the correct sprig.
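The even spread of the 64-byte index entries makes per-node index memory simple arithmetic. This sketch assumes that every copy of a record (master or replica) has its own index entry on the node holding it, and uses a replication factor of 2 (Aerospike's usual default) as an assumption; the function names are illustrative only.

```python
INDEX_ENTRY_BYTES = 64  # per-record primary index entry (from the text)


def index_bytes_per_node(n_records: int, n_nodes: int, replication_factor: int = 2) -> float:
    """Primary-index memory on each node, assuming every copy of a record
    has its own index entry and data is spread evenly across nodes."""
    return n_records * INDEX_ENTRY_BYTES * replication_factor / n_nodes


def partitions_per_node(n_nodes: int, n_partitions: int = 4096) -> float:
    # Each node owns an even share of the namespace's 4096 partitions.
    return n_partitions / n_nodes


print(index_bytes_per_node(1_000_000_000, 4), partitions_per_node(4))
```

For example, 1 billion records on a 4-node cluster with 2 copies works out to 32 GB of primary index per node, and each node owns 1024 of the 4096 partitions.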
To find any record, the client hashes its key into a 20-byte digest. Using 12 bits of the digest, the client finds the partition ID, looks it up in the partition map it holds locally, and finds the correct node. Reading the record is now a single hop to the correct node. On that node, a service thread picks up the call from a channel of the network card and looks it up in the correct partition (again, finding the partition ID from the digest is a simple O(1) operation). It hops directly to the correct sprig (also O(1)) and then does a simple O(log n) binary tree lookup for the record's metadata. Now the service thread knows exactly where to find the record in storage, with a single read IO. I explained this read flow in more detail here (though in version 4.7 the transaction queues and threads were removed; the service thread does all the work).
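The read path above can be modeled as a toy. Several pieces are assumptions, not Aerospike's actual implementation: SHA-1 stands in for the RIPEMD-160 digest, the sprig is chosen from a further digest byte (a hypothetical choice), 256 sprigs per partition is an assumed setting, each sprig is a sorted list searched with `bisect` instead of a red-black tree, and the round-robin partition map in the usage is hypothetical. What it does show is the shape of the flow: O(1) hops to the node, partition, and sprig, then one O(log n) search that yields a storage location.

```python
import bisect
import hashlib

N_PARTITIONS = 4096  # logical partitions per namespace (from the text)
N_SPRIGS = 256       # assumption: sprigs per partition


def digest_of(key: str) -> bytes:
    # Stand-in: SHA-1 also yields 20 bytes; the real digest is RIPEMD-160.
    return hashlib.sha1(key.encode()).digest()


def partition_id(digest: bytes) -> int:
    # 12 bits of the digest select one of 4096 partitions: O(1).
    return int.from_bytes(digest[:2], "little") & 0x0FFF


def sprig_id(digest: bytes) -> int:
    # Hypothetical: one more digest byte picks the sprig: O(1).
    return digest[2] % N_SPRIGS


class Node:
    """Toy server node: each (partition, sprig) holds a sorted list of
    (digest, storage_location) pairs standing in for a red-black tree."""

    def __init__(self):
        self.sprigs = {}

    def write(self, digest: bytes, location: int) -> None:
        tree = self.sprigs.setdefault((partition_id(digest), sprig_id(digest)), [])
        bisect.insort(tree, (digest, location))

    def lookup(self, digest: bytes):
        # Jump straight to the sprig (O(1)), then binary-search it (O(log n)).
        tree = self.sprigs.get((partition_id(digest), sprig_id(digest)), [])
        i = bisect.bisect_left(tree, (digest,))
        if i < len(tree) and tree[i][0] == digest:
            return tree[i][1]  # where the record lives in storage
        return None


def read(key: str, partition_map, nodes):
    # Client side: hash the key, map partition -> node locally, one hop to that node.
    d = digest_of(key)
    return nodes[partition_map[partition_id(d)]].lookup(d)


nodes = [Node() for _ in range(3)]
pmap = [p % 3 for p in range(N_PARTITIONS)]  # hypothetical partition map
for loc, k in enumerate(["key1", "key2"]):
    d = digest_of(k)
    nodes[pmap[partition_id(d)]].write(d, loc)
```

After this setup, `read("key1", pmap, nodes)` returns the storage location recorded for key1, and a key that was never written returns `None`.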
Another point is that the time spent looking up record metadata in the index is orders of magnitude less than getting the record from storage.
So the number of records in the cluster doesn't meaningfully change how long it takes to read a random record, whatever the size of the data set.
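A rough calculation shows why the data-set size barely matters. This sketch assumes records spread evenly over all 4096 partitions and an assumed 256 sprigs per partition, and treats each sprig as a balanced binary tree, so the number of comparisons is about log2 of the records per sprig; the function name is illustrative.

```python
import math

N_PARTITIONS = 4096  # logical partitions per namespace (from the text)
N_SPRIGS = 256       # assumption: sprigs per partition


def approx_tree_depth(n_records: int, n_sprigs: int = N_SPRIGS) -> int:
    """Rough number of comparisons to find a record's metadata, assuming
    records spread evenly over all sprigs and each sprig is a balanced tree."""
    per_sprig = n_records / (N_PARTITIONS * n_sprigs)
    return max(1, math.ceil(math.log2(per_sprig + 1)))


for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} records -> ~{approx_tree_depth(n)} comparisons")
```

Going from a million to a trillion records moves the in-memory lookup from about 1 to about 20 comparisons, which is still negligible next to the single storage read.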
I wrote an article Aerospike Modeling: User Profile Store that shows how this fact is leveraged to make sub-millisecond reads at millions of transactions-per-second from a petabyte scale data store.