How to compare two Apache Arrow tables for equality ignoring order in C++?
Question
In unit tests, I want to compare actual and expected arrow::Table objects while ignoring row order, because concurrency in the system under test randomizes the ordering.
I'd like to compare two tables for equality, as with table1.Equals(table2), but ignoring row order.
Alternatively, how can I sort the tables by all columns so that I can then compare them?
Answer 1
Score: 0
To sort a table on all columns (this is how Arrow C++ and pyarrow compare tables internally when ignoring order), shown here with pyarrow:
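import pyarrow as pa
import pyarrow.compute as pc
table = ...  # the pa.Table to sort
# Use every column as a sort key, in column order, so row order becomes deterministic.
sort_keys = [(name, 'ascending') for name in table.column_names]
# Compute the row permutation that puts the table in sorted order.
sort_indices = pc.sort_indices(table, sort_keys)
# Reorder the rows according to that permutation.
sorted_table = pc.take(table, sort_indices)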