How to compare two Apache Arrow tables for equality, ignoring row order, in C++?

Question

In unit tests, I want to compare actual and expected arrow::Table objects while ignoring row order, because concurrency in the system under test randomizes the ordering.

I'd like to compare two tables for equality like table1.Equals(table2) but ignoring row order.

Alternatively, how can I sort the tables by all columns so I can then compare them?

Answer 1

Score: 0

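To sort a table on all columns (reportedly the same approach Arrow C++ and pyarrow use internally when comparing tables while ignoring order), build one sort key per column, compute the sort indices, and take the rows in that order. Sorting both tables this way makes a plain equality check insensitive to row order: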

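import pyarrow as pa
import pyarrow.compute as pc

table = ...  # placeholder: the pa.Table to sort
# One ascending sort key per column, in column order.
sort_keys = [(name, 'ascending') for name in table.column_names]
# Row indices that would put the table in sorted order.
sort_indices = pc.sort_indices(table, sort_keys)
# Reorder the rows using those indices.
sorted_table = pc.take(table, sort_indices)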

huangapple
  • Posted on July 14, 2023, 05:52:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76683461.html