
How to retrieve an Array instead of ArrayData from a Compute Function with RecordBatches in Apache Arrow

Question

I'm trying to extract an Array out of a Datum after a Compute Operation.

#include <arrow/api.h>
#include <arrow/compute/api.h>

// Runs inside a function returning arrow::Status, as ARROW_ASSIGN_OR_RAISE requires.
std::shared_ptr<arrow::RecordBatch> rbatch;
arrow::Datum my_datum;

ARROW_ASSIGN_OR_RAISE(rbatch, ipc_reader->Read(i));
std::shared_ptr<arrow::Array> numbers_array_a = rbatch->column(2);
std::shared_ptr<arrow::Array> numbers_array_b = rbatch->column(3);

// Get the element-wise sum of columns A and B in the record batch. Note that
// CallFunction() takes the name of the function as its first argument.
ARROW_ASSIGN_OR_RAISE(my_datum, arrow::compute::CallFunction(
                                            "add", {numbers_array_a,
                                                    numbers_array_b}));

The Datum now holds an Int64 Array:

std::cout << "Datum kind: " << my_datum.ToString()
              << " content type: " << my_datum.type()->ToString() << std::endl;

>> Datum kind: Array content type: int64

When I then try to print the Array:

std::cout << my_datum.array()->ToString() << std::endl;

I get this error:

class "arrow::ArrayData" has no member "ToString"

The Array class has a ToString() function, but as far as I know, I can't convert ArrayData into an Array.

I tried converting the ArrayData into an Array, initializing an Array from the ArrayData, and accessing the values directly from the ArrayData, all without success. I also tried initializing a RecordBatch with the ArrayData.

I looked for other ways to retrieve the Array from the Datum as well, again without success.

How can I print or even access the Array inside the Datum?


Answer 1

Score: 0

I think you want the arrow::Datum::make_array function:

std::shared_ptr<arrow::Array> result_array = my_datum.make_array();

huangapple
  • Posted on May 14, 2023 at 17:30:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76246751.html