What is the use of the PyArrow Tensor class?

Question

In the Arrow documentation there is a class named Tensor that is created from NumPy ndarrays. However, the documentation is pretty sparse, and after experimenting with it a bit I haven't found a use case for it. For example, you can't construct a table with it:

import pyarrow as pa
import numpy as np

x = np.random.normal(0, 1.5, size=(4, 3, 2))
T = pa.Tensor.from_numpy(x, dim_names=["x", "y", "z"])

# Raises an error: a Tensor cannot be used as a table column
pa.table([pa.array([0, 1, 2, 3]), T], names=["f1", "f2"])

Nor is there a corresponding type for schemas and structs. So my question is: what is it there for? Can someone provide a simple example that uses it?

There is a related question from over 5 years ago, but it asked about Parquet. While I'm interested in persisting these tensors, I should first understand how to use them, and as of today, I don't.

Answer 1

Score: 1

AFAIK, the pyarrow Tensor class is only used for IPC (serialization), where it is defined as a message type in the IPC specification: https://arrow.apache.org/docs/dev/format/Other.html

To use tensors in a pyarrow Table, you would have to use an extension type. We are currently working on that, and here is the umbrella issue:

https://github.com/apache/arrow/issues/33924

You can see how it will be used in the PyArrow implementation example:
https://github.com/apache/arrow/pull/33948/files

huangapple
  • Posted on February 16, 2023, 19:21:41
  • Source: https://go.coder-hub.com/75471540.html