Polars - ComputeError: cannot cast 'Object' type after conversion from NumPy array

Question

I have a Polars dataframe that I split into multiple frames using np.array_split. After the split and the conversion back to a Polars dataframe, all columns have the data type 'object'. When I try to change the data type using cast(), I get the following error:

ComputeError: cannot cast 'Object' type

What am I doing wrong, and how can I fix this? I need the columns to have different data types for further processing.

import numpy as np
import polars as pl

df = pl.DataFrame({
    'column1': ['2021-01-01', '2021-02-02', '2021-03-03'],
    'column2': ['value1', 'value2', 'value3']
})

df = pl.from_numpy(np.array_split(df, 2)[0], schema=df.columns, orient='row')
# The next line raises ComputeError: cannot cast 'Object' type
df = df.with_columns(pl.col('column1').cast(pl.Utf8))

Answer 1

Score: 2

Pandas appears to do something that ends up returning a DataFrame from np.array_split():

>>> np.array_split(df.to_pandas(), 2)[0]
      column1 column2
0  2021-01-01  value1
1  2021-02-02  value2
2  2021-03-03  value3

Polars doesn't do this. The result is a plain NumPy array with dtype=object, which is why the columns come back as Object after pl.from_numpy:

>>> np.array_split(df, 2)[0]
array([['2021-01-01', 'value1'],
       ['2021-02-02', 'value2'],
       ['2021-03-03', 'value3']], dtype=object)

Instead of np.array_split, you could use the row count and modulo (%) to create groups:

df = pl.DataFrame({
    'column1': ['2021-01-01', '2021-02-02', '2021-03-03', '2021-04-04', '2021-05-05'],
    'column2': ['value1', 'value2', 'value3', 'value4', 'value5']
})

(df.with_row_count(offset=1)
   .with_columns(group = (pl.col('row_nr') % 2 != 0).cumsum())
)
shape: (5, 4)
┌────────┬────────────┬─────────┬───────┐
│ row_nr ┆ column1    ┆ column2 ┆ group │
│ ---    ┆ ---        ┆ ---     ┆ ---   │
│ u32    ┆ str        ┆ str     ┆ u32   │
╞════════╪════════════╪═════════╪═══════╡
│ 1      ┆ 2021-01-01 ┆ value1  ┆ 1     │
│ 2      ┆ 2021-02-02 ┆ value2  ┆ 1     │
│ 3      ┆ 2021-03-03 ┆ value3  ┆ 2     │
│ 4      ┆ 2021-04-04 ┆ value4  ┆ 2     │
│ 5      ┆ 2021-05-05 ┆ value5  ┆ 3     │
└────────┴────────────┴─────────┴───────┘

Depending on the goal, you could then use .groupby() or .partition_by() to split the dataframe.

huangapple
  • Posted on July 10, 2023, 16:30:07
  • Please keep this link when reposting: https://go.coder-hub.com/76652010.html