Polars - ComputeError: cannot cast 'Object' type after conversion from Numpy Array
Question
I have a Polars DataFrame which I split into multiple frames using np.array_split. After the split and the conversion back to a Polars DataFrame, all columns have the data type Object. When I try to change the data type using cast(), I get the following error:

ComputeError: cannot cast 'Object' type

What am I doing wrong, and how can I fix this? I need the columns to have different data types for further processing.
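import numpy as np
import polars as pl

df = pl.DataFrame({
    'column1': ['2021-01-01', '2021-02-02', '2021-03-03'],
    'column2': ['value1', 'value2', 'value3']
})
# np.array_split returns NumPy arrays, not Polars DataFrames
df = pl.from_numpy(np.array_split(df, 2)[0], schema=df.columns, orient='row')
# As reported: ComputeError: cannot cast 'Object' type
df = df.with_columns(pl.col('column1').cast(pl.Utf8))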
Answer 1
Score: 2
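Pandas appears to do something that makes np.array_split() return DataFrames (probably because a pandas DataFrame has a swapaxes() method, which NumPy calls while splitting). Both outputs below use the five-row df defined further down: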
>>> np.array_split(df.to_pandas(), 2)[0]
column1 column2
0 2021-01-01 value1
1 2021-02-02 value2
2 2021-03-03 value3
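Polars doesn't do this. The frame is converted to a single NumPy array, and since the columns contain strings the array has dtype=object, which is where the Object columns in the question come from: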
>>> np.array_split(df, 2)[0]
array([['2021-01-01', 'value1'],
['2021-02-02', 'value2'],
['2021-03-03', 'value3']], dtype=object)
Instead of np.array_split, you could use a row number and modulo (%) to put every two consecutive rows into the same group:
import polars as pl

df = pl.DataFrame({
    'column1': ['2021-01-01', '2021-02-02', '2021-03-03', '2021-04-04', '2021-05-05'],
    'column2': ['value1', 'value2', 'value3', 'value4', 'value5']
})

# with_row_index / cum_sum are the current names of with_row_count / cumsum
(df.with_row_index(offset=1)
   .with_columns(group = (pl.col('index') % 2 != 0).cum_sum())
)
shape: (5, 4)
┌───────┬────────────┬─────────┬───────┐
│ index ┆ column1    ┆ column2 ┆ group │
│ ---   ┆ ---        ┆ ---     ┆ ---   │
│ u32   ┆ str        ┆ str     ┆ u32   │
╞═══════╪════════════╪═════════╪═══════╡
│ 1     ┆ 2021-01-01 ┆ value1  ┆ 1     │
│ 2     ┆ 2021-02-02 ┆ value2  ┆ 1     │
│ 3     ┆ 2021-03-03 ┆ value3  ┆ 2     │
│ 4     ┆ 2021-04-04 ┆ value4  ┆ 2     │
│ 5     ┆ 2021-05-05 ┆ value5  ┆ 3     │
└───────┴────────────┴─────────┴───────┘
Depending on the goal, you could then use .group_by() or .partition_by() to split the DataFrame.