Adding numpy array to Pandas dataframe cell results in ValueError

Question

I want to place a numpy array in a cell of a pandas dataframe.
For specific reasons, before assigning the array to the cell, I add another column in the same dataframe, whose values are set to NaN.

Can someone help me understand what adding the column of NaNs does to my dataframe, why it breaks the code, and how I can fix it?

Inserting an array into a column works:

import pandas as pd
import numpy as np

#%% this works as expected
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['a'] = df['a'].astype(object)
df.loc[4, 'a'] = np.array([5, 6, 7, 8])
df

But after inserting the column with nans, the same code breaks and I get the following error:

ValueError: Must have equal len keys and value when setting with an iterable

#%% after adding a second column, x, filled with nan, the code breaks
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan
df['a'] = df['a'].astype(object)
df.loc[4, 'a'] = np.array([5, 6, 7, 8])
df

Finally, I want to add the array to the new column, but I get the same error.

#%% this is what I want to do, breaks, too
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan
df['x'] = df['x'].astype(object)
df.loc[4, 'x'] = np.array([5, 6, 7, 8])
df

Answer 1

Score: 1

If you only need to set a single cell, use at:

df.at[4, 'a'] = np.array([5, 6, 7, 8])
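As a minimal sketch of the same idea applied to the third snippet from the question (the array going into the new column x), assuming x has already been cast to object as the question does. The name stored is only for illustration:

```python
import numpy as np
import pandas as pd

# Same setup as the question's last snippet: column 'a' plus a NaN column 'x'.
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan
df['x'] = df['x'].astype(object)  # object dtype so a cell can hold an array

# .at addresses exactly one cell, so the array is stored as a single value
# instead of being matched element by element against the selected keys.
df.at[4, 'x'] = np.array([5, 6, 7, 8])

stored = df.at[4, 'x']
print(type(stored), stored)
```

The cast to object matters here: a float64 column cannot hold an array in one cell.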

Answer 2

Score: 0

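One alternative is pd.concat. Note that it appends rather than overwrites, so drop the existing row 4 first; otherwise the result has two rows labelled 4:

df = pd.concat([df.drop(index=4), pd.DataFrame({'a': [np.array([5, 6, 7, 8])]}, index=[4])])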
print(df)

# Output
              a   x
0             0 NaN
1             1 NaN
2             2 NaN
3             3 NaN
4  [5, 6, 7, 8] NaN

Unless you specify a value for every column of the row:

df.loc[4] = [np.array([5, 6, 7, 8]), 3.14]
print(df)

# Output
              a     x
0             0   NaN
1             1   NaN
2             2   NaN
3             3   NaN
4  [5, 6, 7, 8]  3.14  # 3.14 for demo purpose
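For the pd.concat approach, here is a self-contained sketch of how it treats a row label that already exists. It assumes the two-column dataframe from the question; base, new_row, appended and replaced are illustrative names, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Dataframe from the question: 'a' cast to object, 'x' filled with NaN.
base = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
base['x'] = np.nan
base['a'] = base['a'].astype(object)

# One-row frame holding the array, labelled 4 like the row to be replaced.
new_row = pd.DataFrame({'a': [np.array([5, 6, 7, 8])]}, index=[4])

# Plain concat appends: label 4 now appears twice.
appended = pd.concat([base, new_row])
print(appended.index.tolist())

# Dropping the old row first gives a frame with a single row 4.
replaced = pd.concat([base.drop(index=4), new_row])
print(replaced)
```

If only one cell needs to change, a single-cell setter such as df.at avoids rebuilding the frame.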

huangapple
  • Posted on June 1, 2023 at 18:41:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76381054.html