Adding numpy array to Pandas dataframe cell results in ValueError
Question
I want to place a numpy array in a cell of a pandas dataframe.
For specific reasons, before assigning the array to the cell, I add another column in the same dataframe, whose values are set to NaN.
Can someone help me understand what adding the column of NaNs does to my dataframe, why it breaks the code, and how I can fix it?
Inserting an array into a column works:
import pandas as pd
import numpy as np
#%% this works as expected
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['a'] = df['a'].astype(object)
df.loc[4, 'a'] = np.array([5, 6, 7, 8])
df
But after inserting the column with nans, the same code breaks and I get the following error:
ValueError: Must have equal len keys and value when setting with an iterable
#%% after adding a second column, x, filled with nan, the code breaks
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan
df['a'] = df['a'].astype(object)
df.loc[4, 'a'] = np.array([5, 6, 7, 8])
df
Finally, I want to add the array to the new column, but I get the same error.
#%% this is what I want to do, breaks, too
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan
df['x'] = df['x'].astype(object)
df.loc[4, 'x'] = np.array([5, 6, 7, 8])
df
Answer 1
Score: 1
If you only need to set a single cell, use at:
df.at[4, 'a'] = np.array([5, 6, 7, 8])
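As a minimal sketch, assuming the two-column frame from the question, the same at call can also be used for the case the question is really after: writing the array into the new column x. The column is cast to object first, as in the question, so that a cell can hold an arbitrary Python object.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: column 'a' plus a NaN-filled column 'x'.
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan

# An object column can hold any Python object in a cell, including an array.
df['x'] = df['x'].astype(object)

# .at addresses exactly one cell, so the array is stored as a single value
# instead of being matched element by element against the selection.
df.at[4, 'x'] = np.array([5, 6, 7, 8])

print(df)
print(type(df.at[4, 'x']))
```

The other rows of x keep their NaN values; only the cell at row 4 changes.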
Answer 2
Score: 0
You can use pd.concat:
df = pd.concat([df, pd.DataFrame({'a': [np.array([5, 6, 7, 8])]}, index=[4])])
print(df)
# Output
              a    x
0             0  NaN
1             1  NaN
2             2  NaN
3             3  NaN
4  [5, 6, 7, 8]  NaN
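One thing to watch with this approach: pd.concat appends rows rather than overwriting them. The sketch below assumes the five-row df from the question (an assumption, since the answer does not show how its df was built) and compares a plain concat with one that drops the existing row 4 first. The names new_row, appended and replaced are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed starting point: the five-row frame from the question.
df = pd.DataFrame([0, 1, 2, 3, 4], columns=['a'])
df['x'] = np.nan

# One-row frame whose single cell holds the array (same construction as above).
new_row = pd.DataFrame({'a': [np.array([5, 6, 7, 8])]}, index=[4])

# Plain concat appends, so index label 4 now appears twice.
appended = pd.concat([df, new_row])
print(len(appended), appended.index.is_unique)

# Dropping the old row first keeps the index unique.
replaced = pd.concat([df.drop(index=4), new_row])
print(len(replaced), replaced.index.is_unique)
```

If the goal is to replace the value in an existing row, the second form (or at, as in Answer 1) avoids ending up with a duplicated index label.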
Alternatively, df.loc works if you specify values for every column of the row:
df.loc[4] = [np.array([5, 6, 7, 8]), 3.14]
print(df)
# Output
              a     x
0             0   NaN
1             1   NaN
2             2   NaN
3             3   NaN
4  [5, 6, 7, 8]  3.14  # 3.14 for demo purposes