How can I store an array in a pandas dataframe?
Question
I need to store an array of strings in a single cell of a dataframe, but I get this error:
ValueError: Must have equal len keys and value when setting with an iterable
I declared a list of words like this:
words = []
I have some text to extract words from, and I put the words into the list:
for word in text:
    words.append(word)
I also have a dataframe column declared like this:
dataframe['Words'] = ''
The error is triggered by:
dataframe.loc[x, 'Words'] = words
I expected to be able to store an array in a dataframe cell.
For example, given
text = 'Today is a sunny day'
I tokenize the sentence so that each token is a word, and I put each word in words.
I would like the Words column of dataframe to contain:
'Today', 'is', 'a', 'sunny', 'day'
Answer 1
Score: 0
What you want is not fully clear. In any case, assigning a list to a single DataFrame cell is awkward, because a DataFrame is not designed to hold iterables. Pandas treats a list-like right-hand side as a sequence of values to spread over the selected cells and checks that its length matches the selection, so a five-word list assigned to one cell fails that check. In addition, you should rarely need to assign to single cells.
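As a rough illustration of the point about rarely needing single-cell assignment, the sketch below fills a whole column of word lists in one step. It assumes the sentences already sit in a column, here given the hypothetical name Text, and that whitespace splitting is an acceptable tokenizer; neither assumption comes from the question.

```python
import pandas as pd

# Hypothetical input: one sentence per row in a "Text" column.
dataframe = pd.DataFrame({"Text": ["Today is a sunny day", "It rains"]})

# Split every sentence on whitespace in one vectorized step;
# each cell of "Words" then holds the tokens of its own row.
dataframe["Words"] = dataframe["Text"].str.split()

print(dataframe.loc[0, "Words"])
```

Because the whole column is assigned at once, pandas aligns one list per row and the length check on single-cell assignment never comes into play.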
You can use a Series to force correct indexing:
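import pandas as pd

text = 'Today is a sunny day'
words = text.split()
dataframe = pd.DataFrame(index=range(3), columns=['Words'])
x = 1
# Wrap the list in a one-element Series so it aligns with row x as a single value
dataframe.loc[[x], 'Words'] = pd.Series([words], index=[x])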
Output:
                        Words
0                         NaN
1  [Today, is, a, sunny, day]
2                         NaN
Note that the key x must already exist in the index. If it does not, a workaround is to first assign a dummy value (e.g., NaN) to create the row, then assign the list:
import numpy as np

x = 3
# Create row 3 with a placeholder, then overwrite it with the list
dataframe.loc[x, 'Words'] = np.nan
dataframe.loc[[x], 'Words'] = pd.Series([['A', 'B', 'C']], index=[x])
Output:
                        Words
0                         NaN
1  [Today, is, a, sunny, day]
2                         NaN
3                   [A, B, C]
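An alternative that is not part of the answer above is the scalar accessor DataFrame.at, which addresses exactly one cell. The following is a minimal sketch, assuming the column has object dtype (set explicitly here) and that the row label already exists; behavior with other dtypes may differ.

```python
import pandas as pd

# Assumption: the column is created with object dtype so a cell can hold any Python object.
dataframe = pd.DataFrame({"Words": pd.Series([None, None, None], dtype=object)})

words = "Today is a sunny day".split()
x = 1

# .at takes one row label and one column label, so the list is stored
# as a single object in that cell instead of being spread over several cells.
dataframe.at[x, "Words"] = words

print(dataframe.loc[x, "Words"])
```

Since the target is a single cell by construction, no length matching between the selection and the list is involved.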