How can I store an array in a pandas dataframe?

Question

I need to store an array of strings in a single cell of a DataFrame, but I get this error:
ValueError: Must have equal len keys and value when setting with an iterable

I declared an array of words like this:
words = []
I have some text to extract words from, and I put the words into the array:
for word in text:
    words.append(word)

And I have a DataFrame column declared like this:
dataframe['Words'] = ''

The error is triggered by:
dataframe.loc[x, 'Words'] = words

I should be able to store an array in a DataFrame.

For example, if
text = 'Today is a sunny day'
I tokenize the sentence, and each token is a word.
I put each word in words, and I would like the Words column of dataframe to contain:
'Today', 'is', 'a', 'sunny', 'day'

Answer 1

Score: 0

It is not fully clear what you want. In any case, assigning a list to a single DataFrame cell is not easy, because a DataFrame is not designed to hold iterables. When you assign with .loc, pandas checks that the right-hand side has the correct length, so a list is treated as a sequence of values to spread over the selected cells rather than as one value. In addition, you should rarely have to assign to single cells.
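As an illustration of avoiding single-cell assignment altogether, here is a minimal sketch that builds the whole Words column in one step. It assumes the sentences already live in a column, here called Text, which is a hypothetical name not taken from the question, and it uses plain whitespace splitting as a stand-in for whatever tokenizer is actually used.

```python
import pandas as pd

# Hypothetical input: one sentence per row in a column named "Text".
dataframe = pd.DataFrame({"Text": ["Hello world", "Today is a sunny day", "Bye"]})

# Split every sentence on whitespace; each cell of "Words" receives one list.
dataframe["Words"] = dataframe["Text"].str.split()

print(dataframe.loc[1, "Words"])
```

Because the lists are produced by a column-wise operation, pandas never has to decide whether a list is one value or several, so the length check from the error message does not come into play.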

You can use a Series to force correct indexing:

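import pandas as pd

text = 'Today is a sunny day'
words = text.split()
dataframe = pd.DataFrame(index=range(3), columns=['Words'])
x = 1
# Wrap the list in a one-element Series so it is aligned on x as a single value
dataframe.loc[[x], 'Words'] = pd.Series([words], index=[x])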

Output:

                        Words
0                         NaN
1  [Today, is, a, sunny, day]
2                         NaN

Note that the key x must already exist in the index. If it does not, a workaround is to first assign a dummy value (e.g., NaN) to create the row, and then assign the list.

import numpy as np

x = 3
# Create row 3 with a placeholder first, then overwrite it with the list
dataframe.loc[x, 'Words'] = np.nan
dataframe.loc[[x], 'Words'] = pd.Series([['A', 'B', 'C']], index=[x])

Output:

                        Words
0                         NaN
1  [Today, is, a, sunny, day]
2                         NaN
3                   [A, B, C]
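
Another option that is sometimes used for a single cell is the scalar accessor DataFrame.at. This is not part of the answer above, only a sketch under the assumption that the Words column has object dtype and that the row label already exists. The explicit astype(object) call is there to make that assumption visible.

```python
import pandas as pd

dataframe = pd.DataFrame(index=range(3), columns=["Words"])
# Assumption: the column must be object dtype to hold a Python list.
dataframe["Words"] = dataframe["Words"].astype(object)

words = "Today is a sunny day".split()
x = 1
# .at addresses exactly one cell, so the list is stored as a single value.
dataframe.at[x, "Words"] = words

print(dataframe)
```

Unlike .loc, .at only ever targets one cell, which is why the list is not interpreted as several values to distribute.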

huangapple
  • Published on June 16, 2023, 01:54:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76484324.html