June 16, 2023, 01:54:41

How can I store an array in a pandas dataframe?

Question

I need to store an array of strings in a single cell of a dataframe, but I get this error:
ValueError: Must have equal len keys and value when setting with an iterable

I declared a list of words like this:
words = []

I have some text to extract words from, and I put the words into the list:
for word in text:
    words.append(word)

The dataframe column is declared like this:
dataframe['Words'] = ''

The error is triggered by:
dataframe.loc[x, 'Words'] = words

I should be able to store an array in a dataframe.

For example, given
text = 'Today is a sunny day'
I tokenize the sentence so that each token is a word, and I put each word into words.
I would like the Words column of dataframe to contain:
'Today', 'is', 'a', 'sunny', 'day'

Answer 1

Score: 0

What you want is not fully clear. In any case, assigning a list to a DataFrame cell is not easy, because a DataFrame is not designed to hold iterables. Pandas checks that the right-hand side of an assignment has the same length as the selection on the left, so a list is treated as several values rather than as one cell value. In addition, you should rarely need to assign to single cells.
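To illustrate the point about rarely needing single-cell assignment: if the sentences already sit in a column, the whole Words column can be built in one step. This is only a sketch under an assumption that is not in the question, namely a hypothetical Text column holding one sentence per row.

```python
import pandas as pd

# Hypothetical input (not from the question): one sentence per row.
dataframe = pd.DataFrame({'Text': ['Today is a sunny day', 'It rained yesterday']})

# Series.str.split() with no arguments splits on whitespace
# and returns one list of tokens per row.
dataframe['Words'] = dataframe['Text'].str.split()

print(dataframe['Words'].tolist())
```

Each cell of Words then holds a Python list, with no per-cell assignment involved.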

You can use a Series to force correct indexing:

import pandas as pd

text = 'Today is a sunny day'
words = text.split()
dataframe = pd.DataFrame(index=range(3), columns=['Words'])
x = 1
# Wrap the list in a one-element Series so it aligns with the single row x
dataframe.loc[[x], 'Words'] = pd.Series([words], index=[x])

Output:

                        Words
0                         NaN
1  [Today, is, a, sunny, day]
2                         NaN

Note that the key x must already exist. If it does not, a workaround is to first assign a dummy value (e.g., NaN) and then assign the list.

import numpy as np

x = 3
dataframe.loc[x, 'Words'] = np.nan  # create row 3 with a placeholder value
dataframe.loc[[x], 'Words'] = pd.Series([['A', 'B', 'C']], index=[x])

Output:

                        Words
0                         NaN
1  [Today, is, a, sunny, day]
2                         NaN
3                   [A, B, C]
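As a possible alternative that the answer does not mention, DataFrame.at addresses exactly one cell by row and column label, so a list can be stored as a single value. The sketch below assumes the column has object dtype (true for the empty frame built here) and that row x already exists.

```python
import pandas as pd

words = 'Today is a sunny day'.split()

# Same setup as in the answer: an empty object-dtype column with rows 0..2.
dataframe = pd.DataFrame(index=range(3), columns=['Words'])

x = 1
# .at targets a single cell, so the list is stored as one object.
dataframe.at[x, 'Words'] = words

print(dataframe)
```

Whether this is preferable to the Series approach depends on the situation. It is shown only because it keeps the assignment to one line.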
