May 8, 2023, 01:46:28

Explode function not working on simple python dataframe

Question

I am having an issue with the explode function. I have a 2 column dataframe:

pub_id	category_for
pub.1155807502	[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1153826092	[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}]
pub.1145064359	[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1145747691	[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1144315107	[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}]

And I want to "explode" the "category_for" column to obtain something like this:

pub_id	id	name
pub.1155807502	80003	32 Biomedical and Clinical Sciences
pub.1155807502	80045	3202 Clinical Sciences
pub.1153826092	80003	32 Biomedical and Clinical Sciences
pub.1153826092	80232	5202 Biological Psychology
pub.1153826092	80045	3202 Clinical Sciences
pub.1153826092	80052	3209 Neurosciences
pub.1153826092	80023	52 Psychology

I tried

df = df.explode('category_for')
df = pd.concat([df, df.pop("category_for").apply(pd.Series)], axis=1)

but nothing happens at the "explode" step.

I also tried:

df.set_index('pub_id')['category_for'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'category_for'})

but again without success.

Answer 1

Score: 1

The lists of dicts in the category_for column are probably stored as strings, which explode leaves untouched because a string is not list-like. You can check whether that is the case with the following:

type(df["category_for"].iloc[0])  # str here means the lists were loaded as text
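To illustrate why explode appears to do nothing, here is a minimal sketch on made-up data (the frame `demo` and its two rows are hypothetical, not the asker's data). One cell holds text that merely looks like a list, the other holds a real Python list; only the real list is expanded.

```python
import pandas as pd

# Hypothetical frame: row 0 stores the list as a string, row 1 as a real list.
demo = pd.DataFrame({
    "pub_id": ["pub.1", "pub.2"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}]",
        [
            {"id": "80003", "name": "32 Biomedical and Clinical Sciences"},
            {"id": "80045", "name": "3202 Clinical Sciences"},
        ],
    ],
})

exploded = demo.explode("category_for", ignore_index=True)

# The string cell stays a single row; the real list becomes one row per dict.
print(exploded["category_for"].map(type))
print(len(exploded))
```

If every cell in the real data is a string, the row count before and after explode is identical, which matches the "nothing happens" symptom in the question.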

You can convert each item back into a list of dicts by applying the literal_eval function from the ast module.

from ast import literal_eval

# convert the column items from str to list of dicts
df["category_for"] = df["category_for"].apply(literal_eval)
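If the column might mix strings with values that are already parsed or missing, a small guard avoids errors from calling literal_eval on non-strings. The sketch below is one possible way to do that; the helper name `parse_cell` and the one-row frame `raw` are made up for illustration.

```python
from ast import literal_eval

import pandas as pd


def parse_cell(value):
    """Parse string cells with literal_eval; return anything else unchanged."""
    if isinstance(value, str):
        return literal_eval(value)
    return value


# Hypothetical one-row frame mimicking a column read back from CSV as text.
raw = pd.DataFrame({
    "pub_id": ["pub.1155807502"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80045', 'name': '3202 Clinical Sciences'}]"
    ],
})

raw["category_for"] = raw["category_for"].apply(parse_cell)
print(type(raw["category_for"].iloc[0]))
```

literal_eval only evaluates Python literals (strings, numbers, lists, dicts and so on), so it is a safer choice than eval for text of unknown origin.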

Finally, you can use explode to get one row per dict, then expand the dicts into columns and concatenate them with the pub_id column.

df = df.explode("category_for", ignore_index=True)

df_result = pd.concat([df.pub_id, df.category_for.apply(pd.Series)], axis=1)
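Putting the three steps together, the following end-to-end sketch uses two rows taken from the question. It swaps in pd.json_normalize for apply(pd.Series) as an alternative way to expand the dicts; that substitution is a suggestion, not part of the original answer. The names `df_demo`, `long_df`, `fields` and `result` are illustrative, and the sketch assumes every list contains at least one dict (an empty list would explode to NaN and need separate handling).

```python
from ast import literal_eval

import pandas as pd

# Two rows from the question, with the lists stored as strings.
df_demo = pd.DataFrame({
    "pub_id": ["pub.1155807502", "pub.1145064359"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80052', 'name': '3209 Neurosciences'}]",
    ],
})

# Step 1: turn each string into a real list of dicts.
df_demo["category_for"] = df_demo["category_for"].apply(literal_eval)

# Step 2: one row per dict, with a fresh 0..n-1 index.
long_df = df_demo.explode("category_for", ignore_index=True)

# Step 3: one column per dict key; the result also has a 0..n-1 index,
# so it lines up row by row with long_df.
fields = pd.json_normalize(long_df["category_for"].tolist())
result = pd.concat([long_df[["pub_id"]], fields], axis=1)

print(result)
```

Passing ignore_index=True in step 2 matters here: without it the exploded frame keeps duplicate index labels, and the column-wise concat would no longer align one row to one dict.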
