Explode function not working on simple python dataframe
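Question
You want to expand the "category_for" column of the DataFrame and obtain the result you showed. Assuming the column already holds real lists of dicts (not their string representations), you can try the following code:
import pandas as pd
# Expand "category_for" so that each dict gets its own row; reset the index so it stays unique
df = df.explode('category_for', ignore_index=True)
# Split the exploded dicts into two columns, aligned on the new index
df[['id', 'name']] = pd.DataFrame(df['category_for'].tolist(), index=df.index)
# Drop the original "category_for" column
df = df.drop(columns='category_for')
This code expands the "category_for" column, splits it into the "id" and "name" columns, and then drops the original "category_for" column, which gives the result you expect.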
I am having an issue with the pandas explode function. I have a two-column DataFrame:
pub_id | category_for |
---|---|
pub.1155807502 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}] |
pub.1153826092 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}] |
pub.1145064359 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}] |
pub.1145747691 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}] |
pub.1144315107 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}] |
And I want to "explode" the "category_for" column to obtain something like this:
pub_id | id | name |
---|---|---|
pub.1155807502 | 80003 | 32 Biomedical and Clinical Sciences |
pub.1155807502 | 80045 | 3202 Clinical Sciences |
pub.1153826092 | 80003 | 32 Biomedical and Clinical Sciences |
pub.1153826092 | 80232 | 5202 Biological Psychology |
pub.1153826092 | 80045 | 3202 Clinical Sciences |
pub.1153826092 | 80052 | 3209 Neurosciences |
pub.1153826092 | 80023 | 52 Psychology |
I tried
df = df.explode('category_for')
df = pd.concat([df, df.pop("category_for").apply(pd.Series)], axis=1)
but nothing happens at the "explode" step.
I also tried:
df.set_index('pub_id')['category_for'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'category_for'})
but again without success.
Answer 1
Score: 1
The lists of dicts in the category_for column are probably stored as strings. In that case explode has nothing to expand and leaves each row unchanged. You can check whether that is the case with the following:
type(df.category_for.iloc[0])
>>> str
You can convert the items to the right type by applying the literal_eval function:
from ast import literal_eval
# convert the column items from str to list of dicts
df["category_for"] = df["category_for"].apply(literal_eval)
Finally, you can use explode and concatenate the result with the pub_id column:
df = df.explode("category_for", ignore_index=True)
df_result = pd.concat([df.pub_id, df.category_for.apply(pd.Series)], axis=1)
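For reference, here is a minimal end-to-end sketch of the approach described in this answer. The sample rows are copied from the question, but storing them as strings is an assumption (it is what typically happens after a CSV round trip), and `parse_cell` is a hypothetical helper name, not part of pandas. It builds the id/name columns with `pd.DataFrame(...tolist())` instead of `apply(pd.Series)`, which is just one reasonable option.

```python
from ast import literal_eval

import pandas as pd

# Assumed input: each category_for cell is the *string* form of a list of dicts.
df = pd.DataFrame(
    {
        "pub_id": ["pub.1155807502", "pub.1145064359"],
        "category_for": [
            "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
            "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
            "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
            "{'id': '80052', 'name': '3209 Neurosciences'}, "
            "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
        ],
    }
)

# explode leaves scalar cells (strings included) untouched, so nothing changes here.
unchanged = df.explode("category_for", ignore_index=True)


def parse_cell(cell):
    """Hypothetical helper: parse string cells, leave real lists as they are."""
    return literal_eval(cell) if isinstance(cell, str) else cell


# Convert the strings to lists of dicts, then give each dict its own row.
parsed = df.assign(category_for=df["category_for"].map(parse_cell))
exploded = parsed.explode("category_for", ignore_index=True)

# Turn the dicts into id/name columns, aligned on the exploded index.
fields = pd.DataFrame(exploded["category_for"].tolist(), index=exploded.index)
df_result = pd.concat([exploded[["pub_id"]], fields], axis=1)
print(df_result)
```

The `isinstance` check makes the conversion safe to rerun on a column that has already been parsed, since `literal_eval` would raise an error on a real list.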