Explode function not working on simple python dataframe

Question

I am having an issue with the pandas explode function. I have a two-column DataFrame:

pub_id category_for
pub.1155807502 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1153826092 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}]
pub.1145064359 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1145747691 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1144315107 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}]

And I want to "explode" the "category_for" column to obtain something like this:

pub_id id name
pub.1155807502 80003 32 Biomedical and Clinical Sciences
pub.1155807502 80045 3202 Clinical Sciences
pub.1153826092 80003 32 Biomedical and Clinical Sciences
pub.1153826092 80232 5202 Biological Psychology
pub.1153826092 80045 3202 Clinical Sciences
pub.1153826092 80052 3209 Neurosciences
pub.1153826092 80023 52 Psychology

I tried

df = df.explode('category_for') 
df = pd.concat([df, df.pop("category_for").apply(pd.Series)], axis=1)

but nothing happens at the "explode" step.

I also tried:

df.set_index('pub_id')['category_for'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'category_for'})

but again without success.

Answer 1

Score: 1

The lists of dicts in the category_for column are probably stored as strings, in which case explode has nothing to expand and leaves each row unchanged. You can check whether that is the case with the following:

>>> type(df["category_for"].iloc[0])
<class 'str'>
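To see why nothing happens at the explode step, here is a minimal sketch. It assumes a hypothetical one-row frame named df_str whose cell is a string that merely looks like a list; explode only expands list-like cells and treats a string as a scalar.

```python
import pandas as pd

# Hypothetical one-row frame: the cell is a str, not a list of dicts.
df_str = pd.DataFrame({
    "pub_id": ["pub.1155807502"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80045', 'name': '3202 Clinical Sciences'}]"
    ],
})

exploded = df_str.explode("category_for")

# A str counts as a scalar for explode, so the row count does not change.
n_rows_before = len(df_str)
n_rows_after = len(exploded)
cell_type = type(df_str["category_for"].iloc[0])
print(n_rows_before, n_rows_after, cell_type)
```

If the row count is the same before and after, the cells are most likely strings rather than lists.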

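You can convert the items from strings to lists of dicts by applying the literal_eval function from the ast module, which parses a string containing a Python literal without executing arbitrary code.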

from ast import literal_eval

# convert the column items from str to list of dicts
df["category_for"] = df["category_for"].apply(literal_eval)
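If some cells might be missing or already parsed, literal_eval would raise on them. The sketch below is one possible way to guard against that; parse_cell and df_demo are hypothetical names introduced only for this illustration.

```python
from ast import literal_eval

import pandas as pd


def parse_cell(value):
    """Parse a string cell into Python objects; leave anything else untouched."""
    if isinstance(value, str):
        return literal_eval(value)
    return value


# Hypothetical frame with one string cell and one missing cell.
df_demo = pd.DataFrame({
    "pub_id": ["pub.1", "pub.2"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}]",
        None,
    ],
})

df_demo["category_for"] = df_demo["category_for"].apply(parse_cell)
first = df_demo["category_for"].iloc[0]
second = df_demo["category_for"].iloc[1]
print(type(first))
```

Only the string cell is parsed; the missing value passes through unchanged.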

Finally, you can use explode and concatenate the result with the pub_id column.

df = df.explode("category_for", ignore_index=True)

df_result = pd.concat([df.pub_id, df.category_for.apply(pd.Series)], axis=1)
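Putting the steps together, here is a self-contained sketch on two rows taken from the question. It swaps apply(pd.Series) for pd.json_normalize, which is an alternative I am suggesting rather than part of the steps above, and it assumes every exploded cell is a dict (no empty lists or missing values). The name df_raw is made up for this example.

```python
from ast import literal_eval

import pandas as pd

# Two sample rows from the question, with the lists stored as strings.
df_raw = pd.DataFrame({
    "pub_id": ["pub.1155807502", "pub.1145064359"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80052', 'name': '3209 Neurosciences'}, "
        "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
    ],
})

# 1. Parse the strings into real lists of dicts.
df_raw["category_for"] = df_raw["category_for"].apply(literal_eval)

# 2. One row per dict; ignore_index gives a clean 0..n-1 index.
exploded = df_raw.explode("category_for", ignore_index=True)

# 3. Turn each dict into columns and attach them to pub_id.
fields = pd.json_normalize(exploded["category_for"].tolist())
df_result = pd.concat([exploded[["pub_id"]], fields], axis=1)
print(df_result)
```

Because ignore_index=True resets the index, the two frames line up row by row in the concat. Without it, the exploded frame keeps repeated index labels and the concatenation can misalign.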

huangapple
  • Posted on May 8, 2023 at 01:46:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76195413.html