Python converting a list in a nested dictionary into a Pandas Dataframe
Question
I have the following dictionary:

    test = {'data': [
        {'actions': [
            {'action_type': 'link_click', 'value': '16'},
            {'action_type': 'post_engagement', 'value': '16'},
            {'action_type': 'page_engagement', 'value': '16'}],
         'spend': '13.59',
         'date_start': '2023-02-07',
         'date_stop': '2023-02-07'},
        {'actions': [
            {'action_type': 'comment', 'value': '5'},
            {'action_type': 'onsite_conversion.post_save', 'value': '1'},
            {'action_type': 'link_click', 'value': '465'},
            {'action_type': 'post', 'value': '1'},
            {'action_type': 'post_reaction', 'value': '20'},
            {'action_type': 'video_view', 'value': '4462'},
            {'action_type': 'post_engagement', 'value': '4954'},
            {'action_type': 'page_engagement', 'value': '4954'}],
         'spend': '214.71',
         'date_start': '2023-02-07',
         'date_stop': '2023-02-07'}]}
I am trying to convert it into a pandas DataFrame in which each action_type becomes a column and its value fills the row, something like:
    link_click  post_engagement  page_engagement  spend   comment  onsite_conversion  ...
    16          16               16               13.59   N/A      N/A
    465         4954             4954             214.71  5        1
I understand that the first record has no comment, post, etc., so those cells would be N/A. How do I handle this nested data structure?
Answer 1
Score: 1
You could use something like this function:

    import pandas as pd

    def tabulate_actions(actionsList: list, returnDf=False):
        # Build one dict per record, mapping each action_type to its value
        aTbl = [{
            a['action_type']: a['value'] for a in al['actions']
            # if isinstance(a, dict) and all([k in a for k in ['action_type', 'value']])
        } for al in actionsList
            # if isinstance(al, dict) and isinstance(al.get('actions'), list)
        ]
        return pd.DataFrame(aTbl) if returnDf else aTbl
    ## uncomment the conditions if you're unsure of your data structure

    tabulate_actions(test['data'])
should return this list of dictionaries:
> [{'link_click': '16',
>   'post_engagement': '16',
>   'page_engagement': '16'},
>  {'comment': '5',
>   'onsite_conversion.post_save': '1',
>   'link_click': '465',
>   'post': '1',
>   'post_reaction': '20',
>   'video_view': '4462',
>   'post_engagement': '4954',
>   'page_engagement': '4954'}]
Passing returnDf=True makes it return a DataFrame instead, with one column per action type and NaN wherever a record lacks that action type.
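
The function above only tabulates the actions, while the question also asks for a spend column. The following is a minimal sketch of one way to add it, not part of the original answer: the helper name tabulate_with_spend is hypothetical, the data is a trimmed copy of the question's dictionary so the example runs on its own, and converting the string values to numbers is an assumption about what is wanted.

```python
import pandas as pd

# Trimmed copy of the question's data so the example is self-contained.
test = {'data': [
    {'actions': [
        {'action_type': 'link_click', 'value': '16'},
        {'action_type': 'post_engagement', 'value': '16'}],
     'spend': '13.59',
     'date_start': '2023-02-07',
     'date_stop': '2023-02-07'},
    {'actions': [
        {'action_type': 'comment', 'value': '5'},
        {'action_type': 'link_click', 'value': '465'},
        {'action_type': 'post_engagement', 'value': '4954'}],
     'spend': '214.71',
     'date_start': '2023-02-07',
     'date_stop': '2023-02-07'}]}

def tabulate_with_spend(records):
    # Hypothetical helper: one row per record, with each action_type
    # as a column plus the record-level 'spend' field.
    rows = []
    for rec in records:
        row = {a['action_type']: a['value'] for a in rec.get('actions', [])}
        row['spend'] = rec.get('spend')
        rows.append(row)
    df = pd.DataFrame(rows)  # keys missing from a row become NaN
    # The source values are strings; convert each column to numbers
    # (assumption: numeric columns are wanted). Gaps stay NaN.
    return df.apply(pd.to_numeric, errors='coerce')

df = tabulate_with_spend(test['data'])
print(df)
```

Building the DataFrame from a list of dictionaries lets pandas take the union of all keys as columns, which is why the first row ends up with NaN under comment without any extra handling.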