Python: converting a list in a nested dictionary into a pandas DataFrame

Question

I have the following dictionary:

    test = {'data': [
        {'actions': [
            {'action_type': 'link_click', 'value': '16'},
            {'action_type': 'post_engagement', 'value': '16'},
            {'action_type': 'page_engagement', 'value': '16'}],
         'spend': '13.59',
         'date_start': '2023-02-07',
         'date_stop': '2023-02-07'},
        {'actions': [
            {'action_type': 'comment', 'value': '5'},
            {'action_type': 'onsite_conversion.post_save', 'value': '1'},
            {'action_type': 'link_click', 'value': '465'},
            {'action_type': 'post', 'value': '1'},
            {'action_type': 'post_reaction', 'value': '20'},
            {'action_type': 'video_view', 'value': '4462'},
            {'action_type': 'post_engagement', 'value': '4954'},
            {'action_type': 'page_engagement', 'value': '4954'}],
         'spend': '214.71',
         'date_start': '2023-02-07',
         'date_stop': '2023-02-07'}]}

I am trying to convert it so that each action_type becomes a pandas DataFrame column and the corresponding value goes in the row. Something like:

    link_click  post_engagement  page_engagement  spend   comment  onsite_conversion  ...
    16          16               16               13.59   N/A      N/A
    465         4954             4954             214.71  5        1

I understand that the first entry has no comment, post, etc., so those cells would be N/A. How do I handle this nested data structure?

Answer 1

Score: 1

You could use something like this function:

    import pandas as pd

    def tabulate_actions(actionsList: list, returnDf=False):
        # Build one {action_type: value} dict per entry in actionsList.
        # Uncomment the conditions if you're unsure of your data structure.
        aTbl = [{
            a['action_type']: a['value'] for a in al['actions']
            # if isinstance(a, dict) and all([k in a for k in ['action_type', 'value']])
        } for al in actionsList
            # if isinstance(al, dict) and isinstance(al.get('actions'), list)
        ]
        # Return a DataFrame on request, otherwise the plain list of dicts.
        return pd.DataFrame(aTbl) if returnDf else aTbl
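For illustration, here is a minimal sketch of what the function might look like with both conditions enabled. The name tabulate_actions_safe and the messy input are made up for this example; only the two isinstance/key checks come from the commented-out lines above.

```python
def tabulate_actions_safe(actions_list):
    """Hypothetical guarded variant: skip anything not shaped as expected."""
    return [
        {a['action_type']: a['value']
         for a in al['actions']
         # Keep only action dicts that carry both expected keys.
         if isinstance(a, dict) and all(k in a for k in ('action_type', 'value'))}
        for al in actions_list
        # Keep only entries that are dicts holding an 'actions' list.
        if isinstance(al, dict) and isinstance(al.get('actions'), list)
    ]

# Made-up messy input: one usable entry (with one incomplete action),
# one entry without 'actions', and one stray string.
messy = [
    {'actions': [{'action_type': 'link_click', 'value': '16'},
                 {'action_type': 'comment'}]},
    {'spend': '1.00'},
    'not a dict',
]
rows = tabulate_actions_safe(messy)
print(rows)  # [{'link_click': '16'}]
```

Note that skipped entries produce no row at all, so the output can be shorter than the input and row positions may no longer line up with the original list.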

tabulate_actions(test['data']) should return this list of dictionaries:

    [{'link_click': '16',
      'post_engagement': '16',
      'page_engagement': '16'},
     {'comment': '5',
      'onsite_conversion.post_save': '1',
      'link_click': '465',
      'post': '1',
      'post_reaction': '20',
      'video_view': '4462',
      'post_engagement': '4954',
      'page_engagement': '4954'}]

and passing returnDf=True should make it return a DataFrame instead, with one column per action type and NaN wherever an entry lacks that action.
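The table sketched in the question also has a spend column, which the function above does not carry over. One possible way to keep the top-level fields is shown below. This is a sketch rather than part of the original answer: flatten_entry is a hypothetical helper, the data is a trimmed copy of the question's dictionary, and converting the string values with pd.to_numeric is an optional extra step.

```python
import pandas as pd

# Trimmed copy of the question's data, kept short for illustration.
test = {'data': [
    {'actions': [
        {'action_type': 'link_click', 'value': '16'},
        {'action_type': 'post_engagement', 'value': '16'}],
     'spend': '13.59', 'date_start': '2023-02-07', 'date_stop': '2023-02-07'},
    {'actions': [
        {'action_type': 'comment', 'value': '5'},
        {'action_type': 'link_click', 'value': '465'},
        {'action_type': 'post_engagement', 'value': '4954'}],
     'spend': '214.71', 'date_start': '2023-02-07', 'date_stop': '2023-02-07'}]}

def flatten_entry(entry: dict) -> dict:
    """Hypothetical helper: build one flat row per entry of test['data']."""
    # Copy every top-level field except the nested 'actions' list.
    row = {k: v for k, v in entry.items() if k != 'actions'}
    # Turn each action into its own column; .get guards a missing 'actions' key.
    for a in entry.get('actions', []):
        row[a['action_type']] = a['value']
    return row

df = pd.DataFrame([flatten_entry(e) for e in test['data']])

# All values arrive as strings; convert everything except the dates to numbers.
num_cols = df.columns.difference(['date_start', 'date_stop'])
df[num_cols] = df[num_cols].apply(pd.to_numeric)
print(df)
```

Actions missing from an entry (comment in the first row here) come out as NaN, which matches the N/A cells in the question's desired output.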

huangapple
  • Posted on 2023-02-10 04:39:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404195.html