Converting JSON output from API pull to pandas DataFrame?

Question

You can convert the asset_list column into its own DataFrame, with `asset_ID`, `asset_class`, `begin_date_utc`, `begin_date_mpt`, and `metered_volume` as the columns. Here is the corresponding code:

```python
import pandas as pd

# Assume the parsed JSON response is held in a DataFrame named df.
# First, flatten each participant's asset_list into one DataFrame.
asset_df = pd.concat(
    [pd.json_normalize(x['asset_list']) for x in df['return']],
    ignore_index=True
)

# Give each metered-volume record its own row, keeping the asset columns aligned.
asset_df = asset_df.explode('metered_volume_list', ignore_index=True)

# Expand the metered_volume_list dicts into columns and select the ones needed.
metered_volume_df = pd.json_normalize(
    asset_df['metered_volume_list'].tolist()
)[['begin_date_utc', 'begin_date_mpt', 'metered_volume']]

# Combine the two DataFrames.
result_df = pd.concat([asset_df[['asset_ID', 'asset_class']], metered_volume_df],
                      axis=1)

# Print the result.
print(result_df)
```

This code expands the asset_list column into a new DataFrame, explodes metered_volume_list so that each reading gets its own row, and then combines the two into a single DataFrame with the `asset_ID`, `asset_class`, `begin_date_utc`, `begin_date_mpt`, and `metered_volume` columns.
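The pipeline above can be checked end to end on a cut-down, self-contained sample; the `data` dict below is an assumption standing in for the real AESO payload, with two readings for one asset:

```python
import pandas as pd

# Hypothetical stand-in for the parsed API response (two readings, one asset).
data = {'return': [{'pool_participant_ID': '9496',
                    'asset_list': [{'asset_ID': '941A',
                                    'asset_class': 'RETAILER',
                                    'metered_volume_list': [
                                        {'begin_date_utc': '2022-01-01 07:00',
                                         'begin_date_mpt': '2022-01-01 00:00',
                                         'metered_volume': '0.0005865'},
                                        {'begin_date_utc': '2022-01-01 08:00',
                                         'begin_date_mpt': '2022-01-01 01:00',
                                         'metered_volume': '0.0005363'}]}]}]}

# Flatten asset_list, then give each reading its own row via explode.
asset_df = pd.concat(
    [pd.json_normalize(p['asset_list']) for p in data['return']],
    ignore_index=True
).explode('metered_volume_list', ignore_index=True)

# Expand the per-reading dicts and stitch the asset columns back on.
result_df = pd.concat(
    [asset_df[['asset_ID', 'asset_class']],
     pd.json_normalize(asset_df['metered_volume_list'].tolist())],
    axis=1
)
print(result_df.shape)  # (2, 5)
```

The `ignore_index=True` on `explode` matters: without it the exploded rows keep duplicate index labels, and the later column-wise `concat` would misalign them.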

English:

I am using an API pull to extract data from the AESO API in python. My code is as follows:

```python
import requests

API_KEY = 'api_key_here'
merit_order_url = 'https://api.aeso.ca/report/v1/meteredvolume/details?startDate=2022-01-01'
headers = {'accept': 'application/json', 'X-API-Key': API_KEY}
response = requests.get(merit_order_url, headers=headers)
```

The JSON response looks something like this:

```python
{'timestamp': '2023-08-10 14:07:24.976+0000',
 'responseCode': '200',
 'return': [{'pool_participant_ID': '9496',
   'asset_list': [{'asset_ID': '941A',
     'asset_class': 'RETAILER',
     'metered_volume_list': [{'begin_date_utc': '2022-01-01 07:00',
       'begin_date_mpt': '2022-01-01 00:00',
       'metered_volume': '0.0005865'},
      {'begin_date_utc': '2022-01-01 08:00',
       'begin_date_mpt': '2022-01-01 01:00',
       'metered_volume': '0.0005363'},
      {'begin_date_utc': '2022-01-01 09:00',
       'begin_date_mpt': '2022-01-01 02:00',
       'metered_volume': '0.0005209'},
      {'begin_date_utc': '2022-01-01 10:00',
       'begin_date_mpt': '2022-01-01 03:00',
       'metered_volume': '0.0005171'},
      {'begin_date_utc': '2022-01-01 11:00',
       'begin_date_mpt': '2022-01-01 04:00',
       'metered_volume': '0.0005152'},
      {'begin_date_utc': '2022-01-01 12:00',
       'begin_date_mpt': '2022-01-01 05:00',
       'metered_volume': '0.0005104'},
      {'begin_date_utc': '2022-01-01 13:00',
       'begin_date_mpt': '2022-01-01 06:00',
       'metered_volume': '0.0005164'},
      {'begin_date_utc': '2022-01-01 14:00',
       'begin_date_mpt': '2022-01-01 07:00',
       'metered_volume': '0.0005426'},
      {'begin_date_utc': '2022-01-01 15:00',
       'begin_date_mpt': '2022-01-01 08:00',
       'metered_volume': '0.0005907'},
      {'begin_date_utc': '2022-01-01 16:00',
       'begin_date_mpt': '2022-01-01 09:00',
       'metered_volume': '0.0006283'},
      {'begin_date_utc': '2022-01-01 17:00',
       'begin_date_mpt': '2022-01-01 10:00',
       'metered_volume': '0.0006528'},
      {'begin_date_utc': '2022-01-01 18:00',
       'begin_date_mpt': '2022-01-01 11:00',
       'metered_volume': '0.0007141'},
      {'begin_date_utc': '2022-01-01 19:00',
       'begin_date_mpt': '2022-01-01 12:00',
       'metered_volume': '0.0007192'},
      {'begin_date_utc': '2022-01-01 20:00',
       'begin_date_mpt': '2022-01-01 13:00',
       'metered_volume': '0.0007495'},
      {'begin_date_utc': '2022-01-01 21:00',
       'begin_date_mpt': '2022-01-01 14:00',
       'metered_volume': '0.0006842'},
      {'begin_date_utc': '2022-01-01 22:00',
       'begin_date_mpt': '2022-01-01 15:00',
       'metered_volume': '0.0006804'},
      {'begin_date_utc': '2022-01-01 23:00',
       'begin_date_mpt': '2022-01-01 16:00',
       'metered_volume': '0.0007282'},
      {'begin_date_utc': '2022-01-02 00:00',
       'begin_date_mpt': '2022-01-01 17:00',
       'metered_volume': '0.0008322'},
      {'begin_date_utc': '2022-01-02 01:00',
       'begin_date_mpt': '2022-01-01 18:00',
       'metered_volume': '0.0008516'},
      {'begin_date_utc': '2022-01-02 02:00',
       'begin_date_mpt': '2022-01-01 19:00',
       'metered_volume': '0.0007729'},
      {'begin_date_utc': '2022-01-02 03:00',
       'begin_date_mpt': '2022-01-01 20:00',
       'metered_volume': '0.0006861'},
      {'begin_date_utc': '2022-01-02 04:00',
       'begin_date_mpt': '2022-01-01 21:00',
       'metered_volume': '0.0006861'},
      {'begin_date_utc': '2022-01-02 05:00',
       'begin_date_mpt': '2022-01-01 22:00',
       'metered_volume': '0.0006434'},
      {'begin_date_utc': '2022-01-02 06:00',
       'begin_date_mpt': '2022-01-01 23:00',
       'metered_volume': '0.0005783'}]},
    {'asset_ID': '941C',
     'asset_class': 'RETAILER',
     'metered_volume_list': [{'begin_date_utc': '2022-01-01 07:00',
       'begin_date_mpt': '2022-01-01 00:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 08:00',
       'begin_date_mpt': '2022-01-01 01:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 09:00',
       'begin_date_mpt': '2022-01-01 02:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 10:00',
       'begin_date_mpt': '2022-01-01 03:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 11:00',
       'begin_date_mpt': '2022-01-01 04:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 12:00',
       'begin_date_mpt': '2022-01-01 05:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 13:00',
       'begin_date_mpt': '2022-01-01 06:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 14:00',
       'begin_date_mpt': '2022-01-01 07:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 15:00',
       'begin_date_mpt': '2022-01-01 08:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 16:00',
       'begin_date_mpt': '2022-01-01 09:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 17:00',
       'begin_date_mpt': '2022-01-01 10:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 18:00',
       'begin_date_mpt': '2022-01-01 11:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 19:00',
       'begin_date_mpt': '2022-01-01 12:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 20:00',
       'begin_date_mpt': '2022-01-01 13:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 21:00',
       'begin_date_mpt': '2022-01-01 14:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 22:00',
       'begin_date_mpt': '2022-01-01 15:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-01 23:00',
       'begin_date_mpt': '2022-01-01 16:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 00:00',
       'begin_date_mpt': '2022-01-01 17:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 01:00',
       'begin_date_mpt': '2022-01-01 18:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 02:00',
       'begin_date_mpt': '2022-01-01 19:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 03:00',
       'begin_date_mpt': '2022-01-01 20:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 04:00',
       'begin_date_mpt': '2022-01-01 21:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 05:00',
       'begin_date_mpt': '2022-01-01 22:00',
       'metered_volume': '0'},
      {'begin_date_utc': '2022-01-02 06:00',
       'begin_date_mpt': '2022-01-01 23:00',
       'metered_volume': '0'}]},
```

When I use the following code:

```python
df1 = pd.json_normalize(df['return'])
```

The resulting DataFrame looks like this (screenshot omitted): `asset_list` is still a single column holding the nested lists.
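A minimal sketch of that behavior, using an assumed cut-down sample in place of the real response: `json_normalize` flattens nested dicts, but leaves list values untouched.

```python
import pandas as pd

# Hypothetical stand-in for the parsed response body.
data = {'return': [{'pool_participant_ID': '9496',
                    'asset_list': [{'asset_ID': '941A',
                                    'asset_class': 'RETAILER'}]}]}

df1 = pd.json_normalize(data['return'])
print(df1.columns.tolist())           # ['pool_participant_ID', 'asset_list']
print(type(df1.loc[0, 'asset_list'])) # <class 'list'>
```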

I would like to convert the asset_list column into its own DataFrame, where `asset_ID`, `asset_class`, `begin_date_utc`, `begin_date_mpt`, and `metered_volume` are the columns. How would I go about this?

Answer 1

Score: 3

Using [json_normalize()](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html), you need to map the nesting levels in `meta` and `record_path`:

Code:

```python
df = pd.json_normalize(
    data=data,
    meta=[
        ["return", "pool_participant_ID"],
        ["return", "asset_list", "asset_ID"],
        ["return", "asset_list", "asset_class"],
    ],
    record_path=["return", "asset_list", "metered_volume_list"]
).rename(columns=lambda x: x.split(".")[-1])
print(df)

Output:

      begin_date_utc    begin_date_mpt metered_volume pool_participant_ID asset_ID asset_class
0   2022-01-01 07:00  2022-01-01 00:00      0.0005865                9496     941A    RETAILER
1   2022-01-01 08:00  2022-01-01 01:00      0.0005363                9496     941A    RETAILER
2   2022-01-01 09:00  2022-01-01 02:00      0.0005209                9496     941A    RETAILER
3   2022-01-01 10:00  2022-01-01 03:00      0.0005171                9496     941A    RETAILER
4   2022-01-01 11:00  2022-01-01 04:00      0.0005152                9496     941A    RETAILER
5   2022-01-01 12:00  2022-01-01 05:00      0.0005104                9496     941A    RETAILER
6   2022-01-01 13:00  2022-01-01 06:00      0.0005164                9496     941A    RETAILER
7   2022-01-01 14:00  2022-01-01 07:00      0.0005426                9496     941A    RETAILER
8   2022-01-01 15:00  2022-01-01 08:00      0.0005907                9496     941A    RETAILER
9   2022-01-01 16:00  2022-01-01 09:00      0.0006283                9496     941A    RETAILER
10  2022-01-01 17:00  2022-01-01 10:00      0.0006528                9496     941A    RETAILER
11  2022-01-01 18:00  2022-01-01 11:00      0.0007141                9496     941A    RETAILER
12  2022-01-01 19:00  2022-01-01 12:00      0.0007192                9496     941A    RETAILER
13  2022-01-01 20:00  2022-01-01 13:00      0.0007495                9496     941A    RETAILER
14  2022-01-01 21:00  2022-01-01 14:00      0.0006842                9496     941A    RETAILER
15  2022-01-01 22:00  2022-01-01 15:00      0.0006804                9496     941A    RETAILER
16  2022-01-01 23:00  2022-01-01 16:00      0.0007282                9496     941A    RETAILER
17  2022-01-02 00:00  2022-01-01 17:00      0.0008322                9496     941A    RETAILER
18  2022-01-02 01:00  2022-01-01 18:00      0.0008516                9496     941A    RETAILER
19  2022-01-02 02:00  2022-01-01 19:00      0.0007729                9496     941A    RETAILER
20  2022-01-02 03:00  2022-01-01 20:00      0.0006861                9496     941A    RETAILER
21  2022-01-02 04:00  2022-01-01 21:00      0.0006861                9496     941A    RETAILER
22  2022-01-02 05:00  2022-01-01 22:00      0.0006434                9496     941A    RETAILER
23  2022-01-02 06:00  2022-01-01 23:00      0.0005783                9496     941A    RETAILER
24  2022-01-01 07:00  2022-01-01 00:00              0                9496     941C    RETAILER
25  2022-01-01 08:00  2022-01-01 01:00              0                9496     941C    RETAILER
26  2022-01-01 09:00  2022-01-01 02:00              0                9496     941C    RETAILER
27  2022-01-01 10:00  2022-01-01 03:00              0                9496     941C    RETAILER
28  2022-01-01 11:00  2022-01-01 04:00              0                9496     941C    RETAILER
29  2022-01-01 12:00  2022-01-01 05:00              0                9496     941C    RETAILER
30  2022-01-01 13:00  2022-01-01 06:00              0                9496     941C    RETAILER
31  2022-01-01 14:00  2022-01-01 07:00              0                9496     941C    RETAILER
32  2022-01-01 15:00  2022-01-01 08:00              0                9496     941C    RETAILER
33  2022-01-01 16:00  2022-01-01 09:00              0                9496     941C    RETAILER
34  2022-01-01 17:00  2022-01-01 10:00              0                9496     941C    RETAILER
35  2022-01-01 18:00  2022-01-01 11:00              0                9496     941C    RETAILER
36  2022-01-01 19:00  2022-01-01 12:00              0                9496     941C    RETAILER
37  2022-01-01 20:00  2022-01-01 13:00              0                9496     941C    RETAILER
38  2022-01-01 21:00  2022-01-01 14:00              0                9496     941C    RETAILER
39  2022-01-01 22:00  2022-01-01 15:00              0                9496     941C    RETAILER
40  2022-01-01 23:00  2022-01-01 16:00              0                9496     941C    RETAILER
41  2022-01-02 00:00  2022-01-01 17:00              0                9496     941C    RETAILER
42  2022-01-02 01:00  2022-01-01 18:00              0                9496     941C    RETAILER
43  2022-01-02 02:00  2022-01-01 19:00              0                9496     941C    RETAILER
44  2022-01-02 03:00  2022-01-01 20:00              0                9496     941C    RETAILER
45  2022-01-02 04:00  2022-01-01 21:00              0                9496     941C    RETAILER
46  2022-01-02 05:00  2022-01-01 22:00              0                9496     941C    RETAILER
47  2022-01-02 06:00  2022-01-01 23:00              0                9496     941C    RETAILER
```
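The `meta`/`record_path` pattern above can be verified on a cut-down, self-contained sample (the one-reading `data` dict is an assumption standing in for the real payload):

```python
import pandas as pd

# Hypothetical one-reading stand-in for the parsed API response.
data = {'return': [{'pool_participant_ID': '9496',
                    'asset_list': [{'asset_ID': '941A',
                                    'asset_class': 'RETAILER',
                                    'metered_volume_list': [
                                        {'begin_date_utc': '2022-01-01 07:00',
                                         'begin_date_mpt': '2022-01-01 00:00',
                                         'metered_volume': '0.0005865'}]}]}]}

# record_path walks down to the innermost list; meta pulls parent keys along.
df = pd.json_normalize(
    data=data,
    meta=[["return", "pool_participant_ID"],
          ["return", "asset_list", "asset_ID"],
          ["return", "asset_list", "asset_class"]],
    record_path=["return", "asset_list", "metered_volume_list"],
).rename(columns=lambda x: x.split(".")[-1])  # drop the "return.asset_list." prefixes
print(df.shape)  # (1, 6)
```

The rename is needed because meta columns come back with dotted names such as `return.asset_list.asset_ID`, while record columns (`begin_date_utc`, etc.) are unprefixed.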
Answer 2

Score: 2

You can try this: use `explode` on the list column, then apply `json_normalize` again to the nested objects.

```python
df = (pd.json_normalize(df['return'], record_path=['asset_list'])
        .explode('metered_volume_list'))
df = pd.concat([df[['asset_ID', 'asset_class']].reset_index(drop=True),
                pd.json_normalize(df.metered_volume_list)], axis=1)
```
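The explode-based variant can likewise be checked on a cut-down sample (the `data` dict below is an assumed stand-in for the real payload); note the `reset_index(drop=True)`, which realigns the exploded rows before the column-wise concat:

```python
import pandas as pd

# Hypothetical stand-in for the parsed API response (two zero readings).
data = {'return': [{'pool_participant_ID': '9496',
                    'asset_list': [{'asset_ID': '941C',
                                    'asset_class': 'RETAILER',
                                    'metered_volume_list': [
                                        {'begin_date_utc': '2022-01-01 07:00',
                                         'begin_date_mpt': '2022-01-01 00:00',
                                         'metered_volume': '0'},
                                        {'begin_date_utc': '2022-01-01 08:00',
                                         'begin_date_mpt': '2022-01-01 01:00',
                                         'metered_volume': '0'}]}]}]}

# One row per asset, then one row per reading after explode.
df = (pd.json_normalize(data['return'], record_path=['asset_list'])
        .explode('metered_volume_list'))

# Expand the per-reading dicts; reset_index avoids duplicate-label misalignment.
df = pd.concat([df[['asset_ID', 'asset_class']].reset_index(drop=True),
                pd.json_normalize(df['metered_volume_list'].tolist())], axis=1)
print(df.shape)  # (2, 5)
```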

huangapple
  • Published on 2023-08-10 22:10:50
  • Please retain the original link when reposting: https://go.coder-hub.com/76876534.html