将深度嵌套的Json数据规范化为pandas数据框

huangapple go评论92阅读模式
英文:

Normalizing deeply nested Json data for pandas dataframe

问题

我是新手使用pandas,我在尝试从一系列化妆品产品中提取颜色变化的数据时遇到了困难。

我的目标是创建一个DataFrame,其中包含每个产品的所有颜色变化,每个颜色变化都在自己的列表中。

类似这样的表格:

Name Type URL Price Description Images Shades Hex
product1 lipstick ... 27.00 ... [.,.,.] [.,.,.] [.,.]

我试图展平这个数据,但我一直收到关键错误。

这是最初的请求。

  1. import requests
  2. import pandas as pd
  3. headers = {
  4. 'authority': 'ncsa.sdapi.io',
  5. 'accept': 'application/json',
  6. 'accept-language': 'en-US,en;q=0.9',
  7. 'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
  8. 'business-unit': '2-mc-us-en-ecommv1',
  9. 'cache-control': 'no-cache',
  10. 'clientid': 'stardust-fe-client',
  11. 'content-type': 'application/json',
  12. 'origin': 'https://m.maccosmetics.com',
  13. 'referer': 'https://m.maccosmetics.com/',
  14. 'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
  15. 'sec-ch-ua-mobile': '?1',
  16. 'sec-ch-ua-platform': '"Android"',
  17. 'sec-fetch-dest': 'empty',
  18. 'sec-fetch-mode': 'cors',
  19. 'sec-fetch-site': 'cross-site',
  20. 'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
  21. }
  22. json_data = {
  23. 'query': '{\n products(environment: {prod:true},\n filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n sort: [{tags:{product_display_order:ASCENDING}}]\n ) {\n \n ... product__collection \n \n items {\n ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n skus {\n total\n items {\n ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n }\n }\n }\n \n \n }\n }\n\nfragment product__collection \n on product_collection {\n items {\n product_id\n skus {\n items {\n inventory_status\n sku_id\n }\n }\n }\n }\n\n\nfragment product_default \n on product {\n default_category {\n id\n value\n }\n description\n display_name\n is_hazmat\n meta {\n description\n }\n product_badge\n product_id\n product_url\n short_description\n tags {\n total\n items {\n id\n value\n key\n }\n }\n }\n\n\nfragment product_productSkinType \n on product {\n skin {\n type {\n key\n value\n }\n }\n }\n\n\nfragment product_form \n on product {\n form {\n key\n value\n }\n }\n\n\nfragment product_productCoverage \n on product {\n coverage {\n key\n value\n }\n }\n\n\nfragment product_benefit \n on product {\n benefit {\n benefits {\n key\n value\n }\n }\n }\n\n\nfragment product_productReview \n on product {\n reviews {\n average_rating\n number_of_reviews\n }\n }\n\n\nfragment product_skinConcern \n on product {\n skin {\n concern {\n key\n value\n }\n }\n }\n\n\nfragment product_usage \n on product {\n usage {\n content\n label\n type\n }\n }\n\n\nfragment product_productFinish \n on product {\n finish {\n key\n value\n }\n }\n\n\nfragment product_usageOptions \n on product {\n usage_options {\n key\n value\n }\n }\n\n\nfragment product_brushTypes \n on product {\n brush {\n types {\n key\n value\n }\n }\n }\n\n\nfragment product_brushShapes \n on product {\n brush {\n shapes {\n key\n value\n }\n }\n }\n\n\nfragment product__skus_default \n on product__skus {\n is_default_sku\n is_discountable\n is_giftwrap\n is_under_weight_hazmat\n iln_listing\n iln_version_number\n inventory_status\n material_code\n prices {\n currency\n is_discounted\n include_tax {\n price\n original_price\n price_per_unit\n price_formatted\n original_price_formatted\n price_per_unit_formatted\n }\n }\n sizes {\n value\n key\n }\n shades {\n name\n description\n hex_val\n }\n sku_id\n sku_badge\n unit_size_formatted\n upc\n }\n\n\nfragment product__skus_autoReplenish \n on product__skus {\n is_replenishable\n }\n\n\nfragment product__skus_colorFamily \n on product__skus {\n color_family {\n key\n value\n }\n }\n\n\nfragment product__skus_skuLargeImages \n on product__skus {\n media {\n large {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product
  24. <details>
  25. <summary>英文:</summary>
  26. I am new to working with pandas, and I am having difficulty extracting the data for color variations from a series of makeup products.
  27. My goal is to set up a dataframe with all color variations for each product in their own lists.
  28. Something along these lines:
  29. | Name | Type | URL | Price | Description |Images |Shades | Hex |
  30. | -------- | -------- | -------- | -------- | ----------- |------ | ------| ---- |
  31. | product1 | lipstick | ... | 27.00 | ... |[.,.,.]|[.,.,.]|[.,.] |
  32. [I am trying to flatten this data, but I keep receiving key errors.](https://i.stack.imgur.com/ds2jU.png)
  33. Here is the initial request.

import requests
import pandas as pd

headers = {
'authority': 'ncsa.sdapi.io',
'accept': 'application/json',
'accept-language': 'en-US,en;q=0.9',
'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
'business-unit': '2-mc-us-en-ecommv1',
'cache-control': 'no-cache',
'clientid': 'stardust-fe-client',
'content-type': 'application/json',
'origin': 'https://m.maccosmetics.com',
'referer': 'https://m.maccosmetics.com/',
'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
'sec-ch-ua-mobile': '?1',
'sec-ch-ua-platform': '"Android"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}

json_data = {
'query': '{\n products(environment: {prod:true},\n filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n sort: [{tags:{product_display_order:ASCENDING}}]\n ) {\n \n ... product__collection \n \n items {\n ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n skus {\n total\n items {\n ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n }\n }\n }\n \n \n }\n }\n\nfragment product__collection \n on product_collection {\n items {\n product_id\n skus {\n items {\n inventory_status\n sku_id\n }\n }\n }\n }\n\n\nfragment product_default \n on product {\n default_category {\n id\n value\n }\n description\n display_name\n is_hazmat\n meta {\n description\n }\n product_badge\n product_id\n product_url\n short_description\n tags {\n total\n items {\n id\n value\n key\n }\n }\n }\n\n\nfragment product_productSkinType \n on product {\n skin {\n type {\n key\n value\n }\n }\n }\n\n\nfragment product_form \n on product {\n form {\n key\n value\n }\n }\n\n\nfragment product_productCoverage \n on product {\n coverage {\n key\n value\n }\n }\n\n\nfragment product_benefit \n on product {\n benefit {\n benefits {\n key\n value\n }\n }\n }\n\n\nfragment product_productReview \n on product {\n reviews {\n average_rating\n number_of_reviews\n }\n }\n\n\nfragment product_skinConcern \n on product {\n skin {\n concern {\n key\n value\n }\n }\n }\n\n\nfragment product_usage \n on product {\n usage {\n content\n label\n type\n }\n }\n\n\nfragment product_productFinish \n on product {\n finish {\n key\n value\n }\n }\n\n\nfragment product_usageOptions \n on product {\n usage_options {\n key\n value\n }\n }\n\n\nfragment product_brushTypes \n on product {\n brush {\n types {\n key\n value\n }\n }\n }\n\n\nfragment product_brushShapes \n on product {\n brush {\n shapes {\n key\n value\n }\n }\n }\n\n\nfragment product__skus_default \n on product__skus {\n is_default_sku\n is_discountable\n is_giftwrap\n is_under_weight_hazmat\n iln_listing\n iln_version_number\n inventory_status\n material_code\n prices {\n currency\n is_discounted\n include_tax {\n price\n original_price\n price_per_unit\n price_formatted\n original_price_formatted\n price_per_unit_formatted\n }\n }\n sizes {\n value\n key\n }\n shades {\n name\n description\n hex_val\n }\n sku_id\n sku_badge\n unit_size_formatted\n upc\n }\n\n\nfragment product__skus_autoReplenish \n on product__skus {\n is_replenishable\n }\n\n\nfragment product__skus_colorFamily \n on product__skus {\n color_family {\n key\n value\n }\n }\n\n\nfragment product__skus_skuLargeImages \n on product__skus {\n media {\n large {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuMediumImages \n on product__skus {\n media {\n medium {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuSmallImages \n on product__skus {\n media {\n small {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_vtoFoundation \n on product__skus {\n vto {\n is_foundation_experience\n }\n }\n\n\nfragment product__skus_vtoMakeup \n on product__skus {\n vto {\n is_color_experience\n }\n }\n',
'variables': {},
}

response = requests.post(
'https://ncsa.sdapi.io/stardust-prodcat-product-v3/graphql/core/v1/extension/v1',
headers=headers,
json=json_data,
)

  1. All of these values returned as expected

json_object = response.json()

result_items = json_object['data']['products']['items']

result_items[0]['skus']['items'][0]['prices'][0]['include_tax']['price_formatted']

result_items[0]['skus']['items'][0]['shades']

result_items[0]['skus']['items'][0]['media']['large'][0]['src']

result_items[0]['skus']['items'][0]['media']['large'][0]['alt']

result_items[0]['skus']['items'][0]['color_family'][0]['value']

  1. I was able to access the shade names for a single product like so

shade_list = []
def get_shade_names():
items = result_items[0]['skus']['items']
for item in items:
shades = item['shades']
for shade_data in shades:
shade = shade_data['name']
shade_list.append(shade)

get_shade_names()
print(shade_list)

  1. but several attempts at implementing the nested loop for the list of lists has just resulted in a single list or a series of errors.
  2. This is when I pivoted from DataFrame to json_normalize. However, I keep receiving key errors when trying to use record path and meta.
  3. Can someone show me how to proceed? I tried to go off of the examples in the pandas documentation, but nothing seems to be working. Any help would be greatly appreciated.
  4. </details>
  5. # 答案1
  6. **得分**: 1
  7. 使用metarecord_path的示例:
  8. ```python
  9. data = response.json()
  10. df = pd.json_normalize(
  11. data=data["data"]["products"]["items"],
  12. meta="product_id",
  13. record_path=["skus", "items", "shades"]
  14. )

选择一个产品:

  1. shades = df.query("product_id.eq('99908')")["name"].to_list()
  2. print(shades)

输出:

  1. ['Marrakesh-mere', 'Sheer Outrage', 'Dubonnet Buzz', 'Mull It Over', 'Rose Mary', 'Sweet Cinnamon', 'Devoted To Chili', 'Wild Rebel', 'Devoted To Danger', 'Love Clove', 'Ruby New', 'Gingerella', 'Stay Curious', 'Peppery Pink', 'All-Star Anise', 'Nice Spice', 'Spice World', 'Over The Taupe', 'Brickthrough', 'Nutmeg Ganache', 'Sorry Not Sorry', 'Pumpkin Spiced', 'Hot Paprika']

注意:这是给定的示例代码的翻译。

英文:

An example using meta and record_path:

  1. data = response.json()
  2. df = pd.json_normalize(
  3. data=data[&quot;data&quot;][&quot;products&quot;][&quot;items&quot;],
  4. meta=&quot;product_id&quot;,
  5. record_path=[&quot;skus&quot;, &quot;items&quot;, &quot;shades&quot;]
  6. )

Select a product:

  1. shades = df.query(&quot;product_id.eq(&#39;99908&#39;)&quot;)[&quot;name&quot;].to_list()
  2. print(shades)

Output:

  1. [&#39;Marrakesh-mere&#39;, &#39;Sheer Outrage&#39;, &#39;Dubonnet Buzz&#39;, &#39;Mull It Over&#39;, &#39;Rose Mary&#39;, &#39;Sweet Cinnamon&#39;, &#39;Devoted To Chili&#39;, &#39;Wild Rebel&#39;, &#39;Devoted To Danger&#39;, &#39;Love Clove&#39;, &#39;Ruby New&#39;, &#39;Gingerella&#39;, &#39;Stay Curious&#39;, &#39;Peppery Pink&#39;, &#39;All-Star Anise&#39;, &#39;Nice Spice&#39;, &#39;Spice World&#39;, &#39;Over The Taupe&#39;, &#39;Brickthrough&#39;, &#39;Nutmeg Ganache&#39;, &#39;Sorry Not Sorry&#39;, &#39;Pumpkin Spiced&#39;, &#39;Hot Paprika&#39;]

答案2

得分: 0

创建多个数据框
你的输出不够清晰,但你可以使用类似以下的方法:

  1. # 从顶级记录中提取基础数据
  2. main_cols = ['product_id', 'display_name', 'description']
  3. main_df = pd.json_normalize(result_items)[main_cols]
  4. # 提取子数据集
  5. shade_df = pd.json_normalize(result_items, ['skus', 'items', 'shades'], 'product_id', record_prefix='shade.')
  6. # 合并基础数据和其他子数据集
  7. df = main_df.merge(shade_df, on='product_id')

输出:

  1. >>> df
  2. product_id display_name description shade.name shade.description shade.hex_val
  3. 0 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果使用我们的... Marrakesh-mere 强烈的橙色棕色 #b0594d
  4. 1 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果使用我们的... Sheer Outrage 葡萄柚粉色 #ca5a5a
  5. 2 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果使用我们的... Dubonnet Buzz 深红葡萄酒 #c95c54
  6. 3 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果使用我们的... Mull It Over 肮脏的桃子 #a45f51
  7. 4 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果使用我们的... Rose Mary 柔和的紫色 #b96161
  8. .. ... ... ... ... ... ...
  9. 329 19393 Lipmix / Satin Lipmix是化妆师的工具就像... 亚光 #dedede
  10. 330 1625 Lip Erase M·A·C专业唇部修复产品... 苍白 N27 #e3bd92
  11. 331 19392 Lipmix / Gloss Lipmix是化妆师的工具就像... 光泽 #dfdbcb
  12. 332 82134 Lipstick / Frosted Firework 五款节日专享LustreFros... Once Bitten, Ice Shy 珠光的白色 #eae8df
  13. 333 52596 Lustre Lipstick M·A·C口红-标志性的产品... Lady Bug 黄番茄 #b23532
英文:

Create multiple dataframes
Your output is not really clear but you can use something like:

  1. # Extract base data from top level records
  2. main_cols = [&#39;product_id&#39;, &#39;display_name&#39;, &#39;description&#39;]
  3. main_df = pd.json_normalize(result_items)[main_cols]
  4. # Extract sub dataset
  5. shade_df = pd.json_normalize(result_items, [&#39;skus&#39;, &#39;items&#39;, &#39;shades&#39;], &#39;product_id&#39;, record_prefix=&#39;shade.&#39;)
  6. # Merge base and other sub dataset
  7. df = main_df.merge(shade_df, on=&#39;product_id&#39;)

Output:

  1. &gt;&gt;&gt; df
  2. product_id display_name description shade.name shade.description shade.hex_val
  3. 0 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Marrakesh-mere Intense orange brown #b0594d
  4. 1 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Sheer Outrage Grapefruit pink #ca5a5a
  5. 2 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Dubonnet Buzz Deep red wine #c95c54
  6. 3 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Mull It Over Dirty peach #a45f51
  7. 4 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Rose Mary Soft mauve #b96161
  8. .. ... ... ... ... ... ...
  9. 329 19393 Lipmix / Satin Lipmix is to the makeup artist as tubes of pai... Satin #dedede
  10. 330 1625 Lip Erase M&#183;A&#183;C Pro Lip Erase is a professional product ... Pale N27 #e3bd92
  11. 331 19392 Lipmix / Gloss Lipmix is to the makeup artist as tubes of pai... Gloss #dfdbcb
  12. 332 82134 Lipstick / Frosted Firework A blast of five holiday-exclusive Lustre, Fros... Once Bitten, Ice Shy Sheer white w/ pearl #eae8df
  13. 333 52596 Lustre Lipstick M&#183;A&#183;C Lipstick the iconic product that made ... Lady Bug Yellow tomato #b23532

huangapple
  • 本文由 发表于 2023年2月19日 07:01:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/75496910.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定