英文:
Normalizing deeply nested Json data for pandas dataframe
问题
我是新手使用pandas,我在尝试从一系列化妆品产品中提取颜色变化的数据时遇到了困难。
我的目标是创建一个DataFrame,其中包含每个产品的所有颜色变化,每个颜色变化都在自己的列表中。
类似这样的表格:
Name | Type | URL | Price | Description | Images | Shades | Hex |
---|---|---|---|---|---|---|---|
product1 | lipstick | ... | 27.00 | ... | [.,.,.] | [.,.,.] | [.,.] |
这是最初的请求。
import requests
import pandas as pd
headers = {
'authority': 'ncsa.sdapi.io',
'accept': 'application/json',
'accept-language': 'en-US,en;q=0.9',
'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
'business-unit': '2-mc-us-en-ecommv1',
'cache-control': 'no-cache',
'clientid': 'stardust-fe-client',
'content-type': 'application/json',
'origin': 'https://m.maccosmetics.com',
'referer': 'https://m.maccosmetics.com/',
'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
'sec-ch-ua-mobile': '?1',
'sec-ch-ua-platform': '"Android"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}
json_data = {
'query': '{\n products(environment: {prod:true},\n filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n sort: [{tags:{product_display_order:ASCENDING}}]\n ) {\n \n ... product__collection \n \n items {\n ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n skus {\n total\n items {\n ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n }\n }\n }\n \n \n }\n }\n\nfragment product__collection \n on product_collection {\n items {\n product_id\n skus {\n items {\n inventory_status\n sku_id\n }\n }\n }\n }\n\n\nfragment product_default \n on product {\n default_category {\n id\n value\n }\n description\n display_name\n is_hazmat\n meta {\n description\n }\n product_badge\n product_id\n product_url\n short_description\n tags {\n total\n items {\n id\n value\n key\n }\n }\n }\n\n\nfragment product_productSkinType \n on product {\n skin {\n type {\n key\n value\n }\n }\n }\n\n\nfragment product_form \n on product {\n form {\n key\n value\n }\n }\n\n\nfragment product_productCoverage \n on product {\n coverage {\n key\n value\n }\n }\n\n\nfragment product_benefit \n on product {\n benefit {\n benefits {\n key\n value\n }\n }\n }\n\n\nfragment product_productReview \n on product {\n reviews {\n average_rating\n number_of_reviews\n }\n }\n\n\nfragment product_skinConcern \n on product {\n skin {\n concern {\n key\n value\n }\n }\n }\n\n\nfragment product_usage \n on product {\n usage {\n content\n label\n type\n }\n }\n\n\nfragment product_productFinish \n on product {\n finish {\n key\n value\n }\n }\n\n\nfragment product_usageOptions \n on product {\n usage_options {\n key\n value\n }\n }\n\n\nfragment product_brushTypes \n on product {\n brush {\n types {\n key\n value\n }\n }\n }\n\n\nfragment product_brushShapes \n on product {\n brush {\n shapes {\n key\n value\n }\n }\n }\n\n\nfragment product__skus_default \n on product__skus {\n is_default_sku\n is_discountable\n is_giftwrap\n is_under_weight_hazmat\n iln_listing\n iln_version_number\n inventory_status\n material_code\n prices {\n currency\n is_discounted\n include_tax {\n price\n original_price\n price_per_unit\n price_formatted\n original_price_formatted\n price_per_unit_formatted\n }\n }\n sizes {\n value\n key\n }\n shades {\n name\n description\n hex_val\n }\n sku_id\n sku_badge\n unit_size_formatted\n upc\n }\n\n\nfragment product__skus_autoReplenish \n on product__skus {\n is_replenishable\n }\n\n\nfragment product__skus_colorFamily \n on product__skus {\n color_family {\n key\n value\n }\n }\n\n\nfragment product__skus_skuLargeImages \n on product__skus {\n media {\n large {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product
<details>
<summary>英文:</summary>
I am new to working with pandas, and I am having difficulty extracting the data for color variations from a series of makeup products.
My goal is to set up a dataframe with all color variations for each product in their own lists.
Something along these lines:
| Name | Type | URL | Price | Description |Images |Shades | Hex |
| -------- | -------- | -------- | -------- | ----------- |------ | ------| ---- |
| product1 | lipstick | ... | 27.00 | ... |[.,.,.]|[.,.,.]|[.,.] |
[I am trying to flatten this data, but I keep receiving key errors.](https://i.stack.imgur.com/ds2jU.png)
Here is the initial request.
import requests
import pandas as pd
headers = {
'authority': 'ncsa.sdapi.io',
'accept': 'application/json',
'accept-language': 'en-US,en;q=0.9',
'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
'business-unit': '2-mc-us-en-ecommv1',
'cache-control': 'no-cache',
'clientid': 'stardust-fe-client',
'content-type': 'application/json',
'origin': 'https://m.maccosmetics.com',
'referer': 'https://m.maccosmetics.com/',
'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
'sec-ch-ua-mobile': '?1',
'sec-ch-ua-platform': '"Android"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}
json_data = {
'query': '{\n products(environment: {prod:true},\n filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n sort: [{tags:{product_display_order:ASCENDING}}]\n ) {\n \n ... product__collection \n \n items {\n ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n skus {\n total\n items {\n ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n }\n }\n }\n \n \n }\n }\n\nfragment product__collection \n on product_collection {\n items {\n product_id\n skus {\n items {\n inventory_status\n sku_id\n }\n }\n }\n }\n\n\nfragment product_default \n on product {\n default_category {\n id\n value\n }\n description\n display_name\n is_hazmat\n meta {\n description\n }\n product_badge\n product_id\n product_url\n short_description\n tags {\n total\n items {\n id\n value\n key\n }\n }\n }\n\n\nfragment product_productSkinType \n on product {\n skin {\n type {\n key\n value\n }\n }\n }\n\n\nfragment product_form \n on product {\n form {\n key\n value\n }\n }\n\n\nfragment product_productCoverage \n on product {\n coverage {\n key\n value\n }\n }\n\n\nfragment product_benefit \n on product {\n benefit {\n benefits {\n key\n value\n }\n }\n }\n\n\nfragment product_productReview \n on product {\n reviews {\n average_rating\n number_of_reviews\n }\n }\n\n\nfragment product_skinConcern \n on product {\n skin {\n concern {\n key\n value\n }\n }\n }\n\n\nfragment product_usage \n on product {\n usage {\n content\n label\n type\n }\n }\n\n\nfragment product_productFinish \n on product {\n finish {\n key\n value\n }\n }\n\n\nfragment product_usageOptions \n on product {\n usage_options {\n key\n value\n }\n }\n\n\nfragment product_brushTypes \n on product {\n brush {\n types {\n key\n value\n }\n }\n }\n\n\nfragment product_brushShapes \n on product {\n brush {\n shapes {\n key\n value\n }\n }\n }\n\n\nfragment product__skus_default \n on product__skus {\n is_default_sku\n is_discountable\n is_giftwrap\n is_under_weight_hazmat\n iln_listing\n iln_version_number\n inventory_status\n material_code\n prices {\n currency\n is_discounted\n include_tax {\n price\n original_price\n price_per_unit\n price_formatted\n original_price_formatted\n price_per_unit_formatted\n }\n }\n sizes {\n value\n key\n }\n shades {\n name\n description\n hex_val\n }\n sku_id\n sku_badge\n unit_size_formatted\n upc\n }\n\n\nfragment product__skus_autoReplenish \n on product__skus {\n is_replenishable\n }\n\n\nfragment product__skus_colorFamily \n on product__skus {\n color_family {\n key\n value\n }\n }\n\n\nfragment product__skus_skuLargeImages \n on product__skus {\n media {\n large {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuMediumImages \n on product__skus {\n media {\n medium {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuSmallImages \n on product__skus {\n media {\n small {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_vtoFoundation \n on product__skus {\n vto {\n is_foundation_experience\n }\n }\n\n\nfragment product__skus_vtoMakeup \n on product__skus {\n vto {\n is_color_experience\n }\n }\n',
'variables': {},
}
response = requests.post(
'https://ncsa.sdapi.io/stardust-prodcat-product-v3/graphql/core/v1/extension/v1',
headers=headers,
json=json_data,
)
All of these values returned as expected
json_object = response.json()
result_items = json_object['data']['products']['items']
result_items[0]['skus']['items'][0]['prices'][0]['include_tax']['price_formatted']
result_items[0]['skus']['items'][0]['shades']
result_items[0]['skus']['items'][0]['media']['large'][0]['src']
result_items[0]['skus']['items'][0]['media']['large'][0]['alt']
result_items[0]['skus']['items'][0]['color_family'][0]['value']
I was able to access the shade names for a single product like so
shade_list = []
def get_shade_names():
items = result_items[0]['skus']['items']
for item in items:
shades = item['shades']
for shade_data in shades:
shade = shade_data['name']
shade_list.append(shade)
get_shade_names()
print(shade_list)
but several attempts at implementing the nested loop for the list of lists has just resulted in a single list or a series of errors.
This is when I pivoted from DataFrame to json_normalize. However, I keep receiving key errors when trying to use record path and meta.
Can someone show me how to proceed? I tried to go off of the examples in the pandas documentation, but nothing seems to be working. Any help would be greatly appreciated.
</details>
# 答案1
**得分**: 1
使用meta和record_path的示例:
```python
data = response.json()
df = pd.json_normalize(
data=data["data"]["products"]["items"],
meta="product_id",
record_path=["skus", "items", "shades"]
)
选择一个产品:
shades = df.query("product_id.eq('99908')")["name"].to_list()
print(shades)
输出:
['Marrakesh-mere', 'Sheer Outrage', 'Dubonnet Buzz', 'Mull It Over', 'Rose Mary', 'Sweet Cinnamon', 'Devoted To Chili', 'Wild Rebel', 'Devoted To Danger', 'Love Clove', 'Ruby New', 'Gingerella', 'Stay Curious', 'Peppery Pink', 'All-Star Anise', 'Nice Spice', 'Spice World', 'Over The Taupe', 'Brickthrough', 'Nutmeg Ganache', 'Sorry Not Sorry', 'Pumpkin Spiced', 'Hot Paprika']
注意:这是给定的示例代码的翻译。
英文:
An example using meta and record_path:
data = response.json()
df = pd.json_normalize(
data=data["data"]["products"]["items"],
meta="product_id",
record_path=["skus", "items", "shades"]
)
Select a product:
shades = df.query("product_id.eq('99908')")["name"].to_list()
print(shades)
Output:
['Marrakesh-mere', 'Sheer Outrage', 'Dubonnet Buzz', 'Mull It Over', 'Rose Mary', 'Sweet Cinnamon', 'Devoted To Chili', 'Wild Rebel', 'Devoted To Danger', 'Love Clove', 'Ruby New', 'Gingerella', 'Stay Curious', 'Peppery Pink', 'All-Star Anise', 'Nice Spice', 'Spice World', 'Over The Taupe', 'Brickthrough', 'Nutmeg Ganache', 'Sorry Not Sorry', 'Pumpkin Spiced', 'Hot Paprika']
答案2
得分: 0
创建多个数据框
你的输出不够清晰,但你可以使用类似以下的方法:
# 从顶级记录中提取基础数据
main_cols = ['product_id', 'display_name', 'description']
main_df = pd.json_normalize(result_items)[main_cols]
# 提取子数据集
shade_df = pd.json_normalize(result_items, ['skus', 'items', 'shades'], 'product_id', record_prefix='shade.')
# 合并基础数据和其他子数据集
df = main_df.merge(shade_df, on='product_id')
输出:
>>> df
product_id display_name description shade.name shade.description shade.hex_val
0 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果,使用我们的... Marrakesh-mere 强烈的橙色棕色 #b0594d
1 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果,使用我们的... Sheer Outrage 葡萄柚粉色 #ca5a5a
2 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果,使用我们的... Dubonnet Buzz 深红葡萄酒 #c95c54
3 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果,使用我们的... Mull It Over 肮脏的桃子 #a45f51
4 99908 Powder Kiss Velvet Blur Slim Stick 体验最大限度的水润雾面效果,使用我们的... Rose Mary 柔和的紫色 #b96161
.. ... ... ... ... ... ...
329 19393 Lipmix / Satin Lipmix是化妆师的工具,就像... 亚光 #dedede
330 1625 Lip Erase M·A·C专业唇部修复产品... 苍白 N27 #e3bd92
331 19392 Lipmix / Gloss Lipmix是化妆师的工具,就像... 光泽 #dfdbcb
332 82134 Lipstick / Frosted Firework 五款节日专享Lustre、Fros... Once Bitten, Ice Shy 珠光的白色 #eae8df
333 52596 Lustre Lipstick M·A·C口红-标志性的产品... Lady Bug 黄番茄 #b23532
英文:
Create multiple dataframes
Your output is not really clear but you can use something like:
# Extract base data from top level records
main_cols = ['product_id', 'display_name', 'description']
main_df = pd.json_normalize(result_items)[main_cols]
# Extract sub dataset
shade_df = pd.json_normalize(result_items, ['skus', 'items', 'shades'], 'product_id', record_prefix='shade.')
# Merge base and other sub dataset
df = main_df.merge(shade_df, on='product_id')
Output:
>>> df
product_id display_name description shade.name shade.description shade.hex_val
0 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Marrakesh-mere Intense orange brown #b0594d
1 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Sheer Outrage Grapefruit pink #ca5a5a
2 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Dubonnet Buzz Deep red wine #c95c54
3 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Mull It Over Dirty peach #a45f51
4 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Rose Mary Soft mauve #b96161
.. ... ... ... ... ... ...
329 19393 Lipmix / Satin Lipmix is to the makeup artist as tubes of pai... Satin #dedede
330 1625 Lip Erase M·A·C Pro Lip Erase is a professional product ... Pale N27 #e3bd92
331 19392 Lipmix / Gloss Lipmix is to the makeup artist as tubes of pai... Gloss #dfdbcb
332 82134 Lipstick / Frosted Firework A blast of five holiday-exclusive Lustre, Fros... Once Bitten, Ice Shy Sheer white w/ pearl #eae8df
333 52596 Lustre Lipstick M·A·C Lipstick – the iconic product that made ... Lady Bug Yellow tomato #b23532
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论