将深度嵌套的Json数据规范化为pandas数据框

huangapple go评论72阅读模式
英文:

Normalizing deeply nested Json data for pandas dataframe

问题

我是新手使用pandas,我在尝试从一系列化妆品产品中提取颜色变化的数据时遇到了困难。

我的目标是创建一个DataFrame,其中包含每个产品的所有颜色变化,每个颜色变化都在自己的列表中。

类似这样的表格:

Name Type URL Price Description Images Shades Hex
product1 lipstick ... 27.00 ... [.,.,.] [.,.,.] [.,.]

我试图展平这个数据,但我一直收到关键错误。

这是最初的请求。

import requests
import pandas as pd

headers = {
    'authority': 'ncsa.sdapi.io',
    'accept': 'application/json',
    'accept-language': 'en-US,en;q=0.9',
    'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
    'business-unit': '2-mc-us-en-ecommv1',
    'cache-control': 'no-cache',
    'clientid': 'stardust-fe-client',
    'content-type': 'application/json',
    'origin': 'https://m.maccosmetics.com',
    'referer': 'https://m.maccosmetics.com/',
    'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
    'sec-ch-ua-mobile': '?1',
    'sec-ch-ua-platform': '"Android"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}

json_data = {
    'query': '{\n                products(environment: {prod:true},\n                    filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n                    sort: [{tags:{product_display_order:ASCENDING}}]\n                ) {\n                    \n        ... product__collection \n        \n        items {\n            ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n            skus {\n                total\n                items {\n                    ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n                }\n            }\n        }\n    \n    \n                }\n            }\n\nfragment product__collection \n    on product_collection {\n        items {\n            product_id\n            skus {\n                items {\n                    inventory_status\n                    sku_id\n                }\n            }\n        }\n    }\n\n\nfragment product_default \n    on product {\n        default_category {\n            id\n            value\n        }\n        description\n        display_name\n        is_hazmat\n        meta {\n            description\n        }\n        product_badge\n        product_id\n        product_url\n        short_description\n        tags {\n            total\n            items {\n                id\n                value\n                key\n            }\n        }\n    }\n\n\nfragment product_productSkinType \n    on product {\n        skin {\n            type {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_form \n    on product {\n        form {\n            key\n            value\n        }\n    }\n\n\nfragment product_productCoverage \n    on product {\n        coverage {\n            key\n            value\n        }\n    }\n\n\nfragment product_benefit \n    on product {\n        benefit {\n            benefits {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_productReview \n    on product {\n        reviews {\n            average_rating\n            number_of_reviews\n        }\n    }\n\n\nfragment product_skinConcern \n    on product {\n        skin {\n            concern {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_usage \n    on product {\n        usage {\n            content\n            label\n            type\n        }\n    }\n\n\nfragment product_productFinish \n    on product {\n        finish {\n            key\n            value\n        }\n    }\n\n\nfragment product_usageOptions \n    on product {\n        usage_options {\n            key\n            value\n        }\n    }\n\n\nfragment product_brushTypes \n    on product {\n        brush {\n            types {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product_brushShapes \n    on product {\n        brush {\n            shapes {\n                key\n                value\n            }\n        }\n    }\n\n\nfragment product__skus_default \n    on product__skus {\n        is_default_sku\n        is_discountable\n        is_giftwrap\n        is_under_weight_hazmat\n        iln_listing\n        iln_version_number\n        inventory_status\n        material_code\n        prices {\n            currency\n            is_discounted\n            include_tax {\n                price\n                original_price\n                price_per_unit\n                price_formatted\n                original_price_formatted\n                price_per_unit_formatted\n            }\n        }\n        sizes {\n            value\n            key\n        }\n        shades {\n            name\n            description\n            hex_val\n        }\n        sku_id\n        sku_badge\n        unit_size_formatted\n        upc\n    }\n\n\nfragment product__skus_autoReplenish \n    on product__skus {\n        is_replenishable\n    }\n\n\nfragment product__skus_colorFamily \n    on product__skus {\n        color_family {\n            key\n            value\n        }\n    }\n\n\nfragment product__skus_skuLargeImages \n    on product__skus {\n        media {\n            large {\n                src\n                alt\n                height\n                width\n            }\n        }\n    }\n\n\nfragment product

<details>
<summary>英文:</summary>

I am new to working with pandas, and I am having difficulty extracting the data for color variations from a series of makeup products. 

My goal is to set up a dataframe with all color variations for each product in their own lists. 

Something along these lines:

| Name     | Type     |   URL    | Price    | Description |Images |Shades | Hex  |
| -------- | -------- | -------- | -------- | ----------- |------ | ------| ---- |
| product1 | lipstick | ...      | 27.00    | ...         |[.,.,.]|[.,.,.]|[.,.] |


[I am trying to flatten this data, but I keep receiving key errors.](https://i.stack.imgur.com/ds2jU.png)

Here is the initial request.

import requests
import pandas as pd

headers = {
'authority': 'ncsa.sdapi.io',
'accept': 'application/json',
'accept-language': 'en-US,en;q=0.9',
'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
'business-unit': '2-mc-us-en-ecommv1',
'cache-control': 'no-cache',
'clientid': 'stardust-fe-client',
'content-type': 'application/json',
'origin': 'https://m.maccosmetics.com',
'referer': 'https://m.maccosmetics.com/',
'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
'sec-ch-ua-mobile': '?1',
'sec-ch-ua-platform': '"Android"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}

json_data = {
'query': '{\n products(environment: {prod:true},\n filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n sort: [{tags:{product_display_order:ASCENDING}}]\n ) {\n \n ... product__collection \n \n items {\n ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n skus {\n total\n items {\n ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n }\n }\n }\n \n \n }\n }\n\nfragment product__collection \n on product_collection {\n items {\n product_id\n skus {\n items {\n inventory_status\n sku_id\n }\n }\n }\n }\n\n\nfragment product_default \n on product {\n default_category {\n id\n value\n }\n description\n display_name\n is_hazmat\n meta {\n description\n }\n product_badge\n product_id\n product_url\n short_description\n tags {\n total\n items {\n id\n value\n key\n }\n }\n }\n\n\nfragment product_productSkinType \n on product {\n skin {\n type {\n key\n value\n }\n }\n }\n\n\nfragment product_form \n on product {\n form {\n key\n value\n }\n }\n\n\nfragment product_productCoverage \n on product {\n coverage {\n key\n value\n }\n }\n\n\nfragment product_benefit \n on product {\n benefit {\n benefits {\n key\n value\n }\n }\n }\n\n\nfragment product_productReview \n on product {\n reviews {\n average_rating\n number_of_reviews\n }\n }\n\n\nfragment product_skinConcern \n on product {\n skin {\n concern {\n key\n value\n }\n }\n }\n\n\nfragment product_usage \n on product {\n usage {\n content\n label\n type\n }\n }\n\n\nfragment product_productFinish \n on product {\n finish {\n key\n value\n }\n }\n\n\nfragment product_usageOptions \n on product {\n usage_options {\n key\n value\n }\n }\n\n\nfragment product_brushTypes \n on product {\n brush {\n types {\n key\n value\n }\n }\n }\n\n\nfragment product_brushShapes \n on product {\n brush {\n shapes {\n key\n value\n }\n }\n }\n\n\nfragment product__skus_default \n on product__skus {\n is_default_sku\n is_discountable\n is_giftwrap\n is_under_weight_hazmat\n iln_listing\n iln_version_number\n inventory_status\n material_code\n prices {\n currency\n is_discounted\n include_tax {\n price\n original_price\n price_per_unit\n price_formatted\n original_price_formatted\n price_per_unit_formatted\n }\n }\n sizes {\n value\n key\n }\n shades {\n name\n description\n hex_val\n }\n sku_id\n sku_badge\n unit_size_formatted\n upc\n }\n\n\nfragment product__skus_autoReplenish \n on product__skus {\n is_replenishable\n }\n\n\nfragment product__skus_colorFamily \n on product__skus {\n color_family {\n key\n value\n }\n }\n\n\nfragment product__skus_skuLargeImages \n on product__skus {\n media {\n large {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuMediumImages \n on product__skus {\n media {\n medium {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuSmallImages \n on product__skus {\n media {\n small {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_vtoFoundation \n on product__skus {\n vto {\n is_foundation_experience\n }\n }\n\n\nfragment product__skus_vtoMakeup \n on product__skus {\n vto {\n is_color_experience\n }\n }\n',
'variables': {},
}

response = requests.post(
'https://ncsa.sdapi.io/stardust-prodcat-product-v3/graphql/core/v1/extension/v1',
headers=headers,
json=json_data,
)

All of these values returned as expected 

json_object = response.json()

result_items = json_object['data']['products']['items']

result_items[0]['skus']['items'][0]['prices'][0]['include_tax']['price_formatted']

result_items[0]['skus']['items'][0]['shades']

result_items[0]['skus']['items'][0]['media']['large'][0]['src']

result_items[0]['skus']['items'][0]['media']['large'][0]['alt']

result_items[0]['skus']['items'][0]['color_family'][0]['value']


I was able to access the shade names for a single product like so

shade_list = []
def get_shade_names():
items = result_items[0]['skus']['items']
for item in items:
shades = item['shades']
for shade_data in shades:
shade = shade_data['name']
shade_list.append(shade)

get_shade_names()
print(shade_list)

but several attempts at implementing the nested loop for the list of lists has just resulted in a single list or a series of errors. 
This is when I pivoted from DataFrame to json_normalize. However, I keep receiving key errors when trying to use record path and meta. 
Can someone show me how to proceed? I tried to go off of the examples in the pandas documentation, but nothing seems to be working. Any help would be greatly appreciated.
</details>
# 答案1
**得分**: 1
使用meta和record_path的示例:
```python
data = response.json()
df = pd.json_normalize(
data=data["data"]["products"]["items"],
meta="product_id",
record_path=["skus", "items", "shades"]
)

选择一个产品:

shades = df.query("product_id.eq('99908')")["name"].to_list()
print(shades)

输出:

['Marrakesh-mere', 'Sheer Outrage', 'Dubonnet Buzz', 'Mull It Over', 'Rose Mary', 'Sweet Cinnamon', 'Devoted To Chili', 'Wild Rebel', 'Devoted To Danger', 'Love Clove', 'Ruby New', 'Gingerella', 'Stay Curious', 'Peppery Pink', 'All-Star Anise', 'Nice Spice', 'Spice World', 'Over The Taupe', 'Brickthrough', 'Nutmeg Ganache', 'Sorry Not Sorry', 'Pumpkin Spiced', 'Hot Paprika']

注意:这是给定的示例代码的翻译。

英文:

An example using meta and record_path:

data = response.json()
df = pd.json_normalize(
data=data[&quot;data&quot;][&quot;products&quot;][&quot;items&quot;],
meta=&quot;product_id&quot;,
record_path=[&quot;skus&quot;, &quot;items&quot;, &quot;shades&quot;]
)

Select a product:

shades = df.query(&quot;product_id.eq(&#39;99908&#39;)&quot;)[&quot;name&quot;].to_list()
print(shades)

Output:

[&#39;Marrakesh-mere&#39;, &#39;Sheer Outrage&#39;, &#39;Dubonnet Buzz&#39;, &#39;Mull It Over&#39;, &#39;Rose Mary&#39;, &#39;Sweet Cinnamon&#39;, &#39;Devoted To Chili&#39;, &#39;Wild Rebel&#39;, &#39;Devoted To Danger&#39;, &#39;Love Clove&#39;, &#39;Ruby New&#39;, &#39;Gingerella&#39;, &#39;Stay Curious&#39;, &#39;Peppery Pink&#39;, &#39;All-Star Anise&#39;, &#39;Nice Spice&#39;, &#39;Spice World&#39;, &#39;Over The Taupe&#39;, &#39;Brickthrough&#39;, &#39;Nutmeg Ganache&#39;, &#39;Sorry Not Sorry&#39;, &#39;Pumpkin Spiced&#39;, &#39;Hot Paprika&#39;]

答案2

得分: 0

创建多个数据框
你的输出不够清晰,但你可以使用类似以下的方法:

# 从顶级记录中提取基础数据
main_cols = ['product_id', 'display_name', 'description']
main_df = pd.json_normalize(result_items)[main_cols]

# 提取子数据集
shade_df = pd.json_normalize(result_items, ['skus', 'items', 'shades'], 'product_id', record_prefix='shade.')

# 合并基础数据和其他子数据集
df = main_df.merge(shade_df, on='product_id')

输出:

>>> df
    product_id                        display_name                                        description            shade.name     shade.description shade.hex_val
0        99908  Powder Kiss Velvet Blur Slim Stick  体验最大限度的水润雾面效果使用我们的...        Marrakesh-mere  强烈的橙色棕色       #b0594d
1        99908  Powder Kiss Velvet Blur Slim Stick  体验最大限度的水润雾面效果使用我们的...         Sheer Outrage       葡萄柚粉色       #ca5a5a
2        99908  Powder Kiss Velvet Blur Slim Stick  体验最大限度的水润雾面效果使用我们的...         Dubonnet Buzz         深红葡萄酒       #c95c54
3        99908  Powder Kiss Velvet Blur Slim Stick  体验最大限度的水润雾面效果使用我们的...          Mull It Over           肮脏的桃子       #a45f51
4        99908  Powder Kiss Velvet Blur Slim Stick  体验最大限度的水润雾面效果使用我们的...             Rose Mary            柔和的紫色       #b96161
..         ...                                 ...                                                ...                   ...                   ...           ...
329      19393                      Lipmix / Satin  Lipmix是化妆师的工具就像...                 亚光                             #dedede
330       1625                           Lip Erase  M·A·C专业唇部修复产品...                  苍白                   N27       #e3bd92
331      19392                      Lipmix / Gloss  Lipmix是化妆师的工具就像...                 光泽                             #dfdbcb
332      82134         Lipstick / Frosted Firework  五款节日专享LustreFros...  Once Bitten, Ice Shy  珠光的白色       #eae8df
333      52596                     Lustre Lipstick  M·A·C口红-标志性的产品...              Lady Bug         黄番茄       #b23532
英文:

Create multiple dataframes
Your output is not really clear but you can use something like:

# Extract base data from top level records
main_cols = [&#39;product_id&#39;, &#39;display_name&#39;, &#39;description&#39;]
main_df = pd.json_normalize(result_items)[main_cols]
# Extract sub dataset
shade_df = pd.json_normalize(result_items, [&#39;skus&#39;, &#39;items&#39;, &#39;shades&#39;], &#39;product_id&#39;, record_prefix=&#39;shade.&#39;)
# Merge base and other sub dataset
df = main_df.merge(shade_df, on=&#39;product_id&#39;)

Output:

&gt;&gt;&gt; df
product_id                        display_name                                        description            shade.name     shade.description shade.hex_val
0        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...        Marrakesh-mere  Intense orange brown       #b0594d
1        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...         Sheer Outrage       Grapefruit pink       #ca5a5a
2        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...         Dubonnet Buzz         Deep red wine       #c95c54
3        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...          Mull It Over           Dirty peach       #a45f51
4        99908  Powder Kiss Velvet Blur Slim Stick  Experience moisture-matte to the max with our ...             Rose Mary            Soft mauve       #b96161
..         ...                                 ...                                                ...                   ...                   ...           ...
329      19393                      Lipmix / Satin  Lipmix is to the makeup artist as tubes of pai...                 Satin                             #dedede
330       1625                           Lip Erase  M&#183;A&#183;C Pro Lip Erase is a professional product ...                  Pale                   N27       #e3bd92
331      19392                      Lipmix / Gloss  Lipmix is to the makeup artist as tubes of pai...                 Gloss                             #dfdbcb
332      82134         Lipstick / Frosted Firework  A blast of five holiday-exclusive Lustre, Fros...  Once Bitten, Ice Shy  Sheer white w/ pearl       #eae8df
333      52596                     Lustre Lipstick  M&#183;A&#183;C Lipstick – the iconic product that made ...              Lady Bug         Yellow tomato       #b23532

huangapple
  • 本文由 发表于 2023年2月19日 07:01:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/75496910.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定