pandas – json: 父序列 ID

huangapple go评论64阅读模式
英文:

pandas - json: parent sequence ID

问题

subscription_boo subscription_ref_prefix no_of_products product_id product_seq parent_product_seq
True S_ 1.0 1.00 1 0
True S_ 1.0 1.10 2 1
True S_ 1.0 1.20 3 1
True S_ 1.0 1.11 4 2
True B_ 1.0 2.00 5 0
True B_ 1.0 2.10 6 5
True B_ 1.0 2.20 7 5
True B_ 1.0 2.11 8 6
True B_ 1.0 2.11 9 6

英文:

I have a dataframe with a column of nested json objects I wouls like them to be normalised and they must also get a parent ID.
example data:

{"subscription":[{"subscription_boo":true,"subscription_ref_prefix":"S_","product":[{"no_of_products":1,"product_id":1,"product":[{"no_of_products":1,"product_id":1.1,"product":[{"no_of_products":1,"product_id":1.11}]},{"no_of_products":1,"product_id":1.2}]}]},{"subscription_boo":true,"subscription_ref_prefix":"B_","product":[{"no_of_products":1,"product_id":2,"product":[{"no_of_products":1,"product_id":2.1,"product":[{"no_of_products":1,"product_id":2.11,"product":[{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.2}]}]}]}

I now want to add product_seq stating the sequence of the product as well as the parent product sequence while normalizing this JSON.

desired output is:

subscription_boo subscription_ref_prefix no_of_products product_id product_seq parent_product_seq
	True	S_		1.0	   1.00		1    0
	True	S_		1.0	   1.10		2    1
	True	S_		1.0	   1.20		3    1
	True	S_		1.0	   1.11		4    2
	True	B_		1.0	   2.00		5    0
	True	B_		1.0	   2.10		6    5
	True	B_		1.0	   2.20		7    5
	True	B_		1.0	   2.11		8    6
	True	B_		1.0	   2.11		9    6

答案1

得分: 1

你嵌套的产品结构非常复杂。我建议在使用 json_normalize 之前,使用递归函数将其展平:

import pandas as pd
import json

data = """{...your JSON data...}"""

input_rec = json.loads(data)["subscription"]

product_seq = 0

def flatten_products(prod_list):
    flattened_prod = []

    def flatten_product(prod_list, parent=0):
        global product_seq
        for prod in prod_list:
            product_seq += 1
            flattened_prod.append({
                "no_of_products": prod["no_of_products"],
                "product_id": prod["product_id"],
                "product_seq": product_seq,
                "parent_product_seq": parent
            })
            flatten_product(prod.get("product", []), product_seq)

    flatten_product(prod_list)
    return flattened_prod

for sub in input_rec:
    sub["product"] = flatten_products(sub["product"])

df = pd.json_normalize(input_rec, record_path=["product"], meta=["subscription_boo", "subscription_ref_prefix"])

输出:

   no_of_products  product_id  product_seq  parent_product_seq subscription_boo subscription_ref_prefix
0               1        1.00            1                   0             True                      S_
1               1        1.10            2                   1             True                      S_
2               1        1.11            3                   2             True                      S_
3               1        1.20            4                   1             True                      S_
4               1        2.00            5                   0             True                      B_
5               1        2.10            6                   5             True                      B_
6               1        2.11            7                   6             True                      B_
7               1        2.11            8                   7             True                      B_
8               1        2.11            9                   6             True                      B_
9               1        2.20           10                   5             True                      B_
英文:

Your nested products are very intricated. I suggest to flatten them with a recursive function before using json_normalize:

import pandas as pd
import json

data = """{"subscription":[{"subscription_boo":true,"subscription_ref_prefix":"S_","product":[{"no_of_products":1,"product_id":1,"product":[{"no_of_products":1,"product_id":1.1,"product":[{"no_of_products":1,"product_id":1.11}]},{"no_of_products":1,"product_id":1.2}]}]},{"subscription_boo":true,"subscription_ref_prefix":"B_","product":[{"no_of_products":1,"product_id":2,"product":[{"no_of_products":1,"product_id":2.1,"product":[{"no_of_products":1,"product_id":2.11,"product":[{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.2}]}]}]}"""

input_rec = json.loads(data)["subscription"]

product_seq = 0
def flatten_products(prod_list):
    flattened_prod = []

    def flatten_product(prod_list, parent=0):
        global product_seq
        for prod in prod_list:
            product_seq +=1
            flattened_prod.append({
                "no_of_products": prod["no_of_products"],
                "product_id": prod["product_id"],
                "product_seq": product_seq,
                "parent_product_seq": parent
            })
            flatten_product(prod.get("product", []), product_seq)

    flatten_product(prod_list)  
    return flattened_prod

for sub in input_rec:
    sub["product"] = flatten_products(sub["product"])

df = pd.json_normalize(input_rec, record_path=["product"], meta=["subscription_boo", "subscription_ref_prefix"])

Output:

   no_of_products  product_id  product_seq  parent_product_seq subscription_boo subscription_ref_prefix
0               1        1.00            1                   0             True                      S_
1               1        1.10            2                   1             True                      S_
2               1        1.11            3                   2             True                      S_
3               1        1.20            4                   1             True                      S_
4               1        2.00            5                   0             True                      B_
5               1        2.10            6                   5             True                      B_
6               1        2.11            7                   6             True                      B_
7               1        2.11            8                   7             True                      B_
8               1        2.11            9                   6             True                      B_
9               1        2.20           10                   5             True                      B_

huangapple
  • 本文由 发表于 2023年2月24日 15:18:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/75553586.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定