英文:
pandas - json: parent sequence ID
问题
subscription_boo subscription_ref_prefix no_of_products product_id product_seq parent_product_seq
True S_ 1.0 1.00 1 0
True S_ 1.0 1.10 2 1
True S_ 1.0 1.20 3 1
True S_ 1.0 1.11 4 2
True B_ 1.0 2.00 5 0
True B_ 1.0 2.10 6 5
True B_ 1.0 2.20 7 5
True B_ 1.0 2.11 8 6
True B_ 1.0 2.11 9 6
英文:
I have a dataframe with a column of nested json objects I wouls like them to be normalised and they must also get a parent ID.
example data:
{"subscription":[{"subscription_boo":true,"subscription_ref_prefix":"S_","product":[{"no_of_products":1,"product_id":1,"product":[{"no_of_products":1,"product_id":1.1,"product":[{"no_of_products":1,"product_id":1.11}]},{"no_of_products":1,"product_id":1.2}]}]},{"subscription_boo":true,"subscription_ref_prefix":"B_","product":[{"no_of_products":1,"product_id":2,"product":[{"no_of_products":1,"product_id":2.1,"product":[{"no_of_products":1,"product_id":2.11,"product":[{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.2}]}]}]}
I now want to add product_seq stating the sequence of the product as well as the parent product sequence while normalizing this JSON.
desired output is:
subscription_boo subscription_ref_prefix no_of_products product_id product_seq parent_product_seq
True S_ 1.0 1.00 1 0
True S_ 1.0 1.10 2 1
True S_ 1.0 1.20 3 1
True S_ 1.0 1.11 4 2
True B_ 1.0 2.00 5 0
True B_ 1.0 2.10 6 5
True B_ 1.0 2.20 7 5
True B_ 1.0 2.11 8 6
True B_ 1.0 2.11 9 6
答案1
得分: 1
你嵌套的产品结构非常复杂。我建议在使用 json_normalize
之前,使用递归函数将其展平:
import pandas as pd
import json
data = """{...your JSON data...}"""
input_rec = json.loads(data)["subscription"]
product_seq = 0
def flatten_products(prod_list):
flattened_prod = []
def flatten_product(prod_list, parent=0):
global product_seq
for prod in prod_list:
product_seq += 1
flattened_prod.append({
"no_of_products": prod["no_of_products"],
"product_id": prod["product_id"],
"product_seq": product_seq,
"parent_product_seq": parent
})
flatten_product(prod.get("product", []), product_seq)
flatten_product(prod_list)
return flattened_prod
for sub in input_rec:
sub["product"] = flatten_products(sub["product"])
df = pd.json_normalize(input_rec, record_path=["product"], meta=["subscription_boo", "subscription_ref_prefix"])
输出:
no_of_products product_id product_seq parent_product_seq subscription_boo subscription_ref_prefix
0 1 1.00 1 0 True S_
1 1 1.10 2 1 True S_
2 1 1.11 3 2 True S_
3 1 1.20 4 1 True S_
4 1 2.00 5 0 True B_
5 1 2.10 6 5 True B_
6 1 2.11 7 6 True B_
7 1 2.11 8 7 True B_
8 1 2.11 9 6 True B_
9 1 2.20 10 5 True B_
英文:
Your nested products are very intricated. I suggest to flatten them with a recursive function before using json_normalize
:
import pandas as pd
import json
data = """{"subscription":[{"subscription_boo":true,"subscription_ref_prefix":"S_","product":[{"no_of_products":1,"product_id":1,"product":[{"no_of_products":1,"product_id":1.1,"product":[{"no_of_products":1,"product_id":1.11}]},{"no_of_products":1,"product_id":1.2}]}]},{"subscription_boo":true,"subscription_ref_prefix":"B_","product":[{"no_of_products":1,"product_id":2,"product":[{"no_of_products":1,"product_id":2.1,"product":[{"no_of_products":1,"product_id":2.11,"product":[{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.11}]},{"no_of_products":1,"product_id":2.2}]}]}]}"""
input_rec = json.loads(data)["subscription"]
product_seq = 0
def flatten_products(prod_list):
flattened_prod = []
def flatten_product(prod_list, parent=0):
global product_seq
for prod in prod_list:
product_seq +=1
flattened_prod.append({
"no_of_products": prod["no_of_products"],
"product_id": prod["product_id"],
"product_seq": product_seq,
"parent_product_seq": parent
})
flatten_product(prod.get("product", []), product_seq)
flatten_product(prod_list)
return flattened_prod
for sub in input_rec:
sub["product"] = flatten_products(sub["product"])
df = pd.json_normalize(input_rec, record_path=["product"], meta=["subscription_boo", "subscription_ref_prefix"])
Output:
no_of_products product_id product_seq parent_product_seq subscription_boo subscription_ref_prefix
0 1 1.00 1 0 True S_
1 1 1.10 2 1 True S_
2 1 1.11 3 2 True S_
3 1 1.20 4 1 True S_
4 1 2.00 5 0 True B_
5 1 2.10 6 5 True B_
6 1 2.11 7 6 True B_
7 1 2.11 8 7 True B_
8 1 2.11 9 6 True B_
9 1 2.20 10 5 True B_
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论