How can I flatten nested JSON into one row per item in a list of dicts, to write a CSV file in Python?
I'm generating a CSV file from JSON.
How can I flatten the JSON so that I get exactly one row per dict in the nested list?
The data in the rows will be almost identical, except for a few columns that vary.
Example: I have
{
    "transportOrder": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich"
            }
        },
        "orderLines": [
            {
                "amount": 7,
                "code": "EUP"
            },
            {
                "amount": 8,
                "code": "ENP"
            },
            {
                "amount": 17,
                "code": "ERP"
            }
        ]
    }
}
And I want the result to look like this:
customerId  customerOrder  customerReference  creationDateTime  info1  info2  name1     name2    amount  code
877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  7       EUP
877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  8       ENP
877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  17      ERP
import flatdict

data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {
                "amount": 7,
                "code": "EUP"
            },
            {
                "amount": 8,
                "code": "ENP"
            },
            {
                "amount": 17,
                "code": "ERP"
            }
        ]
    }
}

flat = flatdict.FlatDict(data, delimiter='.')
result_list = []
for line in flat['Order.orderLines']:
    # Copy the flattened dict so that each row is a separate object;
    # without the copy, every list entry would point to the same dict.
    temp = flat.copy()
    temp['amount'] = line['amount']
    temp['code'] = line['code']
    result_list.append(temp)
print(result_list)
But I don't think this is the best solution. What would be a better approach?
Answer 1
Score: 0
I would be very explicit in building such a DataFrame.
import pandas as pd

# Define the shape and columns of the final DataFrame
df = pd.DataFrame({
    "customerId": [],
    "customerOrder": [],
    "customerReference": [],
    "creationDateTime": [],
    "info1": [],
    "info2": [],
    "name1": [],
    "name2": [],
    "amount": [],
    "code": []
})

# Iterate over the dict (`data` is the dict from the question)
# and populate the DataFrame, one row per order line
for k, order in data.items():
    for line in order["orderLines"]:
        entry = pd.DataFrame({
            "customerId": [order["customerId"]],
            "customerOrder": [order["customerOrder"]],
            "customerReference": [order["customerReference"]],
            "creationDateTime": [order["creationDateTime"]],
            "info1": [order["orderDetail"]["AdditionalInfo"]["info1"]],
            "info2": [order["orderDetail"]["AdditionalInfo"]["info2"]],
            "name1": [order["orderDetail"]["AdditionalInfo"]["name1"]],
            "name2": [order["orderDetail"]["AdditionalInfo"]["name2"]],
            "amount": [line["amount"]],
            "code": [line["code"]]
        })
        df = pd.concat([df, entry])

print(df)
Output:
  customerId customerOrder customerReference          creationDateTime info1  \
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc

  info2     name1    name2  amount code
0   cds  Jonathan  Grulich     7.0  EUP
0   cds  Jonathan  Grulich     8.0  ENP
0   cds  Jonathan  Grulich    17.0  ERP
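A possible variation on the same explicit idea, as a minimal sketch rather than a definitive implementation: collect one plain dict per order line and build the DataFrame once at the end, instead of concatenating inside the loop. The names `rows` and `common` are only illustrative, and the sample `data` is copied from the question so the snippet runs on its own. It assumes every order has the same keys as the sample.

```python
import pandas as pd

# Sample input, copied from the question.
data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}

rows = []  # illustrative name: one dict per output row
for order in data.values():
    # Fields shared by every line of this order.
    common = {
        "customerId": order["customerId"],
        "customerOrder": order["customerOrder"],
        "customerReference": order["customerReference"],
        "creationDateTime": order["creationDateTime"],
        **order["orderDetail"]["AdditionalInfo"],
    }
    for line in order["orderLines"]:
        # Shared fields plus the line-specific amount and code.
        rows.append({**common, **line})

# Build the DataFrame in a single step from the collected rows.
df = pd.DataFrame(rows)
print(df)
```

Built this way, `amount` keeps its integer type and the index runs from 0 to 2, because no empty placeholder frame is involved.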
Answer 2
Score: 0
I also found this solution:
import pandas as pd

data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {
                "amount": 7,
                "code": "EUP"
            },
            {
                "amount": 8,
                "code": "ENP"
            },
            {
                "amount": 17,
                "code": "ERP"
            }
        ]
    }
}

# One row per entry in orderLines; the meta fields are repeated on every row
df = pd.json_normalize(
    data['Order'],
    record_path=['orderLines'],
    meta=['customerId', 'customerReference', ['orderDetail', 'AdditionalInfo', 'info2']]
)
print(df)
Output:
   amount code customerId customerReference orderDetail.AdditionalInfo.info2
0       7  EUP     877299        reference2                              cds
1       8  ENP     877299        reference2                              cds
2      17  ERP     877299        reference2                              cds
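To get closer to the table asked for in the question, the same `pd.json_normalize` call could be extended to all the shared fields. What follows is only a sketch under a few assumptions: the column order and the `DD.MM.YYYY` date format are read off the desired table, `info_keys` and `csv_text` are illustrative names, and the sample `data` is copied from the question.

```python
import pandas as pd

# Sample input, copied from the question.
data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}

info_keys = ["info1", "info2", "name1", "name2"]  # illustrative name

df = pd.json_normalize(
    data["Order"],
    record_path=["orderLines"],
    meta=[
        "customerId",
        "customerOrder",
        "customerReference",
        "creationDateTime",
        # Nested meta fields are given as key paths.
        *[["orderDetail", "AdditionalInfo", key] for key in info_keys],
    ],
)

# Drop the "orderDetail.AdditionalInfo." prefix from the nested columns.
df.columns = [col.split(".")[-1] for col in df.columns]

# Reformat the ISO timestamp as DD.MM.YYYY (format assumed from the question).
df["creationDateTime"] = pd.to_datetime(df["creationDateTime"]).dt.strftime("%d.%m.%Y")

# Put the columns in the order shown in the desired table.
df = df[
    ["customerId", "customerOrder", "customerReference", "creationDateTime"]
    + info_keys
    + ["amount", "code"]
]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead would write the file directly.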