How to flatten nested JSON into one row per list item for a CSV file in Python?


Question

I'm generating a CSV file from JSON.
How can I flatten the JSON so that I get exactly one row for each dict in a nested list?

The rows will be almost identical; only a few columns will vary.

Example: I have

{
  "transportOrder": {
    "customerId": "877299",
    "customerOrder": "155564649",
    "customerReference": "reference2",
    "creationDateTime": "2022-08-26T16:30:56.000Z",
    "orderDetail": {
      "AdditionalInfo": {
        "info1": "abc",
        "info2": "cds",
        "name1": "Jonathan",
        "name2": "Grulich"
      }
    },
    "orderLines": [
      {
        "amount": 7,
        "code": "EUP"
      },
      {
        "amount": 8,
        "code": "ENP"
      },
      {
        "amount": 17,
        "code": "ERP"
      }
    ]
  }
}

And I want a result like this:

customerId  customerOrder  customerReference  creationDateTime  info1  info2  name1     name2    amount  code
877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  7       EUP
877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  8       ENP
877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  17      ERP

My attempt so far:

import flatdict

data = {
  "Order": {
    "customerId":"877299" ,
    "customerOrder": "155564649",
    "customerReference": "reference2",
    "creationDateTime": "2022-08-26T16:30:56.000Z",
    "orderDetail": {
      "AdditionalInfo": {
        "info1": "abc",
        "info2": "cds",
        "name1": "Jonathan",
        "name2": "Grulich",
      }},
    "orderLines": [
      {
        "amount": 7,
        "code": "EUP"
      },
      {
        "amount": 8,
        "code": "ENP"
      },
      {
        "amount": 17,
        "code": "ERP"
      }
    ]
  }
}
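
flat = flatdict.FlatDict(data, delimiter='.')

result_list = []

for line in flat['Order.orderLines']:
    # Copy into a plain dict so that every row is an independent object;
    # `temp = flat` would make all rows alias the same mapping
    temp = dict(flat)
    temp['amount'] = line['amount']
    temp['code'] = line['code']
    result_list.append(temp)

print(result_list)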

But I don't think this is the best solution. What would be a better approach?

Answer 1

Score: 0

I would be very explicit in building such a DataFrame.

import pandas as pd

# Define the shape and columns of the final DataFrame
df = pd.DataFrame({
    "customerId":[],
    "customerOrder":[],
    "customerReference":[],
    "creationDateTime":[],
    "info1":[],
    "info2":[],
    "name1":[],
    "name2":[],
    "amount":[],
    "code":[]
})

# Iterate over the dict and populate the DataFrame, one row per order line
for k, order in data.items():
    for line in order["orderLines"]:
        entry = pd.DataFrame({
            "customerId":[order["customerId"]],
            "customerOrder":[order["customerOrder"]],
            "customerReference":[order["customerReference"]],
            "creationDateTime":[order["creationDateTime"]],
            "info1":[order["orderDetail"]["AdditionalInfo"]["info1"]],
            "info2":[order["orderDetail"]["AdditionalInfo"]["info2"]],
            "name1":[order["orderDetail"]["AdditionalInfo"]["name1"]],
            "name2":[order["orderDetail"]["AdditionalInfo"]["name2"]],
            "amount":[line["amount"]],
            "code":[line["code"]]
        })
        df = pd.concat([df, entry])

print(df)

Output:

  customerId customerOrder customerReference          creationDateTime info1  \
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc

  info2     name1    name2  amount code
0   cds  Jonathan  Grulich     7.0  EUP
0   cds  Jonathan  Grulich     8.0  ENP
0   cds  Jonathan  Grulich    17.0  ERP
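
If pandas is not a requirement, the same explicit idea can probably be written with the standard library alone: collect one plain dict per order line, then hand the list to csv.DictWriter. This is only a minimal sketch. The helper name order_rows and the io.StringIO buffer (a stand-in for a real file) are my own choices for illustration, and it assumes every order has the same keys as the sample from the question.

```python
import csv
import io

# Sample input copied from the question (top-level key "Order").
data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}


def order_rows(order):
    """Return one flat dict per entry in order["orderLines"] (hypothetical helper)."""
    shared = {
        "customerId": order["customerId"],
        "customerOrder": order["customerOrder"],
        "customerReference": order["customerReference"],
        "creationDateTime": order["creationDateTime"],
        # Lift the nested AdditionalInfo keys to the top level.
        **order["orderDetail"]["AdditionalInfo"],
    }
    # Merge the shared columns with the columns of each individual line.
    return [{**shared, **line} for line in order["orderLines"]]


rows = [row for order in data.values() for row in order_rows(order)]

# io.StringIO stands in for open("orders.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Building the list first and writing it once also avoids calling pd.concat inside the loop, which copies the growing DataFrame on every iteration.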

Answer 2

Score: 0

I also found this solution:

import pandas as pd
data = {
  "Order": {
    "customerId":"877299" ,
    "customerOrder": "155564649",
    "customerReference": "reference2",
    "creationDateTime": "2022-08-26T16:30:56.000Z",
    "orderDetail": {
      "AdditionalInfo": {
        "Order": "abc",
        "info2": "cds",
        "name1": "Jonathan",
        "name2": "Grulich",
      }},
    "orderLines": [
      {
        "amount": 7,
        "code": "EUP"
      },
      {
        "amount": 8,
        "code": "ENP"
      },
      {
        "amount": 17,
        "code": "ERP"
      }
    ]
  }
}

df = pd.json_normalize(
    data['Order'],
    record_path=['orderLines'],
    meta=['customerId','customerReference',['orderDetail','AdditionalInfo','info2']]
)


print(df)

Output:

   amount code customerId customerReference orderDetail.AdditionalInfo.info2
0       7  EUP     877299        reference2                              cds
1       8  ENP     877299        reference2                              cds
2      17  ERP     877299        reference2                              cds
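
To get all ten columns from the question, the same json_normalize call can presumably be extended with the remaining meta paths, followed by a rename, a date reformat, and a column reorder. The sketch below is an assumption-laden illustration rather than a definitive answer: it uses the data from the question (with the info1 key), it shortens column names by keeping only the last dotted segment, and it guesses that the wanted date format is day.month.year based on the expected table.

```python
import pandas as pd

# Sample input copied from the question (top-level key "Order").
data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}

info_path = ["orderDetail", "AdditionalInfo"]

# One row per entry in orderLines; the meta values are repeated on every row.
df = pd.json_normalize(
    data["Order"],
    record_path=["orderLines"],
    meta=[
        "customerId",
        "customerOrder",
        "customerReference",
        "creationDateTime",
        info_path + ["info1"],
        info_path + ["info2"],
        info_path + ["name1"],
        info_path + ["name2"],
    ],
)

# Keep only the last segment of dotted names,
# e.g. "orderDetail.AdditionalInfo.info1" becomes "info1".
df.columns = [name.rsplit(".", 1)[-1] for name in df.columns]

# Assumed target format (DD.MM.YYYY), taken from the expected table in the question.
df["creationDateTime"] = pd.to_datetime(df["creationDateTime"]).dt.strftime("%d.%m.%Y")

# Put the columns in the order shown in the question.
df = df[[
    "customerId", "customerOrder", "customerReference", "creationDateTime",
    "info1", "info2", "name1", "name2", "amount", "code",
]]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name such as df.to_csv("orders.csv", index=False) would write the file directly; the file name here is only an example.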

huangapple
  • Published on June 5, 2023, 16:32:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76404699.html