June 5, 2023 16:32:24 · go

How do I flatten JSON into one row per item of a nested list of dicts, to write a CSV file in Python?

Question

I'm generating a CSV file from JSON. How can I flatten the JSON so that it has exactly one row per entry in a nested list of dicts?

The data in the rows will be almost identical; only a few columns vary from row to row.

Example: I have

{
  "transportOrder": {
    "customerId": "877299",
    "customerOrder": "155564649",
    "customerReference": "reference2",
    "creationDateTime": "2022-08-26T16:30:56.000Z",
    "orderDetail": {
      "AdditionalInfo": {
        "info1": "abc",
        "info2": "cds",
        "name1": "Jonathan",
        "name2": "Grulich"
      }
    },
    "orderLines": [
      {
        "amount": 7,
        "code": "EUP"
      },
      {
        "amount": 8,
        "code": "ENP"
      },
      {
        "amount": 17,
        "code": "ERP"
      }
    ]
  }
}

And I want the following result:

customerId    customerOrder    customerReference    creationDateTime    info1    info2    name1    name2    amount    code
877299    155564649    reference2    26.08.2022    abc    cds    Jonathan    Grulich    7    EUP
877299    155564649    reference2    26.08.2022    abc    cds    Jonathan    Grulich    8    ENP
877299    155564649    reference2    26.08.2022    abc    cds    Jonathan    Grulich    17    ERP

import flatdict
data = {
  "Order": {
    "customerId": "877299",
    "customerOrder": "155564649",
    "customerReference": "reference2",
    "creationDateTime": "2022-08-26T16:30:56.000Z",
    "orderDetail": {
      "AdditionalInfo": {
        "info1": "abc",
        "info2": "cds",
        "name1": "Jonathan",
        "name2": "Grulich"
      }
    },
    "orderLines": [
      {
        "amount": 7,
        "code": "EUP"
      },
      {
        "amount": 8,
        "code": "ENP"
      },
      {
        "amount": 17,
        "code": "ERP"
      }
    ]
  }
}
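# FlatDict joins nested dict keys with '.', e.g. 'Order.orderDetail.AdditionalInfo.info1';
# lists such as orderLines are kept as plain list values
flat = flatdict.FlatDict(data, delimiter='.')
result_list = []
for line in flat['Order.orderLines']:
    # Copy the shared order fields, then add this line's amount and code
    temp = flat.copy()
    temp['amount'] = line['amount']
    temp['code'] = line['code']
    result_list.append(temp)
print(result_list)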

But I don't think this is the best solution. What would be a better way?
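The idea in the question's code (copy the shared order fields, then overlay each order line) can also be written with plain dictionaries and the standard-library `csv` module, without a third-party flattening package. This is a minimal sketch, assuming the JSON has already been parsed (for example with `json.load`) and that only `orderDetail.AdditionalInfo` needs to be lifted to the top level; the helper name `order_to_rows` is hypothetical.

```python
import csv
import io

# Sample input in the question's shape (assumed already parsed from JSON)
data = {
    "transportOrder": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}


def order_to_rows(order):
    """Hypothetical helper: one flat dict per order line, order-level fields repeated."""
    shared = {
        "customerId": order["customerId"],
        "customerOrder": order["customerOrder"],
        "customerReference": order["customerReference"],
        "creationDateTime": order["creationDateTime"],
        # Lift the nested AdditionalInfo keys (info1, info2, name1, name2) to the top level
        **order["orderDetail"]["AdditionalInfo"],
    }
    # {**shared, **line} creates a new dict per row, so rows never share state
    return [{**shared, **line} for line in order["orderLines"]]


rows = order_to_rows(data["transportOrder"])

# Write to an in-memory buffer; pass an open file object instead to write a real CSV file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because each row is built with dict unpacking, the per-line keys (`amount`, `code`) are added to a fresh dict every time, which avoids accidentally mutating one shared object.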

Answer 1

Score: 0

I would be very explicit in building such a DataFrame.

import pandas as pd
# Define the shape and columns of the final DataFrame
df = pd.DataFrame({
    "customerId":[],
    "customerOrder":[],
    "customerReference":[],
    "creationDateTime":[],
    "info1":[],
    "info2":[],
    "name1":[],
    "name2":[],
    "amount":[],
    "code":[]
})
# Iterate over the dict (`data` from the question, top-level key "Order") and populate the DataFrame
for k, order in data.items():
    for line in order["orderLines"]:
        entry = pd.DataFrame({
            "customerId":[order["customerId"]],
            "customerOrder":[order["customerOrder"]],
            "customerReference":[order["customerReference"]],
            "creationDateTime":[order["creationDateTime"]],
            "info1":[order["orderDetail"]["AdditionalInfo"]["info1"]],
            "info2":[order["orderDetail"]["AdditionalInfo"]["info2"]],
            "name1":[order["orderDetail"]["AdditionalInfo"]["name1"]],
            "name2":[order["orderDetail"]["AdditionalInfo"]["name2"]],
            "amount":[line["amount"]],
            "code":[line["code"]]
        })
        df = pd.concat([df, entry])
print(df)

Output:

  customerId customerOrder customerReference          creationDateTime info1  \
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
  info2     name1    name2  amount code
0   cds  Jonathan  Grulich     7.0  EUP
0   cds  Jonathan  Grulich     8.0  ENP
0   cds  Jonathan  Grulich    17.0  ERP
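A variant of the same explicit approach is to collect one plain dict per order line and build the DataFrame once at the end. This avoids calling `pd.concat` inside the loop (each call copies the accumulated frame), gives a clean 0, 1, 2 index, and, since no empty columns are pre-declared, `amount` is not upcast and should stay an integer column. A sketch, assuming the same `data` shape with the top-level key `"Order"`:

```python
import pandas as pd

# Sample input in the shape used above (top-level key "Order")
data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}

rows = []
for order in data.values():
    info = order["orderDetail"]["AdditionalInfo"]
    for line in order["orderLines"]:
        # One plain dict per order line; the order-level fields repeat on every row
        rows.append({
            "customerId": order["customerId"],
            "customerOrder": order["customerOrder"],
            "customerReference": order["customerReference"],
            "creationDateTime": order["creationDateTime"],
            "info1": info["info1"],
            "info2": info["info2"],
            "name1": info["name1"],
            "name2": info["name2"],
            "amount": line["amount"],
            "code": line["code"],
        })

# Build the DataFrame in a single call instead of concatenating inside the loop
df = pd.DataFrame(rows)
print(df)
```

From there, `df.to_csv("orders.csv", index=False)` would write the CSV; the file name is only an example.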

Answer 2

Score: 0

I also found this solution:

import pandas as pd
data = {
  "Order": {
    "customerId": "877299",
    "customerOrder": "155564649",
    "customerReference": "reference2",
    "creationDateTime": "2022-08-26T16:30:56.000Z",
    "orderDetail": {
      "AdditionalInfo": {
        "info1": "abc",
        "info2": "cds",
        "name1": "Jonathan",
        "name2": "Grulich",
      }},
    "orderLines": [
      {
        "amount": 7,
        "code": "EUP"
      },
      {
        "amount": 8,
        "code": "ENP"
      },
      {
        "amount": 17,
        "code": "ERP"
      }
    ]
  }
}
df = pd.json_normalize(
    data['Order'],
    record_path=['orderLines'],
    meta=['customerId','customerReference',['orderDetail','AdditionalInfo','info2']]
)
print(df)

Output:

   amount code customerId customerReference orderDetail.AdditionalInfo.info2
0       7  EUP     877299        reference2                              cds
1       8  ENP     877299        reference2                              cds
2      17  ERP     877299        reference2                              cds
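The `json_normalize` call extends to every column in the desired table by listing all order-level fields in `meta`. The sketch below also shortens the dotted column names, converts the timestamp to the DD.MM.YYYY format shown in the question, and writes CSV. It assumes the question's keys (`info1` through `name2` under `orderDetail.AdditionalInfo`) and that no two leaf names collide once the `orderDetail.AdditionalInfo.` prefix is dropped.

```python
import pandas as pd

# Sample input with the question's keys (top-level key "Order")
data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}

info = ["orderDetail", "AdditionalInfo"]
df = pd.json_normalize(
    data["Order"],
    record_path=["orderLines"],  # one output row per order line
    meta=[  # order-level fields repeated on every row
        "customerId",
        "customerOrder",
        "customerReference",
        "creationDateTime",
        info + ["info1"],
        info + ["info2"],
        info + ["name1"],
        info + ["name2"],
    ],
)

# Drop the "orderDetail.AdditionalInfo." prefix so headers match the desired table
df.columns = [c.split(".")[-1] for c in df.columns]

# ISO 8601 timestamp -> DD.MM.YYYY, as in the desired output
df["creationDateTime"] = pd.to_datetime(df["creationDateTime"]).dt.strftime("%d.%m.%Y")

# Order-level fields first, then the per-line fields
df = df[["customerId", "customerOrder", "customerReference", "creationDateTime",
         "info1", "info2", "name1", "name2", "amount", "code"]]

print(df.to_csv(index=False))
```

Listing the meta fields explicitly keeps the column set under your control; if the JSON gains new order-level keys, they will not silently appear in the CSV.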
