How to flatten nested JSON into one CSV row per item of a nested list in Python?

Question

I'm generating a CSV file from JSON.
How can I flatten the JSON so that I get exactly one row per item in a nested list of dicts?

The data in the rows will be almost identical, except for a few columns whose values vary.

Example: I have

    {
      "transportOrder": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
          "AdditionalInfo": {
            "info1": "abc",
            "info2": "cds",
            "name1": "Jonathan",
            "name2": "Grulich"
          }
        },
        "orderLines": [
          {
            "amount": 7,
            "code": "EUP"
          },
          {
            "amount": 8,
            "code": "ENP"
          },
          {
            "amount": 17,
            "code": "ERP"
          }
        ]
      }
    }

And I want a result like this:

    customerId  customerOrder  customerReference  creationDateTime  info1  info2  name1     name2    amount  code
    877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  7       EUP
    877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  8       ENP
    877299      155564649      reference2         26.08.2022        abc    cds    Jonathan  Grulich  17      ERP

What I have tried so far:

    import flatdict

    data = {
        "Order": {
            "customerId": "877299",
            "customerOrder": "155564649",
            "customerReference": "reference2",
            "creationDateTime": "2022-08-26T16:30:56.000Z",
            "orderDetail": {
                "AdditionalInfo": {
                    "info1": "abc",
                    "info2": "cds",
                    "name1": "Jonathan",
                    "name2": "Grulich",
                }
            },
            "orderLines": [
                {"amount": 7, "code": "EUP"},
                {"amount": 8, "code": "ENP"},
                {"amount": 17, "code": "ERP"},
            ],
        }
    }

    # Flatten the nested dicts into keys such as "Order.customerId"
    flat = flatdict.FlatDict(data, delimiter='.')

    result_list = []
    for line in flat['Order.orderLines']:
        # Copy per row; `temp = flat` would append the same object three times
        temp = flat.copy()
        temp['amount'] = line['amount']
        temp['code'] = line['code']
        result_list.append(temp)

    print(result_list)

But I don't think this is the best solution. What would be a better approach?

Answer 1

Score: 0

I would be very explicit in building such a DataFrame.

    import pandas as pd

    # Define the shape and columns of the final DataFrame
    df = pd.DataFrame({
        "customerId": [],
        "customerOrder": [],
        "customerReference": [],
        "creationDateTime": [],
        "info1": [],
        "info2": [],
        "name1": [],
        "name2": [],
        "amount": [],
        "code": [],
    })

    # Iterate over the dict (`data` is the dict from the question)
    # and append one single-row DataFrame per order line
    for k, order in data.items():
        for line in order["orderLines"]:
            entry = pd.DataFrame({
                "customerId": [order["customerId"]],
                "customerOrder": [order["customerOrder"]],
                "customerReference": [order["customerReference"]],
                "creationDateTime": [order["creationDateTime"]],
                "info1": [order["orderDetail"]["AdditionalInfo"]["info1"]],
                "info2": [order["orderDetail"]["AdditionalInfo"]["info2"]],
                "name1": [order["orderDetail"]["AdditionalInfo"]["name1"]],
                "name2": [order["orderDetail"]["AdditionalInfo"]["name2"]],
                "amount": [line["amount"]],
                "code": [line["code"]],
            })
            df = pd.concat([df, entry])

    print(df)

Output:

      customerId customerOrder customerReference          creationDateTime info1  \
    0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
    0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc
    0     877299     155564649        reference2  2022-08-26T16:30:56.000Z   abc

      info2     name1    name2  amount code
    0   cds  Jonathan  Grulich     7.0  EUP
    0   cds  Jonathan  Grulich     8.0  ENP
    0   cds  Jonathan  Grulich    17.0  ERP
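
The same explicit idea can probably be expressed without growing a DataFrame row by row. The following is only a sketch using the standard library: it assumes the `"Order"`-wrapped structure from the question, and the helper name `order_to_rows` is made up for illustration. It merges the shared fields with each order line and writes the rows with `csv.DictWriter`.

```python
import csv
import io

data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}


def order_to_rows(order):
    """Return one flat dict per entry in order["orderLines"] (hypothetical helper)."""
    # Fields shared by every row: the top-level scalars ...
    shared = {
        key: value
        for key, value in order.items()
        if key not in ("orderDetail", "orderLines")
    }
    # ... plus everything under orderDetail.AdditionalInfo.
    shared.update(order["orderDetail"]["AdditionalInfo"])
    # Merge the shared fields with each order line.
    return [{**shared, **line} for line in order["orderLines"]]


rows = order_to_rows(data["Order"])

# An in-memory buffer keeps the sketch self-contained; for a real file,
# open it with newline="" as the csv module documentation recommends.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because every row is a plain dict, the same list could also be passed to `pd.DataFrame(rows)` in a single call if a DataFrame is still wanted.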

Answer 2

Score: 0

I also found this solution:

    import pandas as pd

    data = {
        "Order": {
            "customerId": "877299",
            "customerOrder": "155564649",
            "customerReference": "reference2",
            "creationDateTime": "2022-08-26T16:30:56.000Z",
            "orderDetail": {
                "AdditionalInfo": {
                    "info1": "abc",
                    "info2": "cds",
                    "name1": "Jonathan",
                    "name2": "Grulich",
                }
            },
            "orderLines": [
                {"amount": 7, "code": "EUP"},
                {"amount": 8, "code": "ENP"},
                {"amount": 17, "code": "ERP"},
            ],
        }
    }

    # One row per entry in orderLines; the meta fields are repeated on every row
    df = pd.json_normalize(
        data['Order'],
        record_path=['orderLines'],
        meta=['customerId', 'customerReference', ['orderDetail', 'AdditionalInfo', 'info2']]
    )

    print(df)

Output:

       amount code customerId customerReference orderDetail.AdditionalInfo.info2
    0       7  EUP     877299        reference2                              cds
    1       8  ENP     877299        reference2                              cds
    2      17  ERP     877299        reference2                              cds
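
To get all ten columns from the question, the `meta` list can presumably be extended to every shared field. The sketch below is one possible way, not a definitive one: the short column names, the `DD.MM.YYYY` date format, and the column order are taken from the desired table in the question, and the variable names `meta`, `columns`, and `csv_text` are only illustrative.

```python
import pandas as pd

data = {
    "Order": {
        "customerId": "877299",
        "customerOrder": "155564649",
        "customerReference": "reference2",
        "creationDateTime": "2022-08-26T16:30:56.000Z",
        "orderDetail": {
            "AdditionalInfo": {
                "info1": "abc",
                "info2": "cds",
                "name1": "Jonathan",
                "name2": "Grulich",
            }
        },
        "orderLines": [
            {"amount": 7, "code": "EUP"},
            {"amount": 8, "code": "ENP"},
            {"amount": 17, "code": "ERP"},
        ],
    }
}

# Every field that should be repeated on each order-line row.
meta = [
    "customerId",
    "customerOrder",
    "customerReference",
    "creationDateTime",
    ["orderDetail", "AdditionalInfo", "info1"],
    ["orderDetail", "AdditionalInfo", "info2"],
    ["orderDetail", "AdditionalInfo", "name1"],
    ["orderDetail", "AdditionalInfo", "name2"],
]

df = pd.json_normalize(data["Order"], record_path="orderLines", meta=meta)

# Nested meta fields come back as "orderDetail.AdditionalInfo.info1" etc.;
# keep only the last path component as the column name.
df.columns = [col.split(".")[-1] for col in df.columns]

# Reformat the ISO timestamp as DD.MM.YYYY, as in the desired table.
df["creationDateTime"] = pd.to_datetime(df["creationDateTime"]).dt.strftime("%d.%m.%Y")

# Put the columns in the order shown in the question.
columns = [
    "customerId", "customerOrder", "customerReference", "creationDateTime",
    "info1", "info2", "name1", "name2", "amount", "code",
]
df = df[columns]

# Returns the CSV as a string; pass a path instead to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Taking the last path component as the column name only works while those names are unique; if two nested fields shared a name, an explicit `df.rename(columns=...)` mapping would be the safer choice.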

huangapple
  • Posted on June 5, 2023 at 16:32:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76404699.html