在Python中规范化嵌套的JSON并将其转换为Pandas数据框。

huangapple go评论100阅读模式
英文:

Normalizing a Nested JSON in Python and Converting it to a Pandas Dataframe

问题

我已经创建了一个更简化的一些JSON数据版本,我一直在处理这里:

  1. [
  2. {
  3. "id": 1,
  4. "city": "Philadelphia",
  5. "Retaillocations": { "subLocation": [
  6. {
  7. "address": "1235 Passyunk Ave",
  8. "district": "South"
  9. },
  10. {
  11. "address": "900 Market St",
  12. "district": "Center City"
  13. },
  14. {
  15. "address": "2300 Roosevelt Blvd",
  16. "district": "North"
  17. }
  18. ]
  19. },
  20. "distributionLocations": {"subLocation": [{
  21. "address": "3000 Broad St",
  22. "district": "North"
  23. },
  24. {
  25. "address": "3000 Essington Blvd",
  26. "district": "Cargo City"
  27. },
  28. {
  29. "address": "4300 City Ave",
  30. "district": "West"
  31. }
  32. ]
  33. }
  34. }
  35. ]

我的目标是将其规范化为一个数据帧(是的,上面的JSON只会创建一行,但我希望掌握步骤,然后将其概括为一个更大的集合)。

首先,我使用jsob_obj = json.loads(inputData)加载文件,将其转换为字典。问题是,其中一些字典可能包含列表,并且嵌套得很奇怪,如上所示。我尝试使用pd.json_normalize(json_obj, record_path='retailLocations'),但会出现类型错误,提示列表索引必须是整数或切片,而不是字符串。如何处理上面的JSON文件并将其转换为pandas数据帧中的单个记录?

英文:

I have created a simpler version of some JSON data I've been working with here:

  1. [
  2. {
  3. "id": 1,
  4. "city": "Philadelphia",
  5. "Retaillocations": { "subLocation": [
  6. {
  7. "address": "1235 Passyunk Ave",
  8. "district": "South"
  9. },
  10. {
  11. "address": "900 Market St",
  12. "district": "Center City"
  13. },
  14. {
  15. "address": "2300 Roosevelt Blvd",
  16. "district": "North"
  17. }
  18. ]
  19. },
  20. "distributionLocations": {"subLocation": [{
  21. "address": "3000 Broad St",
  22. "district": "North"
  23. },
  24. {
  25. "address": "3000 Essington Blvd",
  26. "district": "Cargo City"
  27. },
  28. {
  29. "address": "4300 City Ave",
  30. "district": "West"
  31. }
  32. ]
  33. }
  34. }
  35. ]

My goal is to normalize this into a data frame (yes, the above json will only create one row, but I am hoping to get the steps down and then generalize it to a larger set).

First, I loaded the file with jsob_obj = json.loads(inputData) which turns this into a dictionary. The problem is that some of the dictionaries can have lists and are nested oddly as shown above. I've tried using pd.json_normalize(json_obj, record_path = 'retailLocations'), I get a type error saying that list indices must be integers or slices, not str. How can I handle the above JSON file and convert it into a single record in a pandas data frame?

答案1

得分: 1

使用.json_normalize()函数来展平数据,猜测所需的输出:

  1. retail = pd.json_normalize(
  2. data=jsob_obj,
  3. meta=["id", "city"],
  4. record_path=["Retaillocations", "subLocation"]
  5. ).assign(source="retail")
  6. distribution = pd.json_normalize(
  7. data=jsob_obj,
  8. meta=["id", "city"],
  9. record_path=["distributionLocations", "subLocation"]
  10. ).assign(source="distribution")
  11. final = pd.concat([retail, distribution]).reset_index(drop=True)
  12. print(final)

输出:

  1. address district id city source
  2. 0 1235 Passyunk Ave South 1 Philadelphia retail
  3. 1 900 Market St Center City 1 Philadelphia retail
  4. 2 2300 Roosevelt Blvd North 1 Philadelphia retail
  5. 3 3000 Broad St North 1 Philadelphia distribution
  6. 4 3000 Essington Blvd Cargo City 1 Philadelphia distribution
  7. 5 4300 City Ave West 1 Philadelphia distribution
英文:

Guessing on the desired output, using .json_normalize() to flatten:

  1. retail = pd.json_normalize(
  2. data=jsob_obj,
  3. meta=["id", "city"],
  4. record_path=["Retaillocations", "subLocation"]
  5. ).assign(source="retail")
  6. distribution = pd.json_normalize(
  7. data=jsob_obj,
  8. meta=["id", "city"],
  9. record_path=["distributionLocations", "subLocation"]
  10. ).assign(source="distribution")
  11. final = pd.concat([retail, distribution]).reset_index(drop=True)
  12. print(final)

Output:

  1. address district id city source
  2. 0 1235 Passyunk Ave South 1 Philadelphia retail
  3. 1 900 Market St Center City 1 Philadelphia retail
  4. 2 2300 Roosevelt Blvd North 1 Philadelphia retail
  5. 3 3000 Broad St North 1 Philadelphia distribution
  6. 4 3000 Essington Blvd Cargo City 1 Philadelphia distribution
  7. 5 4300 City Ave West 1 Philadelphia distribution

huangapple
  • 本文由 发表于 2023年2月23日 22:46:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/75546385.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定