Python: flatten a dictionary column

Question

This should be a simple one-liner using pd.json_normalize, but it only works on a single cell and does not process my whole column.

Original dataframe:

    df['addresses'][0]
    [{'addressLine1': '124 Main Street',
      'addressLine2': '',
      'addressLine3': '',
      'city': 'Portland',
      'region': 'ME',
      'postalCode': '04019',
      'country': 'USA'}]

    test = pd.json_normalize(result['addresses'][0])
    test

Everything up to this point works, but when I apply the function to the whole column, the resulting dataframe looks like this:

    test = pd.json_normalize(result['addresses'])
    test

Here is some of the column data:

    [[{'addressLine1': '124 Main Street',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'Portland',
       'region': 'ME',
       'postalCode': '04019',
       'country': 'USA'}],
     [{'addressLine1': '1234 Main Street',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'Chattanooga',
       'region': 'TN',
       'postalCode': '37402',
       'country': 'USA'}],
     [{'addressLine1': '1684151 Chair Street',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'Notaplace',
       'region': 'AL',
       'postalCode': '48835',
       'country': 'USA'}],
     [{'addressLine1': '136 Main Street',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'Portland',
       'region': 'ME',
       'postalCode': '22118',
       'country': 'USA'}],
     [{'addressLine1': '123452 HoneyDo LN',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'Portland',
       'region': 'ME',
       'postalCode': '04019',
       'country': 'USA'}],
     [{'addressLine1': '123 Main Street',
       'addressLine2': 'Apt 2B',
       'addressLine3': 'Building B',
       'city': 'Portland',
       'region': 'ME',
       'postalCode': '04019',
       'country': 'USA'}],
     [{'addressLine1': '123 Main Street',
       'addressLine2': 'Apt 2B',
       'addressLine3': 'Building B',
       'city': 'New York City',
       'region': 'NY',
       'postalCode': '10001',
       'country': 'USA'}],
     [{'addressLine1': '123 Main Street',
       'addressLine2': 'Apt 2B',
       'addressLine3': 'Building B',
       'city': 'Portland',
       'region': 'ME',
       'postalCode': '04019',
       'country': 'USA'}],
     [{'addressLine1': '4578 Shiver Me Timbers Road',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'Portland',
       'region': 'ME',
       'postalCode': '04019',
       'country': 'USA'}],
     [{'addressLine1': '124 Main ST',
       'addressLine2': '',
       'addressLine3': '',
       'city': 'PORTLAND',
       'region': 'ME',
       'postalCode': '04019',
       'country': 'USA'}]]

Answer 1

Score: 1

If I understand you correctly, you can expand the dicts in the addresses column of your dataframe df into separate columns like this:

    df = pd.concat([df, df.pop('addresses').str[0].apply(pd.Series)], axis=1)
    print(df)

Prints:

                      addressLine1  addressLine2  addressLine3           city  region  postalCode  country
    0              124 Main Street                                   Portland      ME       04019      USA
    1             1234 Main Street                                Chattanooga      TN       37402      USA
    2         1684151 Chair Street                                  Notaplace      AL       48835      USA
    3              136 Main Street                                   Portland      ME       22118      USA
    4            123452 HoneyDo LN                                   Portland      ME       04019      USA
    5              123 Main Street        Apt 2B    Building B       Portland      ME       04019      USA
    6              123 Main Street        Apt 2B    Building B  New York City      NY       10001      USA
    7              123 Main Street        Apt 2B    Building B       Portland      ME       04019      USA
    8  4578 Shiver Me Timbers Road                                   Portland      ME       04019      USA
    9                  124 Main ST                                   PORTLAND      ME       04019      USA
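
For readers who want to run the idea end to end, here is a minimal self-contained sketch. The id column and the names first_address, expanded, and flat are assumptions made for illustration; the two addresses are taken from the question. It builds the new columns with the pd.DataFrame constructor instead of apply(pd.Series), which is a swapped-in variation and not the answer's exact code, though it should produce the same columns for this data.

```python
import pandas as pd

# Hypothetical sample frame: the "id" column is an assumption; the two
# addresses are copied from the data posted in the question.
df = pd.DataFrame({
    "id": [1, 2],
    "addresses": [
        [{"addressLine1": "124 Main Street", "addressLine2": "", "addressLine3": "",
          "city": "Portland", "region": "ME", "postalCode": "04019", "country": "USA"}],
        [{"addressLine1": "1234 Main Street", "addressLine2": "", "addressLine3": "",
          "city": "Chattanooga", "region": "TN", "postalCode": "37402", "country": "USA"}],
    ],
})

# pop() removes the column from df and returns it; .str[0] then takes the
# first item of each one-element list, leaving a Series of dicts.
first_address = df.pop("addresses").str[0]

# Build one column per dict key, reusing the original index so that
# pd.concat lines the rows up correctly.
expanded = pd.DataFrame(first_address.tolist(), index=first_address.index)

flat = pd.concat([df, expanded], axis=1)
print(flat)
```

On large frames the constructor route may be faster than apply(pd.Series), since it avoids creating one Series per row, but that is worth measuring on your own data.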

Answer 2

Score: 1

It seems your list contains one-element lists.

Let's say your list is called address_list. Take the first element of each inner list, then pass the result to json_normalize:

    pd.json_normalize([e[0] for e in address_list])
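
As a rough, runnable illustration of the list comprehension above: the sample address_list below is hypothetical (shortened dicts, plus one empty inner list that is not in the question's data), and the fallback to an empty dict is an added assumption to show one way of tolerating rows with no address.

```python
import pandas as pd

# Hypothetical input: a plain Python list of one-element lists of dicts,
# plus one empty inner list to show the guard.
address_list = [
    [{"addressLine1": "124 Main Street", "city": "Portland", "region": "ME"}],
    [{"addressLine1": "1234 Main Street", "city": "Chattanooga", "region": "TN"}],
    [],
]

# Take the first dict of each inner list; fall back to an empty dict so an
# empty inner list becomes a row of missing values instead of an IndexError.
records = [e[0] if e else {} for e in address_list]

flat = pd.json_normalize(records)
print(flat)
```

If every inner list is guaranteed to hold exactly one dict, the plain e[0] form from the answer is enough.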

If the test data you posted is actually a dataframe column, just use:

    pd.json_normalize(result["addresses"].str[0])
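
Note that .str[0] keeps only the first dict of each list. If some rows could hold more than one address, one possible approach is to explode the column first, as in the sketch below. The customer column, the shortened dicts, and the two-address row are all hypothetical, and the sketch assumes every list is non-empty.

```python
import pandas as pd

# Hypothetical frame: customer "B" has two addresses in its list.
result = pd.DataFrame({
    "customer": ["A", "B"],
    "addresses": [
        [{"city": "Portland", "region": "ME"}],
        [{"city": "Chattanooga", "region": "TN"},
         {"city": "New York City", "region": "NY"}],
    ],
})

# explode() turns each list item into its own row and repeats the other
# columns; ignore_index=True renumbers the rows 0..n-1.
exploded = result.explode("addresses", ignore_index=True)

# Each cell is now a single dict, so json_normalize can expand it.
normalized = pd.json_normalize(exploded["addresses"].tolist())

flat = pd.concat([exploded.drop(columns="addresses"), normalized], axis=1)
print(flat)
```

This yields one output row per address rather than one per original row.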

Or, if your result dataframe has other columns in addition to addresses:

    pd.concat(
        [result.drop(columns="addresses"), pd.json_normalize(result["addresses"].str[0])],
        axis=1,
    )
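
One thing to watch with the concat approach: pd.concat with axis=1 aligns rows by index label. When json_normalize is given a plain list it returns a fresh 0..n-1 index, so a result frame whose index is not 0..n-1 (for example after filtering) would not line up. The sketch below shows one way to handle that; the name column, the index labels 10 and 20, and the shortened dicts are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with a non-default index, e.g. left over from filtering.
result = pd.DataFrame(
    {
        "name": ["first", "second"],
        "addresses": [
            [{"city": "Portland", "region": "ME"}],
            [{"city": "Chattanooga", "region": "TN"}],
        ],
    },
    index=[10, 20],
)

# Normalize the first dict of every list; passing a plain list gives the
# new frame a 0..n-1 index.
normalized = pd.json_normalize(result["addresses"].str[0].tolist())

# Reuse the original index labels so concat pairs each row with its address.
normalized.index = result.index

flat = pd.concat([result.drop(columns="addresses"), normalized], axis=1)
print(flat)
```

Calling result.reset_index(drop=True) before the concat would be an alternative if the original index labels do not need to be kept.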

huangapple
  • Published on March 1, 2023 at 08:42:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75598632.html