Flatten a dictionary column in Python

Question

This should be a single line of code using pd.json_normalize, but it only works on a single value from the column; it does not process the whole column.

Original dataframe:

[image: original dataframe]

df['addresses'][0]

[{'addressLine1': '124 Main Street',
  'addressLine2': '',
  'addressLine3': '',
  'city': 'Portland',
  'region': 'ME',
  'postalCode': '04019',
  'country': 'USA'}]

test = pd.json_normalize(result['addresses'][0])
test

[image: output of test for a single row]

Everything up to this point works, but when I apply the function to the whole column, the resulting dataframe looks like this:

test = pd.json_normalize(result['addresses'])
test

[image: output of test for the whole column]

Here is a sample of the column data:

[[{'addressLine1': '124 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '1234 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Chattanooga',
   'region': 'TN',
   'postalCode': '37402',
   'country': 'USA'}],
 [{'addressLine1': '1684151 Chair Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Notaplace',
   'region': 'AL',
   'postalCode': '48835',
   'country': 'USA'}],
 [{'addressLine1': '136 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '22118',
   'country': 'USA'}],
 [{'addressLine1': '123452 HoneyDo LN',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'New York City',
   'region': 'NY',
   'postalCode': '10001',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '4578 Shiver Me Timbers Road',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '124 Main ST',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'PORTLAND',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}]]

Answer 1

Score: 1

If I understand you correctly, you can expand the dicts in your dataframe df into separate columns like this:

df = pd.concat([df, df.pop('addresses').str[0].apply(pd.Series)], axis=1)
print(df)

Prints:

                  addressLine1 addressLine2 addressLine3           city region postalCode country
0              124 Main Street                                 Portland     ME      04019     USA
1             1234 Main Street                              Chattanooga     TN      37402     USA
2         1684151 Chair Street                                Notaplace     AL      48835     USA
3              136 Main Street                                 Portland     ME      22118     USA
4            123452 HoneyDo LN                                 Portland     ME      04019     USA
5              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
6              123 Main Street       Apt 2B   Building B  New York City     NY      10001     USA
7              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
8  4578 Shiver Me Timbers Road                                 Portland     ME      04019     USA
9                  124 Main ST                                 PORTLAND     ME      04019     USA
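
For readers who want to try this approach without the original data, here is a minimal, self-contained sketch. The two-row frame and the "id" column are assumptions made up for illustration; only the "addresses" layout (a one-element list holding a dict) follows the question.

```python
import pandas as pd

# Hypothetical two-row frame; the "id" column is an assumption for illustration.
df = pd.DataFrame({
    "id": [1, 2],
    "addresses": [
        [{"addressLine1": "124 Main Street", "city": "Portland", "region": "ME"}],
        [{"addressLine1": "1234 Main Street", "city": "Chattanooga", "region": "TN"}],
    ],
})

# pop() removes the column from df and returns it;
# .str[0] then takes the first item of each list, leaving a Series of dicts.
first_address = df.pop("addresses").str[0]

# apply(pd.Series) turns each dict into a row of columns and keeps the index,
# so the two pieces line up when concatenated side by side.
flat = pd.concat([df, first_address.apply(pd.Series)], axis=1)
print(flat)
```

Note that apply(pd.Series) builds one Series per row, so it can be slow on large frames; the json_normalize variants in the other answer are one alternative worth timing on your own data.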

Answer 2

Score: 1

It seems the elements of your list are themselves one-element lists.

Let's say your list is called address_list. Take the first element of each inner list, then pass the result to json_normalize:

pd.json_normalize([e[0] for e in address_list])
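
As a small runnable illustration of the unwrapping step, the sketch below uses a shortened, made-up address_list with only two keys per dict. The point is that json_normalize is given a flat list of dicts rather than a list of lists.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data:
# an outer list whose elements are one-element lists of dicts.
address_list = [
    [{"addressLine1": "124 Main Street", "city": "Portland"}],
    [{"addressLine1": "1234 Main Street", "city": "Chattanooga"}],
]

# Unwrap each inner list so that every record is a plain dict.
records = [e[0] for e in address_list]

# A flat list of dicts is the input shape json_normalize is documented to accept.
flat = pd.json_normalize(records)
print(flat)
```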

If the test data that you posted is actually a column then just use:

pd.json_normalize(result["addresses"].str[0])
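
The .str[0] step assumes every list holds exactly one address. If some rows could hold several, one possible variant (not part of the original answer) is to give each address its own row with DataFrame.explode before normalizing. The frame and the "id" column below are hypothetical, and rows with an empty list would need separate handling that this sketch does not cover.

```python
import pandas as pd

# Hypothetical data: the first row has two addresses, the second has one.
result = pd.DataFrame({
    "id": [1, 2],
    "addresses": [
        [{"city": "Portland"}, {"city": "Bangor"}],
        [{"city": "Chattanooga"}],
    ],
})

# explode() gives each list item its own row and repeats the other columns;
# ignore_index=True renumbers the rows 0..n-1.
exploded = result.explode("addresses", ignore_index=True)

# Normalize the dicts and attach them to the remaining columns.
# Both pieces share the same 0..n-1 index, so concat aligns them row by row.
flat = pd.concat(
    [exploded.drop(columns="addresses"),
     pd.json_normalize(exploded["addresses"].tolist())],
    axis=1,
)
print(flat)
```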

Or if you have other columns in addition to addresses in your result dataframe:

pd.concat(
    [result.drop(columns="addresses"), pd.json_normalize(result["addresses"].str[0])],
    axis=1
)
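
One thing to watch with this pattern: pd.concat(axis=1) aligns rows by index, and json_normalize returns a default 0..n-1 index when given a plain list. If result has a different index (for example after filtering), the two pieces may not line up. The sketch below is one way to guard against that; the frame, the "id" column, and the index values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with a deliberately non-default index.
result = pd.DataFrame(
    {
        "id": [1, 2],
        "addresses": [
            [{"addressLine1": "124 Main Street", "city": "Portland"}],
            [{"addressLine1": "1234 Main Street", "city": "Chattanooga"}],
        ],
    },
    index=[10, 20],
)

# Normalize a plain list of dicts, then reuse the original index explicitly
# so that the side-by-side concat matches rows correctly.
normalized = pd.json_normalize(result["addresses"].str[0].tolist())
normalized.index = result.index

flat = pd.concat([result.drop(columns="addresses"), normalized], axis=1)
print(flat)
```

Calling result.reset_index(drop=True) before flattening is another option when the original index carries no meaning.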

huangapple
  • Published on March 1, 2023, 08:42:02
  • When reposting, please keep this link: https://go.coder-hub.com/75598632.html