March 1, 2023, 08:42:02

Python flatten a dictionary column

Question

This should be a single line of code with pd.json_normalize, but it only works on a single element and does not process my whole column.

Original dataframe:

df['addresses'][0]
[{'addressLine1': '124 Main Street',
  'addressLine2': '',
  'addressLine3': '',
  'city': 'Portland',
  'region': 'ME',
  'postalCode': '04019',
  'country': 'USA'}]

test = pd.json_normalize(result['addresses'][0])
test

Everything up to this point works, but when I apply the function to the whole column, the resulting dataframe is not flattened as expected.

test = pd.json_normalize(result['addresses'])
test

Here is some of the column data:

[[{'addressLine1': '124 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '1234 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Chattanooga',
   'region': 'TN',
   'postalCode': '37402',
   'country': 'USA'}],
 [{'addressLine1': '1684151 Chair Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Notaplace',
   'region': 'AL',
   'postalCode': '48835',
   'country': 'USA'}],
 [{'addressLine1': '136 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '22118',
   'country': 'USA'}],
 [{'addressLine1': '123452 HoneyDo LN',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'New York City',
   'region': 'NY',
   'postalCode': '10001',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '4578 Shiver Me Timbers Road',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '124 Main ST',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'PORTLAND',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}]]

Answer 1

Score: 1

If I understand you correctly, you can expand the dict data in your dataframe df into columns like this:

df = pd.concat([df, df.pop('addresses').str[0].apply(pd.Series)], axis=1)
print(df)

Prints:

                  addressLine1 addressLine2 addressLine3           city region postalCode country
0              124 Main Street                                 Portland     ME      04019     USA
1             1234 Main Street                              Chattanooga     TN      37402     USA
2         1684151 Chair Street                                Notaplace     AL      48835     USA
3              136 Main Street                                 Portland     ME      22118     USA
4            123452 HoneyDo LN                                 Portland     ME      04019     USA
5              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
6              123 Main Street       Apt 2B   Building B  New York City     NY      10001     USA
7              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
8  4578 Shiver Me Timbers Road                                 Portland     ME      04019     USA
9                  124 Main ST                                 PORTLAND     ME      04019     USA
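
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. The two-row frame and its `id` column are made up for illustration and are not from the question; only the shape of `addresses` (a one-element list holding a dict in each cell) follows the posted data.

```python
import pandas as pd

# Hypothetical sample: "id" is an assumed extra column; each "addresses" cell
# is a one-element list containing a dict, as in the question.
df = pd.DataFrame({
    "id": [1, 2],
    "addresses": [
        [{"addressLine1": "124 Main Street", "city": "Portland", "region": "ME"}],
        [{"addressLine1": "1234 Main Street", "city": "Chattanooga", "region": "TN"}],
    ],
})

# pop() removes the column from df and returns it as a Series.
# .str[0] picks the first element of each list (the dict),
# and apply(pd.Series) expands each dict into its own columns.
expanded = df.pop("addresses").str[0].apply(pd.Series)

# Put the expanded columns next to the remaining ones (aligned on the index).
df = pd.concat([df, expanded], axis=1)
print(df)
```

Note that `.str[0]` keeps only the first dict in each list, so any cell with more than one address would silently lose the rest.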

Answer 2

Score: 1

It seems your list has one-element lists as its elements.

Let's say your list is address_list. Take the first element of each inner list, then use json_normalize:

pd.json_normalize([e[0] for e in address_list])

If the test data you posted is actually a column, just use:

pd.json_normalize(result["addresses"].str[0])

Or, if your result dataframe has other columns in addition to addresses:

pd.concat(
    [result.drop(columns="addresses"), pd.json_normalize(result["addresses"].str[0])],
    axis=1
)
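
One thing to watch with the concat variant: `pd.concat(..., axis=1)` aligns rows on the index, and `json_normalize` called on a plain list returns a default 0..n-1 index. If `result` has been filtered or otherwise carries a non-default index, the rows may not line up. The sketch below is one way to handle that; the sample frame, its `id` column, and the index values 5 and 9 are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample with a non-default index (5, 9), e.g. after filtering.
result = pd.DataFrame(
    {
        "id": [10, 20],
        "addresses": [
            [{"addressLine1": "124 Main Street", "city": "Portland", "region": "ME"}],
            [{"addressLine1": "1234 Main Street", "city": "Chattanooga", "region": "TN"}],
        ],
    },
    index=[5, 9],
)

# Take the first dict from each one-element list and flatten the list of dicts.
flat = pd.json_normalize(result["addresses"].str[0].tolist())

# json_normalize on a list yields a fresh RangeIndex, so copy the original
# index over before concatenating column-wise.
flat.index = result.index

out = pd.concat([result.drop(columns="addresses"), flat], axis=1)
print(out)
```

If the index carries no meaning, calling `result.reset_index(drop=True)` first should work just as well.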
