Pandas MultiIndex from a list of nested records

Question

I have some data in a structure like this:

data = [
    {"name": "Jack", "last_name": "Black", "sizes": {"shoes": 43, "waist": 48, "chest": 52}},
    {"name": "Mario", "last_name": "Green", "sizes": {"shoes": 42, "waist": 53, "chest": 63}}
]

How can I easily get a DataFrame that looks like this?

name   last_name  sizes
name   last_name  shoes  waist  chest
Jack   Black         43     48     52
Mario  Green         42     53     63

I know that I can use

pd.json_normalize(data)

but that is not exactly the same: it gives flat, dot-separated column names such as sizes.shoes instead of a two-level header.
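A quick way to see the flat column names that pd.json_normalize returns for this input (the list-of-records data is redefined so the snippet runs on its own; the nested keys are joined with the default separator sep='.'):

```python
import pandas as pd

# The list-of-records data from the question
data = [
    {"name": "Jack", "last_name": "Black", "sizes": {"shoes": 43, "waist": 48, "chest": 52}},
    {"name": "Mario", "last_name": "Green", "sizes": {"shoes": 42, "waist": 53, "chest": 63}},
]

flat = pd.json_normalize(data)
# Nested dict keys become single flat names joined with '.'
print(list(flat.columns))
# → ['name', 'last_name', 'sizes.shoes', 'sizes.waist', 'sizes.chest']
```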

And how could I do it if the data were a dict of records like this:

data = {
    12345: {"name": "Jack", "last_name": "Black", "sizes": {"shoes": 43, "waist": 48, "chest": 52}},
    78910: {"name": "Mario", "last_name": "Green", "sizes": {"shoes": 42, "waist": 53, "chest": 63}}
}

so that I get this, with the dict keys as the row index:

       name   last_name  sizes
       name   last_name  shoes  waist  chest
12345  Jack   Black         43     48     52
78910  Mario  Green         42     53     63

Many thanks

Answer 1

Score: 1

Create a DataFrame with pd.json_normalize, set the index to the first and last name columns, and then split the remaining dot-separated column names on '.' to turn them into a MultiIndex:

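import pandas as pd

df = pd.json_normalize(data)                         # flat columns: name, last_name, sizes.shoes, ...
df = df.set_index(['name', 'last_name'])             # move the name columns into the row index
df.columns = df.columns.str.split('.', expand=True)  # 'sizes.shoes' -> ('sizes', 'shoes')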

                sizes            
                shoes waist chest
name  last_name                  
Jack  Black        43    48    52
Mario Green        42    53    63
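The approach above moves name and last_name into the row index and does not cover the dict-of-records case. Below is a minimal sketch that handles both inputs and keeps name and last_name as columns repeated on both header levels, as in the layout asked for. The helper name records_to_multiindex_frame is hypothetical (it is not a pandas API), and the sketch assumes a non-empty input in which every record has the same keys.

```python
import pandas as pd


def records_to_multiindex_frame(data):
    """Hypothetical helper, not part of pandas.

    Flattens nested records with pd.json_normalize and turns the dotted
    column names into a two-level column MultiIndex. Top-level fields such
    as 'name' are repeated on both levels.
    """
    if isinstance(data, dict):
        # Dict of records: flatten the values and use the keys as row labels
        df = pd.json_normalize(list(data.values()))
        df.index = list(data.keys())
    else:
        # List of records: keep the default RangeIndex
        df = pd.json_normalize(data)
    # 'sizes.shoes' -> ('sizes', 'shoes'); 'name' -> ('name', 'name').
    # maxsplit=1 keeps the header two-level even if records nest deeper.
    df.columns = pd.MultiIndex.from_tuples(
        [tuple(c.split('.', 1)) if '.' in c else (c, c) for c in df.columns]
    )
    return df


list_data = [
    {"name": "Jack", "last_name": "Black", "sizes": {"shoes": 43, "waist": 48, "chest": 52}},
    {"name": "Mario", "last_name": "Green", "sizes": {"shoes": 42, "waist": 53, "chest": 63}},
]
dict_data = {
    12345: {"name": "Jack", "last_name": "Black", "sizes": {"shoes": 43, "waist": 48, "chest": 52}},
    78910: {"name": "Mario", "last_name": "Green", "sizes": {"shoes": 42, "waist": 53, "chest": 63}},
}

print(records_to_multiindex_frame(list_data))
print(records_to_multiindex_frame(dict_data))
```

Repeating the top-level name on both levels, rather than leaving the second level empty, means a column such as ('name', 'name') can be selected with an ordinary tuple key. If name and last_name should instead live in the row index, the set_index step from the answer can still be applied to the flattened frame before the columns are split.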

  • Posted by huangapple on July 27, 2023, 22:56:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76781003.html