Split a column of lists in Pandas

Question

I have a Pandas DataFrame with this column.
It is an extract from a MongoDB database, but I don't know how to handle a column containing both [] and {}:

[Image: the column as extracted from MongoDB]

How can I split this column into two columns?

Desired result:

[Image: the desired result, with the column split into two columns]

Thanks for your help!

Answer 1

Score: 1

Take the dictionary out of each one-element list to get a plain list of dictionaries (instead of a list of lists containing dictionaries), build a DataFrame from it, and join that to the original df.

import pandas as pd

data = {"coeff":[[{"value": 0.0641, "year":2000}],
                 [{"value": 0.0641, "year":2000}],
                 [{"value": 0.0641, "year":2000}],
                 [{"value": 0.0652, "year":2005}]]}

df = pd.DataFrame(data)
#                                coeff
# 0  [{'value': 0.0641, 'year': 2000}]
# 1  [{'value': 0.0641, 'year': 2000}]
# 2  [{'value': 0.0641, 'year': 2000}]
# 3  [{'value': 0.0652, 'year': 2005}]

df = df.join(pd.DataFrame([x[0] for x in df.coeff]))
#                                coeff   value  year
# 0  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
# 1  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
# 2  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
# 3  [{'value': 0.0652, 'year': 2005}]  0.0652  2005
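One caveat: join aligns rows by index label, and the DataFrame built from the list gets a default 0..n-1 index. That only lines up here because df also has a default index. A minimal sketch, assuming a hypothetical df whose index is not 0..n-1 (for example after filtering rows), passes index=df.index so the rows match, then drops coeff to leave only the two new columns:

```python
import pandas as pd

data = {"coeff": [[{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0652, "year": 2005}]]}

# Hypothetical non-default index, e.g. what is left after filtering rows
df = pd.DataFrame(data, index=[10, 11, 12, 13])

# Reuse df's index; without it the new frame would be labelled 0..3 and the
# left join would find no matching labels, leaving value/year all NaN
expanded = pd.DataFrame([x[0] for x in df["coeff"]], index=df.index)

# Attach the new columns and drop the original list-of-dict column
df = df.join(expanded).drop(columns="coeff")
print(df)
```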

Answer 2

Score: 1

pandas has a function, DataFrame.from_records, that constructs a DataFrame from a sequence of dictionaries:

import pandas as pd

my_data = {"coeff":[[{"value": 0.0641, "year":2000}],
                 [{"value": 0.0641, "year":2000}],
                 [{"value": 0.0641, "year":2000}],
                 [{"value": 0.0652, "year":2005}]]
           }

df = pd.DataFrame(my_data)

df2 = pd.DataFrame.from_records(d[0] for d in df['coeff'])

print(df2)

gives:

    value  year
0  0.0641  2000
1  0.0641  2000
2  0.0641  2000
3  0.0652  2005
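If you also want to choose which keys become columns and in what order, from_records accepts a columns argument; with dictionary records it takes those keys in the order given. A minimal sketch, assuming you want year before value (an illustrative choice, not something the question specifies):

```python
import pandas as pd

my_data = {"coeff": [[{"value": 0.0641, "year": 2000}],
                     [{"value": 0.0641, "year": 2000}],
                     [{"value": 0.0641, "year": 2000}],
                     [{"value": 0.0652, "year": 2005}]]}

df = pd.DataFrame(my_data)

# Unwrap the single dict from each one-element list
records = [d[0] for d in df["coeff"]]

# columns= selects the dict keys to use and fixes their order in the result
df2 = pd.DataFrame.from_records(records, columns=["year", "value"])
print(df2)
```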

Answer 3

Score: 0

Combine explode and json_normalize:

out = pd.json_normalize(df['coeff'].explode())

Or, if you have only one dictionary per list:

out = pd.json_normalize(df['coeff'].str[0])

Or using from_records:

out = pd.DataFrame.from_records(df['coeff'].str[0])

Output:

    value  year
0  0.0641  2000
1  0.0641  2000
2  0.0641  2000
3  0.0652  2005
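The explode route is the one that copes with lists holding more than one dictionary: each dictionary becomes its own row and keeps the index label of the row it came from. A minimal sketch, using hypothetical data where one list has two dictionaries and another is empty. An empty list explodes to NaN, which json_normalize cannot flatten, so it is dropped first; the exploded index is then reattached explicitly so each output row can be traced back to its source row:

```python
import pandas as pd

# Hypothetical data: row 1 holds two dicts and row 2 an empty list
df = pd.DataFrame({"coeff": [[{"value": 0.0641, "year": 2000}],
                             [{"value": 0.0641, "year": 2000},
                              {"value": 0.0652, "year": 2005}],
                             []]})

# One row per dict; the empty list becomes NaN, so drop it
exploded = df["coeff"].explode().dropna()

# Flatten the plain list of dicts, then restore the source-row labels
out = pd.json_normalize(exploded.tolist())
out.index = exploded.index
print(out)
```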

Answer 4

Score: 0

Create a DataFrame from your base data:

import pandas as pd

data = {"coeff":[[{"value": 0.0641, "year":2000}],
             [{"value": 0.0641, "year":2000}],
             [{"value": 0.0641, "year":2000}],
             [{"value": 0.0652, "year":2005}]]}

df = pd.DataFrame(data)

Each element in the coeff column is a dictionary inside a list. Use the apply method with a lambda function that takes the first element of each list (the dictionary).

Use .values() to retrieve the dictionary values (value and year, in the order the keys were inserted), which come back as a dict_values view object.

A dict_values view cannot be indexed or sliced, so wrap it in list() to convert it to a list you can index:

df2 = df.coeff.apply(lambda x: list(x[0].values())) 
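To see why the list() call is needed, here is a small standalone check using one of the sample dictionaries: indexing the dict_values view directly raises a TypeError, while the converted list can be indexed, and its order follows the order in which the keys were inserted.

```python
d = {"value": 0.0641, "year": 2000}

vals = d.values()  # a dict_values view, not a list
try:
    vals[0]
except TypeError as exc:
    print("cannot index dict_values:", exc)

as_list = list(vals)  # insertion order: value first, then year
print(as_list[0], as_list[1])
```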

Use the apply method with a lambda function and index positions to retrieve the years (position 1) and values (position 0), assign them to their column names in a dictionary, and pass that dictionary to the pd.DataFrame constructor to create a new dataframe:

pd.DataFrame(data = {'year': df2.apply(lambda y: y[1]),
                 'value':df2.apply(lambda y: y[0])})
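Because this approach picks fields by position in .values(), it relies on every dictionary having its keys in the same order (value, then year). If some documents were stored with their fields in a different order, the two columns would silently swap for those rows. A more defensive sketch, assuming the same coeff layout, looks each field up by key instead; the second row below is hypothetical, with its keys deliberately reversed:

```python
import pandas as pd

data = {"coeff": [[{"value": 0.0641, "year": 2000}],
                  [{"year": 2000, "value": 0.0641}],  # hypothetical: keys in reverse order
                  [{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0652, "year": 2005}]]}

df = pd.DataFrame(data)

# Take the single dict out of each list, then read each field by name
first = df["coeff"].apply(lambda x: x[0])
out = pd.DataFrame({"year": first.apply(lambda d: d["year"]),
                    "value": first.apply(lambda d: d["value"])})
print(out)
```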

huangapple
  • Posted on June 22, 2023 00:26:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76525373.html