Split column of lists in Pandas
Question
I have a Pandas DataFrame with this column:
This was extracted from a MongoDB database, but I don't know how to handle a column that contains both [] and {}:
How can I split this column into two columns?
Desired result:
Thanks for your help!
Answer 1
Score: 1
You can build a list of dictionaries (instead of a list of one-element lists holding dictionaries), turn it into a DataFrame, and join that to the original df.
import pandas as pd
data = {"coeff":[[{"value": 0.0641, "year":2000}],
[{"value": 0.0641, "year":2000}],
[{"value": 0.0641, "year":2000}],
[{"value": 0.0652, "year":2005}]]}
df = pd.DataFrame(data)
#                                coeff
# 0  [{'value': 0.0641, 'year': 2000}]
# 1  [{'value': 0.0641, 'year': 2000}]
# 2  [{'value': 0.0641, 'year': 2000}]
# 3  [{'value': 0.0652, 'year': 2005}]
df = df.join(pd.DataFrame([x[0] for x in df.coeff]))
#                                coeff   value  year
# 0  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
# 1  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
# 2  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
# 3  [{'value': 0.0652, 'year': 2005}]  0.0652  2005
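One caveat: pd.DataFrame([x[0] for x in df.coeff]) gets a fresh 0..n-1 index, and join aligns on index labels, so if df has a non-default index (for example after filtering rows) the new columns would be misaligned or filled with NaN. A minimal sketch, assuming the same coeff layout as above plus a hypothetical non-default index, that reuses df.index and drops the original list column so only the two new columns remain:

```python
import pandas as pd

data = {"coeff": [[{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0652, "year": 2005}]]}
# Hypothetical non-default index, e.g. what you might have after filtering rows.
df = pd.DataFrame(data, index=[10, 11, 12, 13])

# Take the single dict out of each list and build a frame that shares df's index.
expanded = pd.DataFrame([x[0] for x in df["coeff"]], index=df.index)

# Drop the original list column and attach the two new columns by index label.
result = df.drop(columns="coeff").join(expanded)
print(result)
```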
Answer 2
Score: 1
pandas has a function, DataFrame.from_records, that constructs a DataFrame from a sequence of dictionaries:
import pandas as pd
my_data = {"coeff":[[{"value": 0.0641, "year":2000}],
[{"value": 0.0641, "year":2000}],
[{"value": 0.0641, "year":2000}],
[{"value": 0.0652, "year":2005}]]
}
df = pd.DataFrame(my_data)
df2 = pd.DataFrame.from_records(d[0] for d in df['coeff'])
print(df2)
gives:
    value  year
0  0.0641  2000
1  0.0641  2000
2  0.0641  2000
3  0.0652  2005
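from_records also accepts a columns argument; per the pandas docs, when the records carry their own names (as dicts do) it sets the order of the output columns. A short sketch, assuming you want year to come before value:

```python
import pandas as pd

my_data = {"coeff": [[{"value": 0.0641, "year": 2000}],
                     [{"value": 0.0641, "year": 2000}],
                     [{"value": 0.0641, "year": 2000}],
                     [{"value": 0.0652, "year": 2005}]]}
df = pd.DataFrame(my_data)

# Materialise the first dict of every list as a plain list of records.
records = [d[0] for d in df["coeff"]]

# columns= reorders the fields taken from the dict keys.
df2 = pd.DataFrame.from_records(records, columns=["year", "value"])
print(df2)
```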
Answer 3
Score: 0
Combine explode and json_normalize:
out = pd.json_normalize(df['coeff'].explode())
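explode gives each list element its own row and repeats the original index label, so this version also copes with lists that hold more than one dict. A sketch using hypothetical data where the first row holds two dicts; the exploded Series is converted with tolist() and its index is copied back so each output row can be traced to its source row:

```python
import pandas as pd

# Hypothetical data: row 0 holds two dicts, row 1 holds one.
df = pd.DataFrame({"coeff": [[{"value": 0.0641, "year": 2000},
                              {"value": 0.0650, "year": 2001}],
                             [{"value": 0.0652, "year": 2005}]]})

exploded = df["coeff"].explode()        # one dict per row; index is [0, 0, 1]
out = pd.json_normalize(exploded.tolist())
out.index = exploded.index              # remember which original row each dict came from
print(out)
```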
Or, if you have only one dictionary per list:
out = pd.json_normalize(df['coeff'].str[0])
Or use from_records:
out = pd.DataFrame.from_records(df['coeff'].str[0])
Output:
    value  year
0  0.0641  2000
1  0.0641  2000
2  0.0641  2000
3  0.0652  2005
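.str[0] works on a column of lists because the str accessor's indexing simply takes element 0 of each cell; a cell holding an empty list becomes NaN instead of raising an IndexError the way x[0] in a list comprehension would. A sketch, assuming an empty list may show up in the Mongo export (hypothetical data), that drops such rows before building the frame:

```python
import pandas as pd

# Hypothetical data: the middle row has an empty list.
df = pd.DataFrame({"coeff": [[{"value": 0.0641, "year": 2000}],
                             [],
                             [{"value": 0.0652, "year": 2005}]]})

first = df["coeff"].str[0]   # the dict for non-empty lists, NaN for the empty one
valid = first.dropna()       # keep only rows that actually had a dict

# Build the two columns and keep the original row labels (0 and 2).
out = pd.DataFrame.from_records(valid.tolist())
out.index = valid.index
print(out)
```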
Answer 4
Score: 0
Create a DataFrame from your base data:
import pandas as pd
data = {"coeff":[[{"value": 0.0641, "year":2000}],
[{"value": 0.0641, "year":2000}],
[{"value": 0.0641, "year":2000}],
[{"value": 0.0652, "year":2005}]]}
df = pd.DataFrame(data)
Each element in the coeff column is a dictionary wrapped in a list. Isolate the dictionary by using the apply method with a lambda function that takes the first element of the list.
Use .values() to retrieve the dictionary's values (value and year, in that order), which are returned as a dict_values view object.
A dict_values view cannot be indexed or sliced, so wrap it in list() to convert it to a list:
df2 = df.coeff.apply(lambda x: list(x[0].values()))
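To see why the list() wrapper is needed: a dict_values view can be iterated but not indexed, and because Python 3.7+ dicts keep insertion order, position 0 is value and position 1 is year for these records. A quick standalone check on one record:

```python
# dict_values supports iteration and len(), but not indexing or slicing.
record = {"value": 0.0641, "year": 2000}
vals = record.values()

try:
    vals[0]
    indexable = True
except TypeError:
    indexable = False        # 'dict_values' object is not subscriptable

as_list = list(vals)         # [0.0641, 2000], in insertion order
print(indexable, as_list[0], as_list[1])
```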
Use apply with a lambda function and index positions to retrieve the values (position 0) and the years (position 1), assign them to their column names in a dictionary, and pass that dictionary to pd.DataFrame to create a new DataFrame:
pd.DataFrame({'year': df2.apply(lambda y: y[1]),
'value': df2.apply(lambda y: y[0])})
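Because this relies on positions, the two columns would silently swap for any record that stores year before value. A sketch of the same idea that reads each field by key instead, assuming every record contains both keys (the third record below is hypothetical, with its keys reversed):

```python
import pandas as pd

data = {"coeff": [[{"value": 0.0641, "year": 2000}],
                  [{"value": 0.0641, "year": 2000}],
                  # Hypothetical record with the keys in the opposite order.
                  [{"year": 2005, "value": 0.0652}]]}
df = pd.DataFrame(data)

# Pull each field out of the single dict in every list by name, not by position.
result = pd.DataFrame({
    "year": df["coeff"].apply(lambda x: x[0]["year"]),
    "value": df["coeff"].apply(lambda x: x[0]["value"]),
})
print(result)
```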