英文:
Mapping recursively from a dataframe to python dictionary
问题
我在尝试找到递归映射以获取最终结果时遇到了困难。以下是输入数据框 df
:
## 映射递归
import pandas as pd
data = {
"group1": ["A", "A", "B", "B"],
"group2": ["grp1", "grp2", "grp1", "grp2"],
"hc": [50, 40, 45, 90],
"response": [12, 30, 43, 80]
}
# 将数据加载到 DataFrame 对象中:
df = pd.DataFrame(data)
df
我想要递归映射以将 df
转换为 Python 字典。每个数字都在一个 details
列表中,并且通过聚合的数据框递归映射。例如,级别 A 的 total_hc
是 group1
为 A
的 hc 总和为 90。
以下是期望的输出:
## 期望的输出
output = {
"rows":[
{
"details": [{
"level": "A",
"total_hc": 90,
"response_total": 42
}],
"rows":[
{
"details": [{
"level": "grp1",
"total_hc": 50,
"response_total": 12
}]
},
{
"details": [{
"level": "grp2",
"total_hc": 40,
"response_total": 30
}],
}
]
},
{
"details": [{
"level": "B",
"total_hc": 135,
"response_total": 123
}],
"rows":[
{
"details": [{
"level": "grp1",
"total_hc": 45,
"response_total": 43
}]
},
{
"details": [{
"level": "grp2",
"total_hc": 90,
"response_total": 80
}],
}
]
}
]
}
我尝试了对数据框进行分组:
## 使用 groupby 函数进行分组
group_df = df.groupby(["group1", "group2"]).sum()
group_df.to_dict("index")
然后我在尝试找到递归映射以获取最终结果。感谢任何能够提供帮助的人。
英文:
I am struggling to find a recursive mapping to get the end result. Here is the input df
## mapping recursive
import pandas as pd
data = {
"group1": ["A", "A", "B", "B"],
"group2": ["grp1", "grp2", "grp1", "grp2"],
"hc": [50, 40, 45, 90],
"response": [12, 30, 43, 80]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
df
I would like to map recursively to convert the df
into a Python dictionary. Each number is in a details
list, and it is recursively mapped through the aggregated data frame. For example, level A total_hc
is the sum of hc of group1
is A
.
## desired output
output = {
"rows":[
{
"details": [{
"level": "A",
"total_hc": 90,
"response_total": 42
}],
"rows":[
{
"details": [{
"level": "grp1",
"total_hc": 50,
"response_total": 12
}]
},
{
"details": [{
"level": "grp2",
"total_hc": 40,
"response_total": 30
}],
}
]
},
{
"details": [{
"level": "B",
"total_hc": 135,
"response_total": 123
}],
"rows":[
{
"details": [{
"level": "grp1",
"total_hc": 45,
"response_total": 43
}]
},
{
"details": [{
"level": "grp2",
"total_hc": 90,
"response_total": 80
}],
}
]
}
]
}
I tried to group the df
## group by function
group_df = df.groupby(["group1", "group2"]).sum()
group_df.to_dict("index")
Then I am struggling to find a recursive mapping to get the end result. Appreciate anyone who can help out.
答案1
得分: 2
如果数据框确实如此简单,那么您可以尝试以下代码:
new_cols = {c: f"{c}_total" for c in ["hc", "response"]}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
out2 = {
"rows": [
{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
"rows": [{"details": [record]}
for record in sdf[["group2"] + val_cols]
.rename(columns={"group2": "level"})
.to_dict(orient="records")]}
for key, sdf in df.groupby("group1")
]
}
如果数据框有更多要分组的列(不仅是 group1
和 group2
),那么您可以考虑使用递归函数,例如:
def grouping(df, by, val_cols, start=True):
if start:
new_cols = {c: f"{c}_total" for c in val_cols}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
if len(by) == 1:
df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
return [{"details": [record]} for record in df.to_dict(orient="records")]
return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
"rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
for key, sdf in df.groupby(by[0])]
out = {"rows": grouping(df, ["group1", "group2"], ["hc", "response"])}
如果您还想按 group2
或函数中 by
的最右侧项进行分组(从您的示例中不太清楚),则需要进行以下调整:
...
"rows": [{"details": [record]}
for record in sdf[["group2"] + val_cols]
.groupby("group2", as_index=False).sum()
.rename(columns={"group2": "level"})
.to_dict(orient="records")]}
...
或者
...
if len(by) == 1:
df = (df[[by[0]] + val_cols].groupby(by[0], as_index=False).sum()
.rename(columns={by[0]: "level"}))
return [{"details": [record]} for record in df.to_dict(orient="records")]
...
英文:
If the dataframe is acutally that simple then you could try
new_cols = {c: f"{c}_total" for c in ["hc", "response"]}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
out2 = {
"rows": [
{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
"rows": [{"details": [record]}
for record in sdf[["group2"] + val_cols]
.rename(columns={"group2": "level"})
.to_dict(orient="records")]}
for key, sdf in df.groupby("group1")
]
}
to get
{'rows': [{'details': [{'level': 'A', 'hc_total': 90, 'response_total': 42}],
'rows': [{'details': [{'level': 'grp1',
'hc_total': 50,
'response_total': 12}]},
{'details': [{'level': 'grp2',
'hc_total': 40,
'response_total': 30}]}]},
{'details': [{'level': 'B', 'hc_total': 135, 'response_total': 123}],
'rows': [{'details': [{'level': 'grp1',
'hc_total': 45,
'response_total': 43}]},
{'details': [{'level': 'grp2',
'hc_total': 90,
'response_total': 80}]}]}]}
If the dataframe has more columns to group over (not only group1
and group2
) then it might be a good idea to use a recursive function like:
def grouping(df, by, val_cols, start=True):
if start:
new_cols = {c: f"{c}_total" for c in val_cols}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
if len(by) == 1:
df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
return [{"details": [record]} for record in df.to_dict(orient="records")]
return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
"rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
for key, sdf in df.groupby(by[0])]
out = {"rows": grouping(df, ["group1", "group2"], ["hc", "response"])}
In case you also want to group by group2
or the right-most item of by
in the function (it's not clear to me from your sample) you have to make the following adjustment:
...
"rows": [{"details": [record]}
for record in sdf[["group2"] + val_cols]
.groupby("group2", as_index=False).sum()
.rename(columns={"group2": "level"})
.to_dict(orient="records")]}
...
or
...
if len(by) == 1:
df = (df[[by[0]] + val_cols].groupby(by[0], as_index=False).sum()
.rename(columns={by[0]: "level"}))
return [{"details": [record]} for record in df.to_dict(orient="records")]
...
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论