Mapping recursively from a DataFrame to a Python dictionary

Question

I am struggling to find a recursive mapping that produces the result I need. Here is the input DataFrame df:

## mapping recursive
import pandas as pd

data = {
  "group1": ["A", "A", "B", "B"],
  "group2": ["grp1", "grp2", "grp1", "grp2"],
  "hc": [50, 40, 45, 90],
  "response": [12, 30, 43, 80]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)
df

I would like to convert df into a nested Python dictionary by mapping recursively over the grouping columns. Each set of numbers sits in a details list, and every level holds the aggregate of the rows beneath it. For example, total_hc for level A is the sum of hc over the rows where group1 is A.

## desired output
output = {
    "rows": [
        {
            "details": [{
                "level": "A",
                "total_hc": 90,
                "response_total": 42
            }],
            "rows": [
                {
                    "details": [{
                        "level": "grp1",
                        "total_hc": 50,
                        "response_total": 12
                    }]
                },
                {
                    "details": [{
                        "level": "grp2",
                        "total_hc": 40,
                        "response_total": 30
                    }]
                }
            ]
        },
        {
            "details": [{
                "level": "B",
                "total_hc": 135,
                "response_total": 123
            }],
            "rows": [
                {
                    "details": [{
                        "level": "grp1",
                        "total_hc": 45,
                        "response_total": 43
                    }]
                },
                {
                    "details": [{
                        "level": "grp2",
                        "total_hc": 90,
                        "response_total": 80
                    }]
                }
            ]
        }
    ]
}

I tried grouping the DataFrame:

## group by function
group_df = df.groupby(["group1", "group2"]).sum()
group_df.to_dict("index")

But from there I can't work out how to build the nested result recursively. I'd appreciate any help.

Answer 1

Score: 2

If the dataframe is actually that simple then you could try the following (the | operator for merging dicts needs Python 3.9 or later):

new_cols = {c: f"{c}_total" for c in ["hc", "response"]}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
out2 = {
    "rows": [
        {"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
         "rows": [{"details": [record]}
                  for record in sdf[["group2"] + val_cols]
                                .rename(columns={"group2": "level"})
                                .to_dict(orient="records")]}
        for key, sdf in df.groupby("group1")
    ]
}

to get

{'rows': [{'details': [{'level': 'A', 'hc_total': 90, 'response_total': 42}],
           'rows': [{'details': [{'level': 'grp1',
                                  'hc_total': 50,
                                  'response_total': 12}]},
                    {'details': [{'level': 'grp2',
                                  'hc_total': 40,
                                  'response_total': 30}]}]},
          {'details': [{'level': 'B', 'hc_total': 135, 'response_total': 123}],
           'rows': [{'details': [{'level': 'grp1',
                                  'hc_total': 45,
                                  'response_total': 43}]},
                    {'details': [{'level': 'grp2',
                                  'hc_total': 90,
                                  'response_total': 80}]}]}]}
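
Note that the output above uses hc_total, while the desired output in the question uses total_hc. If the exact key names matter, one option is an explicit column-to-key mapping instead of the f-string suffix. The following is only a sketch of the same comprehension under that assumption; key_names, renamed and out_named are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "group1": ["A", "A", "B", "B"],
    "group2": ["grp1", "grp2", "grp1", "grp2"],
    "hc": [50, 40, 45, 90],
    "response": [12, 30, 43, 80],
})

# Assumed mapping from source columns to the key names used in the
# question's desired output (hypothetical helper, not from the answer).
key_names = {"hc": "total_hc", "response": "response_total"}
renamed = df.rename(columns=key_names)
val_cols = list(key_names.values())

# Same structure as the comprehension above: one node per group1 value,
# with the group2 rows of that group as children.
out_named = {
    "rows": [
        {"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
         "rows": [{"details": [record]}
                  for record in sdf[["group2"] + val_cols]
                                .rename(columns={"group2": "level"})
                                .to_dict(orient="records")]}
        for key, sdf in renamed.groupby("group1")
    ]
}
```

Renaming into a separate variable also leaves the original df untouched, which can be convenient when the raw columns are still needed afterwards.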

If the dataframe has more columns to group over (not only group1 and group2) then it might be a good idea to use a recursive function like:

def grouping(df, by, val_cols, start=True):
    if start:
        new_cols = {c: f"{c}_total" for c in val_cols}
        df = df.rename(columns=new_cols)
        val_cols = list(new_cols.values())
    if len(by) == 1:
        df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
        return [{"details": [record]} for record in df.to_dict(orient="records")]
    return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
             "rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
            for key, sdf in df.groupby(by[0])]

out = {"rows": grouping(df, ["group1", "group2"], ["hc", "response"])}
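
To illustrate how the recursion extends beyond two levels, here is a small self-contained run of the function above on a frame with a third grouping column. The group3 column and its values are made up for the example and do not come from the question.

```python
import pandas as pd

def grouping(df, by, val_cols, start=True):
    # On the first call, rename the value columns to "<name>_total".
    if start:
        new_cols = {c: f"{c}_total" for c in val_cols}
        df = df.rename(columns=new_cols)
        val_cols = list(new_cols.values())
    # Base case: the last grouping column becomes "level", one node per row.
    if len(by) == 1:
        df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
        return [{"details": [record]} for record in df.to_dict(orient="records")]
    # Recursive case: sum the group, then descend into the remaining columns.
    return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
             "rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
            for key, sdf in df.groupby(by[0])]

# Hypothetical data with three grouping levels.
df3 = pd.DataFrame({
    "group1": ["A", "A", "A", "B"],
    "group2": ["grp1", "grp1", "grp2", "grp1"],
    "group3": ["x", "y", "x", "x"],
    "hc": [50, 40, 45, 90],
    "response": [12, 30, 43, 80],
})

out3 = {"rows": grouping(df3, ["group1", "group2", "group3"], ["hc", "response"])}
```

Here the node for A carries the totals of all three of its rows, its grp1 child carries the totals of the x and y rows, and only the innermost nodes have no "rows" key.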

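If the innermost level (group2, or the right-most item of by in the function) can contain repeated keys within a parent group and you want those rows summed as well (it's not clear to me from your sample), you have to make the following adjustment to the first snippet: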

         ...
         "rows": [{"details": [record]}
                  for record in sdf[["group2"] + val_cols]
                                .groupby("group2", as_index=False).sum()
                                .rename(columns={"group2": "level"})
                                .to_dict(orient="records")]}
    ...

or, in the recursive function, replace the base case with:

    ...
    if len(by) == 1:
        df = (df[[by[0]] + val_cols].groupby(by[0], as_index=False).sum()
              .rename(columns={by[0]: "level"}))
        return [{"details": [record]} for record in df.to_dict(orient="records")]
    ...
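
For reference, the same idea can be written so that every level, including the last, goes through groupby and sum, which removes the need for a separate base case. This is a compact sketch rather than the code above: the function name nest and the sample frame dup are hypothetical, and the value columns are assumed to already carry their final names.

```python
import pandas as pd

def nest(df, by, val_cols):
    """Aggregate val_cols at every level of `by`, including the last one."""
    rows = []
    for key, sdf in df.groupby(by[0]):
        # Totals for this group, labelled with the group key.
        node = {"details": [{"level": key} | sdf[val_cols].sum().to_dict()]}
        # Only descend while there are grouping columns left.
        if len(by) > 1:
            node["rows"] = nest(sdf, by[1:], val_cols)
        rows.append(node)
    return rows

# Hypothetical data in which (A, grp1) appears twice.
dup = pd.DataFrame({
    "group1": ["A", "A", "A"],
    "group2": ["grp1", "grp1", "grp2"],
    "hc_total": [50, 5, 40],
    "response_total": [12, 3, 30],
})

out_dup = {"rows": nest(dup, ["group1", "group2"], ["hc_total", "response_total"])}
```

With this input the two (A, grp1) rows collapse into a single grp1 node, whereas the unadjusted base case would emit one node per row.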

huangapple
  • Posted on June 12, 2023, 05:00:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76452501.html