2023-06-12 05:00:52

Mapping recursively from a dataframe to python dictionary

Question

I am struggling to find a recursive mapping to get the end result. Here is the input df:

## mapping recursive
import pandas as pd

data = {
  "group1": ["A", "A", "B", "B"],
  "group2": ["grp1", "grp2", "grp1", "grp2"],
  "hc": [50, 40, 45, 90],
  "response": [12, 30, 43, 80]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)
df

I would like to map recursively to convert the df into a Python dictionary. Each set of numbers sits in a details list, and the nesting follows the aggregated data frame level by level. For example, total_hc for level A is the sum of hc over the rows where group1 is A (50 + 40 = 90).

## desired output
output = {
    "rows": [
        {
            "details": [{
                "level": "A",
                "total_hc": 90,
                "response_total": 42
            }],
            "rows": [
                {
                    "details": [{
                        "level": "grp1",
                        "total_hc": 50,
                        "response_total": 12
                    }]
                },
                {
                    "details": [{
                        "level": "grp2",
                        "total_hc": 40,
                        "response_total": 30
                    }]
                }
            ]
        },
        {
            "details": [{
                "level": "B",
                "total_hc": 135,
                "response_total": 123
            }],
            "rows": [
                {
                    "details": [{
                        "level": "grp1",
                        "total_hc": 45,
                        "response_total": 43
                    }]
                },
                {
                    "details": [{
                        "level": "grp2",
                        "total_hc": 90,
                        "response_total": 80
                    }]
                }
            ]
        }
    ]
}

I tried grouping the df:

## group by function
group_df = df.groupby(["group1", "group2"]).sum()
group_df.to_dict("index")

From there, I am struggling to find a recursive mapping that produces the end result. I'd appreciate any help.
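To make the gap concrete, here is a minimal sketch of what the groupby attempt returns for the sample data. The variable names `flat` and `level1` are introduced only for illustration. The dictionary is keyed by (group1, group2) tuples, so it carries no nesting and no totals for group1 on its own; those need a separate aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    "group1": ["A", "A", "B", "B"],
    "group2": ["grp1", "grp2", "grp1", "grp2"],
    "hc": [50, 40, 45, 90],
    "response": [12, 30, 43, 80],
})

# Summing over both keys gives one row per (group1, group2) pair.
group_df = df.groupby(["group1", "group2"]).sum()

# With a MultiIndex, to_dict("index") keys each row by a tuple, so the
# result is a flat mapping rather than a nested one.
flat = group_df.to_dict("index")
print(flat[("A", "grp1")])

# The totals for group1 alone come from a second, coarser aggregation.
level1 = df.groupby("group1")[["hc", "response"]].sum()
print(level1.loc["A"].to_dict())
```

The desired output needs both aggregations combined, one nested inside the other, which is what the answer builds.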

Answer 1

Score: 2

If the dataframe is actually that simple, then you could try

new_cols = {c: f"{c}_total" for c in ["hc", "response"]}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
out2 = {
    "rows": [
        {"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
         "rows": [{"details": [record]}
                  for record in sdf[["group2"] + val_cols]
                                .rename(columns={"group2": "level"})
                                .to_dict(orient="records")]}
        for key, sdf in df.groupby("group1")
    ]
}

to get

{'rows': [{'details': [{'level': 'A', 'hc_total': 90, 'response_total': 42}],
           'rows': [{'details': [{'level': 'grp1',
                                  'hc_total': 50,
                                  'response_total': 12}]},
                    {'details': [{'level': 'grp2',
                                  'hc_total': 40,
                                  'response_total': 30}]}]},
          {'details': [{'level': 'B', 'hc_total': 135, 'response_total': 123}],
           'rows': [{'details': [{'level': 'grp1',
                                  'hc_total': 45,
                                  'response_total': 43}]},
                    {'details': [{'level': 'grp2',
                                  'hc_total': 90,
                                  'response_total': 80}]}]}]}

If the dataframe has more columns to group over (not only group1 and group2) then it might be a good idea to use a recursive function like:

def grouping(df, by, val_cols, start=True):
    if start:
        new_cols = {c: f"{c}_total" for c in val_cols}
        df = df.rename(columns=new_cols)
        val_cols = list(new_cols.values())
    if len(by) == 1:
        df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
        return [{"details": [record]} for record in df.to_dict(orient="records")]
    return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
             "rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
            for key, sdf in df.groupby(by[0])]

out = {"rows": grouping(df, ["group1", "group2"], ["hc", "response"])}
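As a rough illustration of how the recursion behaves with more than two levels, the sketch below runs the function on made-up data with a third grouping column. The frame `df3` and the column group3 are hypothetical and not from the question. The function is repeated so the example runs on its own, with `**` unpacking in place of `|` so it does not require Python 3.9.

```python
import pandas as pd

def grouping(df, by, val_cols, start=True):
    # On the first call only, rename value columns to "<name>_total".
    if start:
        new_cols = {c: f"{c}_total" for c in val_cols}
        df = df.rename(columns=new_cols)
        val_cols = list(new_cols.values())
    # Base case: the last grouping column labels the leaf records.
    if len(by) == 1:
        df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
        return [{"details": [record]} for record in df.to_dict(orient="records")]
    # Recursive case: total the sub-frame, then descend into the next column.
    return [{"details": [{"level": key, **sdf[val_cols].sum().to_dict()}],
             "rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
            for key, sdf in df.groupby(by[0])]

# Hypothetical data with three grouping columns.
df3 = pd.DataFrame({
    "group1": ["A", "A", "A", "B"],
    "group2": ["grp1", "grp1", "grp2", "grp1"],
    "group3": ["x", "y", "x", "x"],
    "hc": [10, 20, 30, 40],
    "response": [1, 2, 3, 4],
})

out = {"rows": grouping(df3, ["group1", "group2", "group3"], ["hc", "response"])}
a = out["rows"][0]
print(a["details"])             # totals for group1 == "A"
print(a["rows"][0]["details"])  # totals for group1 == "A", group2 == "grp1"
print(a["rows"][0]["rows"])     # leaf records labelled by group3
```

Each call consumes the first entry of by, so the nesting depth of the result equals the number of grouping columns.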

If you also want to aggregate over group2, or over the right-most item of by in the function (it's not clear to me from your sample), you have to make one of the following adjustments. For the first snippet:

         ...
         "rows": [{"details": [record]}
                  for record in sdf[["group2"] + val_cols]
                                .groupby("group2", as_index=False).sum()
                                .rename(columns={"group2": "level"})
                                .to_dict(orient="records")]}
    ...

or, for the recursive function:

    ...
    if len(by) == 1:
        df = (df[[by[0]] + val_cols].groupby(by[0], as_index=False).sum()
              .rename(columns={by[0]: "level"}))
        return [{"details": [record]} for record in df.to_dict(orient="records")]
    ...
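The adjustment only changes the result when the same group2 value occurs more than once within a group1 block. A small sketch with made-up data shows the difference; the frame `dup` is hypothetical, and its value columns are assumed to be already renamed.

```python
import pandas as pd

# Hypothetical data: the pair (A, grp1) appears twice.
dup = pd.DataFrame({
    "group1": ["A", "A", "A"],
    "group2": ["grp1", "grp1", "grp2"],
    "hc_total": [50, 5, 40],
    "response_total": [12, 1, 30],
})
val_cols = ["hc_total", "response_total"]
leaf = dup[["group2"] + val_cols]

# Without the extra groupby, each input row becomes its own leaf record,
# so grp1 shows up twice.
plain = leaf.rename(columns={"group2": "level"}).to_dict(orient="records")

# With it, rows sharing a group2 value are summed into a single record.
summed = (leaf.groupby("group2", as_index=False).sum()
              .rename(columns={"group2": "level"})
              .to_dict(orient="records"))

print(len(plain), len(summed))
print(summed)
```

For the sample data in the question every (group1, group2) pair is unique, so both variants give the same result there.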
