June 6, 2023 07:41:16

Tricky groupby several columns of a similar prefix while taking the sum based off of categorical values within a column (Pandas)

Question

I want to group rows whose name values share the same prefix and sum size for each categorical value in the type column.

Data

name      type    size
AA:3400            5
AA:3401   FALSE    1
AA:3402   FALSE    2
AA:3404   FALSE    0
AA:3409   FALSE    1
AA:3410   FALSE    8
AA:3412   FALSE    9
BB:3400   TRUE     4
BB:3401   FALSE    7

Desired

name    type    size
AA      TRUE    0
AA      FALSE   21
AA              5
BB      TRUE    4
BB      FALSE   7
BB

Doing

df.groupby(['name', 'type'], dropna=False, as_index=False)['size'].sum()

However, how can I group if the values share the same prefix? Any suggestion is appreciated.

Answer 1

Score: 3

You can try:

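import pandas as pd

out = (
    # Make "type" categorical so unobserved TRUE/FALSE groups still appear.
    df.assign(type=df["type"].astype(
        pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)))
      # Group on the prefix before ":" and on "type". observed=False is set
      # explicitly because its default is changing in newer pandas versions.
      .groupby([df["name"].str.split(":").str[0], "type"], dropna=False,
               observed=False, group_keys=False)["size"].sum().reset_index()
)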

Output:

print(out)

  name   type  size
0   AA   TRUE     0
1   AA  FALSE    21
2   AA    NaN     5
3   BB   TRUE     4
4   BB  FALSE     7
5   BB    NaN     0
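
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data first. It assumes the blank type cell is NaN and that TRUE/FALSE are stored as strings, neither of which the question states; the variable names prefix and typed are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question. Assumptions: the blank "type" cell is NaN
# and the TRUE/FALSE values are strings rather than booleans.
df = pd.DataFrame({
    "name": ["AA:3400", "AA:3401", "AA:3402", "AA:3404", "AA:3409",
             "AA:3410", "AA:3412", "BB:3400", "BB:3401"],
    "type": [np.nan, "FALSE", "FALSE", "FALSE", "FALSE",
             "FALSE", "FALSE", "TRUE", "FALSE"],
    "size": [5, 1, 2, 0, 1, 8, 9, 4, 7],
})

# Everything before the first ":" is the prefix used as the group key.
prefix = df["name"].str.split(":").str[0]

# A categorical "type" lets groupby report categories that have no rows.
typed = df.assign(type=df["type"].astype(
    pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)))

out = (
    typed.groupby([prefix, "type"], dropna=False, observed=False)["size"]
         .sum()
         .reset_index()
)
print(out)
```

The grouping key is a Series rather than a column name, so the original name column is left untouched and only the prefix appears in the result.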

Answer 2

Score: 3

Much like @Timeless's solution, I'd do it like this:

df['type'] = df['type'].astype('category')
df_out = df.groupby([df['name'].str[:2], 'type'], 
                    dropna=False, 
                    observed=False)['size'].sum().reset_index()
print(df_out)

Output:

  name   type  size
0   AA  False    21
1   AA   True     0
2   AA    NaN     5
3   BB  False     7
4   BB   True     4
5   BB    NaN     0
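
One difference between the two answers is how the prefix is taken: .str[:2] slices a fixed width, while .str.split(":").str[0] keeps everything before the delimiter. For the data in the question both give the same keys, but they may diverge if prefixes vary in length. A small sketch, where "ABC:12" is a hypothetical name that does not appear in the question:

```python
import pandas as pd

# "ABC:12" is a hypothetical name with a longer prefix, not from the question.
names = pd.Series(["AA:3400", "BB:3401", "ABC:12"])

# Fixed-width slice: only correct when every prefix is exactly two characters.
by_slice = names.str[:2]

# Split on the delimiter: keeps everything before the first ":".
by_split = names.str.split(":").str[0]

print(by_slice.tolist())
print(by_split.tolist())
```

If the prefixes are always two characters, the slice is the shorter option; otherwise splitting on ":" is probably the safer choice.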
