How can I find redundant groups in a pandas dataframe using groupby in Python 3.x?


Question

Below is the example dataframe:

df=pd.DataFrame({'a':[1,1,1,2,2,3,4,4,4,5,5,5,5], 'b':[1001,1002,1232,1001,1002,3002,1021,2021,4000,1002,1002,2031,1200]})
df
	a	b
0	1	1001
1	1	1002
2	1	1232
3	2	1001
4	2	1002
5	3	3002
6	4	1021
7	4	2021
8	4	4000
9	5	1002
10	5	1002
11	5	2031
12	5	1200

I grouped the dataframe by column 'a', so that each group contains a set of values; for example, group 1 contains {1001, 1002, 1232}, group 2 contains {1001, 1002}, etc.:

df.groupby('a')

Going from top to bottom, let's call a group redundant if it does not contain any new value (every value it contains already appears in some earlier group of this dataframe).
I need to write code to find how many groups in the dataframe are redundant.
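
For illustration, here is a minimal sketch (assuming the example df above) of how those per-group value sets can be materialized with agg(set), the same aggregation used in Answer 1 below:

# collect the values of column 'b' into one set per group of column 'a'
group_sets = df.groupby('a')['b'].agg(set)
print(group_sets)
# the entry for a == 1 is {1001, 1002, 1232}, for a == 2 it is {1001, 1002}, and so on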

This is what I tried:

groups=[set(i[1]['b']) for i in df.groupby('a')]
covered_groups=[groups[0]]

counter=0
for i in groups[1:]:
    for k in covered_groups:
        if i.issubset(k):
            counter+=1
            break
    covered_groups.append(i)
        
print(counter) 

The output is 4 instead of 1, and I'm not sure what's wrong here.
Also, is there perhaps a pandas built-in method that achieves the same result?

Answer 1

Score: 0

You could try:

# aggregate as set and sort by decreasing size
tmp = (df.groupby('a')['b'].agg(set)
         .sort_values(key=lambda s: s.str.len(), ascending=False)
      )

# compare set to larger ones
keep = []
drop = []
for s in tmp:
    if any(s.issubset(s2) for s2 in keep):
        drop.append(s)
        continue
    else:
        keep.append(s)

keep
# [{1001, 1002, 1232}, {1021, 2021, 4000}, {1002, 1200, 2031}, {3002}]

drop
# [{1001, 1002}]
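
Since the question asks for a count, the number of redundant groups under this approach is simply the length of drop:

print(len(drop))  # 1 -> only the group {1001, 1002} is redundant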

Answer 2

Score: 0

I couldn't figure out a pure pandas solution, but you can try something like this:

values = set()                     # all values seen in earlier groups
count = 0

for group in df.groupby('a')['b']:
    # group is a (key, Series) pair; check whether every value was already seen
    are_in = group[1].isin(values)

    if are_in.all():
        count += 1

    values = values.union(group[1])

print(count)  # 1
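
One issue with the loop in the question is that it tests issubset against each earlier group separately, whereas the definition of redundancy compares a group against everything contained in all earlier groups combined (a group such as {1001, 3002} would be covered by the union of groups 1 and 3, but is not a subset of either one alone). Below is a minimal sketch of that loop rewritten around a running union of seen values, in the same spirit as the answer above (assuming the example df from the question):

groups = [set(g) for _, g in df.groupby('a')['b']]

seen = set(groups[0])        # union of all values covered so far
counter = 0
for s in groups[1:]:
    if s.issubset(seen):     # redundant: contributes no new value
        counter += 1
    seen |= s                # extend the running union

print(counter)  # 1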
