April 4, 2023, 17:26:48

Groupby if at least one element is in common

Question

I have the following data frame:

```python
import pandas as pd
d1 = {'id': ["car", "car", "bus", "plane", "plane", "plane"], 'value': [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]}
df = pd.DataFrame(data=d1)
df
```

```
      id     value
0    car   [ab, b]
1    car   [b, ab]
2    bus   [ab, b]
3  plane  [cd, df]
4  plane   [d, cd]
5  plane  [df, df]
```

I would like to group my ids if they have at least one element from the value column in common. The desired output would look like this:

```
    id    value
0  car  [ab, b]
1  car  [b, ab]
2  bus  [ab, b]

      id     value
0  plane  [cd, df]
1  plane   [d, cd]

      id     value
0  plane  [cd, df]
1  plane  [df, df]
```

I tried using groupby, but the problem is that some ids should be included in multiple data frames, like

```
plane   [cd, df]
```

Answer 1

Score: 1

You can use set operations:

```python
keep = (df.explode('value').reset_index().groupby('value')['index'].agg(frozenset)
          .loc[lambda s: s.str.len() > 1].unique()
       )

for idx in keep:
    print(df.loc[idx])
```

Output:

```
    id    value
0  car  [ab, b]
1  car  [b, ab]
2  bus  [ab, b]
      id     value
3  plane  [cd, df]
4  plane   [d, cd]
      id     value
3  plane  [cd, df]
5  plane  [df, df]
```

How it works

First, get the set of matching row indices for each value:

```python
df.explode('value').reset_index().groupby('value')['index'].agg(frozenset)
```

```
value
ab    (0, 1, 2)
b     (0, 1, 2)
cd       (3, 4)
d           (4)
df       (3, 5)
Name: index, dtype: object
```
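For intuition, the same value-to-rows lookup can be sketched in plain Python without pandas. This is only an illustration of what the `explode` / `groupby` step computes; the names `values` and `rows_per_value` are made up for this sketch, and the row positions stand in for the data frame's default integer index.

```python
from collections import defaultdict

# The "value" column from the question, one list per row.
values = [["ab", "b"], ["b", "ab"], ["ab", "b"],
          ["cd", "df"], ["d", "cd"], ["df", "df"]]

# Map each distinct value to the set of row positions that contain it.
rows_per_value = defaultdict(set)
for row, items in enumerate(values):
    for item in items:
        rows_per_value[item].add(row)

for value in sorted(rows_per_value):
    print(value, sorted(rows_per_value[value]))
```

A value that appears twice in the same row (such as `"df"` in row 5) still contributes that row only once, because the rows are stored in a set.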

Remove duplicates and keep only groups with more than one member:

```python
keep = (df.explode('value').reset_index().groupby('value')['index'].agg(frozenset)
          .loc[lambda s: s.str.len() > 1].unique()
       )
```

```
[frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({3, 5})]
```
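Broken into named steps, the filter and de-duplication might look like the sketch below. The intermediate names (`exploded`, `rows_per_value`, `shared`) are assumptions added for readability, and `Series.map(len)` is swapped in for `.str.len()`; both count the members of each frozenset.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["car", "car", "bus", "plane", "plane", "plane"],
    "value": [["ab", "b"], ["b", "ab"], ["ab", "b"],
              ["cd", "df"], ["d", "cd"], ["df", "df"]],
})

# One row per (row label, single value); the original label lands in "index".
exploded = df.explode("value").reset_index()

# For each value, the set of row labels that contain it.
rows_per_value = exploded.groupby("value")["index"].agg(frozenset)

# Keep only values shared by at least two rows ("d" occurs in row 4 only).
shared = rows_per_value[rows_per_value.map(len) > 1]

# "ab" and "b" both map to {0, 1, 2}; unique() keeps that set once.
keep = list(shared.unique())
```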

Finally, loop over the groups.
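If the groups are needed as objects rather than printed, one option is to collect them in a list. This is a minimal sketch, assuming the default unnamed index and no existing column called `index`; the helper name `frames_with_shared_values` is hypothetical. `sorted(idx)` turns each frozenset into a list, so the rows come out in label order regardless of set iteration order.

```python
import pandas as pd

def frames_with_shared_values(df, column="value"):
    """Hypothetical helper: one sub-frame per set of rows sharing a value."""
    rows_per_value = (
        df.explode(column).reset_index()
          .groupby(column)["index"].agg(frozenset)
    )
    keep = rows_per_value[rows_per_value.map(len) > 1].unique()
    # Index with a sorted list of labels instead of the frozenset itself.
    return [df.loc[sorted(idx)] for idx in keep]

df = pd.DataFrame({
    "id": ["car", "car", "bus", "plane", "plane", "plane"],
    "value": [["ab", "b"], ["b", "ab"], ["ab", "b"],
              ["cd", "df"], ["d", "cd"], ["df", "df"]],
})

frames = frames_with_shared_values(df)
for frame in frames:
    print(frame)
```

Row 3 (`plane`, `[cd, df]`) appears in two of the returned frames, which is the overlap a plain `groupby` on a single key cannot express.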

Alternative syntax (same logic)

```python
s = df['value'].explode()
keep = dict.fromkeys(frozenset(x) for x in s.index.groupby(s).values() if len(x) > 1)

for idx in keep:
    print(df.loc[idx])
```
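The role of `dict.fromkeys` here is de-duplication that preserves first-seen order, which works because frozensets are hashable (plain sets are not). A small standalone sketch, with the row sets written out by hand rather than computed:

```python
# Row sets per shared value, in the order the values are encountered.
candidates = [
    frozenset({0, 1, 2}),  # rows containing "ab"
    frozenset({0, 1, 2}),  # rows containing "b" (the same rows again)
    frozenset({3, 4}),     # rows containing "cd"
    frozenset({3, 5}),     # rows containing "df"
]

# Dict keys are unique and keep insertion order, so duplicates collapse.
keep = dict.fromkeys(candidates)
groups = list(keep)

# A plain set cannot be used as a dict key, hence frozenset above.
try:
    dict.fromkeys([{0, 1, 2}])
    plain_set_ok = True
except TypeError:
    plain_set_ok = False
```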
