Group by if at least one element is in common

Question


I have the following data frame:

```python
import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane", "plane"],
      'value': [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]}
df = pd.DataFrame(data=d1)
df
```

```
      id     value
0    car   [ab, b]
1    car   [b, ab]
2    bus   [ab, b]
3  plane  [cd, df]
4  plane   [d, cd]
5  plane  [df, df]
```


I would like to group rows that have at least one element of the `value` column in common. The desired output would look like this:


```
      id     value
0    car   [ab, b]
1    car   [b, ab]
2    bus   [ab, b]

      id     value
0  plane  [cd, df]
1  plane   [d, cd]

      id     value
0  plane  [cd, df]
1  plane  [df, df]
```

I tried using `groupby`, but the problem is that some rows have to appear in more than one data frame, for example:

```
plane   [cd, df]
```
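To make the criterion concrete, here is a small sketch (not part of the original question) that lists every pair of rows whose `value` lists share at least one element, using Python sets and `itertools.combinations`; the name `overlapping_pairs` is illustrative. It also shows why the plane rows end up in two groups rather than one: rows 4 and 5 have nothing in common, even though each of them overlaps row 3.

```python
from itertools import combinations

import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane", "plane"],
      'value': [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]}
df = pd.DataFrame(data=d1)

# Two rows have "at least one element in common" when their value sets are not disjoint
overlapping_pairs = [
    (i, j)
    for i, j in combinations(df.index, 2)
    if not set(df.at[i, 'value']).isdisjoint(df.at[j, 'value'])
]
print(overlapping_pairs)  # [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5)]
```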

Answer 1

Score: 1

You can use set operations:

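```python
keep = (df.explode('value').reset_index().groupby('value')['index'].agg(frozenset)
          .loc[lambda s: s.str.len() > 1].unique()
       )

for idx in keep:
    # a sorted list is a plain indexer and keeps the rows in their original order
    print(df.loc[sorted(idx)])
```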

Output:

```
    id    value
0  car  [ab, b]
1  car  [b, ab]
2  bus  [ab, b]
      id     value
3  plane  [cd, df]
4  plane   [d, cd]
      id     value
3  plane  [cd, df]
5  plane  [df, df]
```
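The steps of this answer can be packaged into a self-contained sketch. The helper name `overlap_groups` is hypothetical, the size check uses `Series.map(len)` in place of `.str.len()` (the `.str` accessor is intended for string data), and the code assumes the DataFrame has a default, unnamed index so that `reset_index()` produces a column called `index`.

```python
import pandas as pd


def overlap_groups(frame, col='value'):
    """Hypothetical helper: one frozenset of row labels per element that
    appears in more than one row, with duplicate groups removed."""
    # one row per list element; the original row label becomes the 'index' column
    per_value = (frame.explode(col)
                 .reset_index()
                 .groupby(col)['index']
                 .agg(frozenset))
    # keep only groups that contain more than one distinct row
    multi = per_value[per_value.map(len) > 1]
    # drop duplicate groups while keeping first-seen order
    return list(dict.fromkeys(multi))


d1 = {'id': ["car", "car", "bus", "plane", "plane", "plane"],
      'value': [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]}
df = pd.DataFrame(data=d1)

groups = overlap_groups(df)
for idx in groups:
    print(df.loc[sorted(idx)])
```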

How it works

First, get the matching indices for each value:

```python
df.explode('value').reset_index().groupby('value')['index'].agg(frozenset)
```

```
value
ab    (0, 1, 2)
b     (0, 1, 2)
cd       (3, 4)
d           (4)
df       (3, 5)
Name: index, dtype: object
```
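The same value-to-rows mapping can be built without pandas, which may make this first step easier to follow. This sketch assumes row positions equal the index labels (true for the default `RangeIndex` used here); `rows_per_value` is an illustrative name.

```python
from collections import defaultdict

values = [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]

# Map each element to the set of row positions whose list contains it
# (the counterpart of explode + groupby('value') + agg(frozenset))
rows_per_value = defaultdict(set)
for row, items in enumerate(values):
    for item in items:
        rows_per_value[item].add(row)

print(dict(sorted(rows_per_value.items())))
# {'ab': {0, 1, 2}, 'b': {0, 1, 2}, 'cd': {3, 4}, 'd': {4}, 'df': {3, 5}}
```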

Remove duplicates and keep only groups with more than one member:

```python
keep = (df.explode('value').reset_index().groupby('value')['index'].agg(frozenset)
          .loc[lambda s: s.str.len() > 1].unique()
       )
```

```
[frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({3, 5})]
```
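The answer aggregates into `frozenset` rather than `set`. A plausible reason (the answer does not state one) is that deduplication with `unique()` or `dict.fromkeys` relies on hashing, and only the frozen variant is hashable. A plain-Python sketch of this filter-and-deduplicate step, with the per-value groups typed in by hand:

```python
# One group of row labels per distinct value (ab, b, cd, d, df), as in the step above
groups = [{0, 1, 2}, {0, 1, 2}, {3, 4}, {4}, {3, 5}]

# A plain set is unhashable, so hash-based deduplication fails on it
try:
    dict.fromkeys(groups)
except TypeError as exc:
    print("set:", exc)

# frozenset is hashable: drop single-row groups, then deduplicate in first-seen order
keep = list(dict.fromkeys(frozenset(g) for g in groups if len(g) > 1))
print(keep)  # [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({3, 5})]
```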

Finally, loop over the groups.
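The question's desired output numbers each group's rows from 0, while the loop above prints the original labels. If separate, renumbered DataFrames are wanted, one option is to collect them in a list with `reset_index(drop=True)`. This is a sketch: the groups from the previous step are hard-coded and `frames` is an illustrative name.

```python
import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane", "plane"],
      'value': [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]}
df = pd.DataFrame(data=d1)

# Row-label groups as produced by the previous step
keep = [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({3, 5})]

# One DataFrame per group, renumbered from 0 like the question's desired output
frames = [df.loc[sorted(idx)].reset_index(drop=True) for idx in keep]
for frame in frames:
    print(frame)
```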

Alternative syntax (same logic):

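```python
s = df['value'].explode()
# build each group as a frozenset first, so the size check counts distinct rows
# (a row such as [df, df] yields its label twice after explode)
groups = (frozenset(x) for x in s.index.groupby(s).values())
keep = dict.fromkeys(g for g in groups if len(g) > 1)

for idx in keep:
    print(df.loc[sorted(idx)])
```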
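One detail to keep in mind with this syntax: `explode` repeats a row's label once per list element, so `len(x)` on the raw labels counts elements, not distinct rows. Row 5 (`[df, df]`) shows the effect; the second series in this sketch is a made-up single row, used only to show a case where the two counts lead to different decisions.

```python
import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane", "plane"],
      'value': [["ab", "b"], ["b", "ab"], ["ab", "b"], ["cd", "df"], ["d", "cd"], ["df", "df"]]}
df = pd.DataFrame(data=d1)

s = df['value'].explode()
# explode keeps the original row label for every list element
labels = s[s == 'df'].index.tolist()
print(labels)            # [3, 5, 5]
print(len(set(labels)))  # 2 distinct rows

# Hypothetical lone row that repeats an element: two labels, but only one row
t = pd.Series([["zz", "zz"]]).explode()
print(len(t.index), len(set(t.index)))  # 2 1
```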
huangapple
  • Published on April 4, 2023 at 17:26:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75927712.html