计算数据框中的唯一值,然后在分组时将该值附加在字符串前面

huangapple go评论62阅读模式
英文:

Counting the unique values in a data frame and then appending the value in front of the string when grouped

问题

Name id Model
Alice alice_1 (A_01), (A_02)
Bob bob_1 (B_01)
Alice alice_2 (A_01), (A_05)
Alice alice_3 (A_01), (A_05)
Bob bob_2 (B_01)
Bob bob_3 (B_01)

我想要计算括号内的唯一模型值的数量,并将数量附加在括号前面,如下所示:

Name Model
Alice 3x (A_01), 2x (A_05), 1x (A_02)
Bob 3x(B_01)
英文:

I have the following data frame:

Name id Model
Alice alice_1 (A_01), (A_02)
Bob bob_1 (B_01)
Alice alice_2 (A_01), (A_05)
Alice alice_3 (A_01), (A_05)
Bob bob_2 (B_01)
Bob bob_3 (B_01)

I would like to count the unique model values inside the brackets and append the count in front of the bracket like that:

Name Model
Alice 3x (A_01), 2x (A_05), 1x (A_02)
Bob 3x(B_01)

I tried to use different approaches with group by and aggregate functions but could not find a way. Also I can use value counts and count each Model but then I don't know how to append the resulting number the the whole data frame.

答案1

得分: 1

使用Series.str.splitDataFrame.explode进行新行的拆分,然后使用, 连接值,然后通过GroupBy.size获取计数,排序并添加到Model列,最后聚合join

df = (df.assign(Model = df['Model'].str.split(', '))
          .explode('Model')
          .groupby(['Name','Model'])
          .size()
          .sort_values(ascending=False)
          .astype(str)
          .add('x')
          .reset_index(level=1)
          .assign(Model = lambda x: x[0].str.cat(x['Model']))
          .groupby('Name')['Model']
          .agg(', '.join)
          .reset_index())
print (df)

输出结果如下:

     Name                        Model
0  Alice  3x(A_01), 2x(A_05), 1x(A_02)
1    Bob                      3x(B_01)
英文:

Use Series.str.split with DataFrame.explode for new rows by joinjed values by , , then get counts by GroupBy.size, sorting and add to Model column, last aggregate join:

df = (df.assign(Model = df['Model'].str.split(', '))
          .explode('Model')
          .groupby(['Name','Model'])
          .size()
          .sort_values(ascending=False)
          .astype(str)
          .add('x')
          .reset_index(level=1)
          .assign(Model = lambda x: x[0].str.cat(x['Model']))
          .groupby('Name')['Model']
          .agg(', '.join)
          .reset_index())
print (df)
     Name                         Model
0  Alice   3x(A_01), 2x(A_05), 1x(A_02)
1    Bob                       3x(B_01)

答案2

得分: 0

以下是您要翻译的代码部分:

After a split+explode, use a custom aggregation with help of groupby.agg and collections.Counter:

from collections import Counter

out = (df
   .assign(Model=df['Model'].str.split(',\s*'))
   .explode('Model')
   .groupby('Name', as_index=False)['Model']
   .agg(lambda g: ', '.join([f'{i}x {x}' for x, i in Counter(g).items()]))
)

Output:

    Name                            Model
0  Alice  3x (A_01), 1x (A_02), 2x (A_05)
1    Bob                        3x (B_01)

If you want the values sorted by frequencies (instead of the seen order), use Counter(g).most_common() in place of Counter(g).items():

Output:

    Name                            Model
0  Alice  3x (A_01), 2x (A_05), 1x (A_02)
1    Bob                        3x (B_01)
英文:

After a split+explode, use a custom aggregation with help of groupby.agg and collections.Counter:

from collections import Counter

out = (df
   .assign(Model=df['Model'].str.split(',\s*'))
   .explode('Model')
   .groupby('Name', as_index=False)['Model']
   .agg(lambda g: ', '.join([f'{i}x {x}' for x, i in Counter(g).items()]))
)

Output:

    Name                            Model
0  Alice  3x (A_01), 1x (A_02), 2x (A_05)
1    Bob                        3x (B_01)

If you want the values sorted by frequencies (instead of the seen order), use Counter(g).most_common() in place of Counter(g).items():

Output:

    Name                            Model
0  Alice  3x (A_01), 2x (A_05), 1x (A_02)
1    Bob                        3x (B_01)

huangapple
  • 本文由 发表于 2023年2月6日 19:26:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/75360696.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定