Counting the unique values in a data frame and then appending the value in front of the string when grouped
Question
I have the following data frame:
Name | id | Model |
---|---|---|
Alice | alice_1 | (A_01), (A_02) |
Bob | bob_1 | (B_01) |
Alice | alice_2 | (A_01), (A_05) |
Alice | alice_3 | (A_01), (A_05) |
Bob | bob_2 | (B_01) |
Bob | bob_3 | (B_01) |
I would like to count the unique model values inside the brackets, per name, and put each count in front of its bracket, like this:
Name | Model |
---|---|
Alice | 3x (A_01), 2x (A_05), 1x (A_02) |
Bob | 3x(B_01) |
I tried different approaches with groupby and aggregate functions but could not find a way. I can also use value_counts to count each Model, but then I don't know how to attach the resulting counts to the whole data frame.
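For anyone who wants to reproduce this, the sample frame above can be rebuilt roughly as follows. This is a sketch that assumes the Model column holds plain strings (not lists), which the question does not state explicitly.

```python
import pandas as pd

# Sample data from the question; Model is assumed to be a comma-separated string.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Alice", "Bob", "Bob"],
    "id": ["alice_1", "bob_1", "alice_2", "alice_3", "bob_2", "bob_3"],
    "Model": ["(A_01), (A_02)", "(B_01)", "(A_01), (A_05)",
              "(A_01), (A_05)", "(B_01)", "(B_01)"],
})
print(df)
```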
Answer 1
Score: 1
Use Series.str.split with DataFrame.explode to put each of the values joined by ", " on its own row, then get the counts with GroupBy.size, sort them, prepend them to the Model column, and finally aggregate with join:
df = (df.assign(Model=df['Model'].str.split(', '))   # string -> list of models
        .explode('Model')                            # one row per model
        .groupby(['Name', 'Model'])
        .size()                                      # occurrences per (Name, Model)
        .sort_values(ascending=False)
        .astype(str)
        .add('x')                                    # e.g. '3x'
        .reset_index(level=1)                        # counts land in a column labelled 0
        .assign(Model=lambda x: x[0].str.cat(x['Model']))
        .groupby('Name')['Model']
        .agg(', '.join)
        .reset_index())
print(df)
Output:
    Name                         Model
0  Alice  3x(A_01), 2x(A_05), 1x(A_02)
1    Bob                      3x(B_01)
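The same split, explode, count, and join steps can also be written out one at a time, which makes the intermediate frames easier to inspect. The sketch below is a variation, not the answer's exact code: it rebuilds the sample data, gives the count column an explicit name (the hypothetical n) instead of relying on the default 0 label, and uses "x " with a space to match the format requested in the question.

```python
import pandas as pd

# Sample data from the question (id column omitted, it is not needed here).
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Alice", "Bob", "Bob"],
    "Model": ["(A_01), (A_02)", "(B_01)", "(A_01), (A_05)",
              "(A_01), (A_05)", "(B_01)", "(B_01)"],
})

# Step 1: one row per (Name, Model) occurrence.
long = df.assign(Model=df["Model"].str.split(", ")).explode("Model")

# Step 2: count occurrences of each model per name; "n" is an illustrative name.
counts = long.groupby(["Name", "Model"]).size().rename("n").reset_index()

# Step 3: most frequent models first within each name.
counts = counts.sort_values(["Name", "n"], ascending=[True, False])

# Step 4: prefix each model with its count, then join per name.
counts["Model"] = counts["n"].astype(str) + "x " + counts["Model"]
out = counts.groupby("Name", as_index=False)["Model"].agg(", ".join)
print(out)
```

Printing long and counts before the final groupby is a quick way to check that the split separator matches the data.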
Answer 2
Score: 0
After a split+explode, use a custom aggregation with the help of groupby.agg and collections.Counter:
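from collections import Counter

out = (df
       # raw string so that \s reaches the regex engine unchanged
       .assign(Model=df['Model'].str.split(r',\s*'))
       .explode('Model')
       .groupby('Name', as_index=False)['Model']
       # Counter maps each model to its count, in order of first appearance
       .agg(lambda g: ', '.join([f'{i}x {x}' for x, i in Counter(g).items()]))
      )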
Output:
    Name                            Model
0  Alice  3x (A_01), 1x (A_02), 2x (A_05)
1    Bob                        3x (B_01)
If you want the values sorted by frequency (instead of by order of first appearance), use Counter(g).most_common() in place of Counter(g).items().
Output:
    Name                            Model
0  Alice  3x (A_01), 2x (A_05), 1x (A_02)
1    Bob                        3x (B_01)
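For completeness, the most_common() variant could look like the sketch below. It rebuilds the sample data so that it runs on its own, and it moves the lambda into a named helper (count_models, a name chosen here for illustration) purely for readability.

```python
from collections import Counter

import pandas as pd

# Sample data from the question (id column omitted, it is not needed here).
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Alice", "Bob", "Bob"],
    "Model": ["(A_01), (A_02)", "(B_01)", "(A_01), (A_05)",
              "(A_01), (A_05)", "(B_01)", "(B_01)"],
})


def count_models(models):
    """Format one group's models as 'Nx (model)', most frequent first."""
    # most_common() yields (value, count) pairs ordered by descending count.
    return ", ".join(f"{n}x {m}" for m, n in Counter(models).most_common())


out = (df
       .assign(Model=df["Model"].str.split(r",\s*"))
       .explode("Model")
       .groupby("Name", as_index=False)["Model"]
       .agg(count_models))
print(out)
```

Models with equal counts keep the order in which Counter first saw them.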