February 6, 2023 19:26:25

Counting the unique values in a data frame and then appending the value in front of the string when grouped

Question

I have the following data frame:

Name	id	Model
Alice	alice_1	(A_01), (A_02)
Bob	bob_1	(B_01)
Alice	alice_2	(A_01), (A_05)
Alice	alice_3	(A_01), (A_05)
Bob	bob_2	(B_01)
Bob	bob_3	(B_01)

I would like to count each unique model value inside the brackets and prepend its count to the bracket, like this:

Name	Model
Alice	3x (A_01), 2x (A_05), 1x (A_02)
Bob	3x (B_01)

I tried different approaches with groupby and aggregate functions but could not find a way. I can also use value_counts to count each Model, but then I don't know how to attach the resulting numbers to the whole data frame.
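For reproducibility, the sample frame above can be rebuilt as shown here. This is a minimal sketch (the construction code is not part of the original question); it also shows the per-model counting step the question mentions, assuming the models are first split on ", " and exploded into one row each. The result is a Series indexed by (Name, Model), not yet the formatted string.

```python
import pandas as pd

# Sample data from the question (rebuilt here for illustration)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Bob'],
    'id': ['alice_1', 'bob_1', 'alice_2', 'alice_3', 'bob_2', 'bob_3'],
    'Model': ['(A_01), (A_02)', '(B_01)', '(A_01), (A_05)',
              '(A_01), (A_05)', '(B_01)', '(B_01)'],
})

# One row per model, then count how often each model occurs within each Name
counts = (df.assign(Model=df['Model'].str.split(', '))
            .explode('Model')
            .groupby('Name')['Model']
            .value_counts())
print(counts)
```

Every count is available at this point, but each one still has to be turned into an "Nx (model)" label and joined back into one string per Name.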

Answer 1

Score: 1

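Use Series.str.split with DataFrame.explode to turn the values joined by ", " into new rows, then get the counts with GroupBy.size, sort them, prepend them to the Model column, and finally aggregate with join: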

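df = (df.assign(Model=df['Model'].str.split(', '))         # one list of models per row
          .explode('Model')                                  # one row per model
          .groupby(['Name', 'Model'])
          .size()                                            # occurrences of each model per name
          .sort_values(ascending=False)                      # most frequent first
          .astype(str)
          .add('x')                                          # 3 -> '3x'
          .reset_index(level=1)                              # Model back to a column; counts sit in column 0
          .assign(Model=lambda x: x[0].str.cat(x['Model']))  # '3x' + '(A_01)' -> '3x(A_01)'
          .groupby('Name')['Model']
          .agg(', '.join)                                    # one comma-separated string per name
          .reset_index())
print(df)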

Output:

     Name                        Model
0  Alice  3x(A_01), 2x(A_05), 1x(A_02)
1    Bob                      3x(B_01)
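The output above has no space between the count and the bracket ("3x(A_01)"), while the question asked for "3x (A_01)". Here is a minimal, self-contained sketch of the same split/explode/size idea that produces the requested spacing, assuming the sample data from the question. The count column name n is made up for illustration, and the sort is done explicitly by name and then by descending count instead of one global sort_values.

```python
import pandas as pd

# Sample data from the question (rebuilt here for illustration)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Bob'],
    'id': ['alice_1', 'bob_1', 'alice_2', 'alice_3', 'bob_2', 'bob_3'],
    'Model': ['(A_01), (A_02)', '(B_01)', '(A_01), (A_05)',
              '(A_01), (A_05)', '(B_01)', '(B_01)'],
})

counts = (df.assign(Model=df['Model'].str.split(', '))  # list of models per row
            .explode('Model')                             # one row per model
            .groupby(['Name', 'Model'])
            .size()                                       # occurrences of each model per name
            .reset_index(name='n')                        # hypothetical count column 'n'
            .sort_values(['Name', 'n'], ascending=[True, False]))  # most frequent first per name

# Build 'Nx (model)' labels (note the space), then join them per name
out = (counts.assign(Model=counts['n'].astype(str) + 'x ' + counts['Model'])
             .groupby('Name')['Model']
             .agg(', '.join)
             .reset_index())
print(out)
```

Keeping the count in a named column rather than the unnamed column 0 makes the label-building step a plain string concatenation.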

Answer 2

Score: 0

After a split + explode, use a custom aggregation with the help of groupby.agg and collections.Counter:

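from collections import Counter
out = (df
   .assign(Model=df['Model'].str.split(r',\s*'))  # split on a comma plus any whitespace
   .explode('Model')                               # one row per model
   .groupby('Name', as_index=False)['Model']
   # Counter keeps first-seen order; format each (model, count) pair as 'Nx (model)'
   .agg(lambda g: ', '.join([f'{i}x {x}' for x, i in Counter(g).items()]))
)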

Output:

    Name                            Model
0  Alice  3x (A_01), 1x (A_02), 2x (A_05)
1    Bob                        3x (B_01)

If you want the values sorted by frequency (instead of first-seen order), use Counter(g).most_common() in place of Counter(g).items():

Output:

    Name                            Model
0  Alice  3x (A_01), 2x (A_05), 1x (A_02)
1    Bob                        3x (B_01)
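The most_common() variant is described above without its full code. A self-contained sketch of it follows, assuming the sample data from the question; the helper name summarize is made up for illustration. Counter.most_common() returns (element, count) pairs ordered by descending count, and elements with equal counts keep the order in which they were first encountered.

```python
from collections import Counter

import pandas as pd

# Sample data from the question (rebuilt here for illustration)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Bob'],
    'id': ['alice_1', 'bob_1', 'alice_2', 'alice_3', 'bob_2', 'bob_3'],
    'Model': ['(A_01), (A_02)', '(B_01)', '(A_01), (A_05)',
              '(A_01), (A_05)', '(B_01)', '(B_01)'],
})


def summarize(models):
    """Hypothetical helper: format one group's models as 'Nx (model)', most frequent first."""
    return ', '.join(f'{n}x {model}' for model, n in Counter(models).most_common())


out = (df
   .assign(Model=df['Model'].str.split(r',\s*'))  # split on a comma plus any whitespace
   .explode('Model')                               # one row per model
   .groupby('Name', as_index=False)['Model']
   .agg(summarize))
print(out)
```

Pulling the formatting into a named function keeps the groupby chain short and makes it easy to switch between items() and most_common().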