2023-05-24 22:52:02

dataframe column's aggregate based on simple majority

Question

I have a `dataframe` from my model's predictions, similar to the one below:
```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})
df
```
```
    trip-id  segment-id  true_label  prediction
0         8           1           3           3
1         8           1           3           3
2         8           1           3           3
3         8           1           3           1
4         8           1           3           2
5         8           1           3           4
6         8           1           3           0
7         8           1           3           0
8         4           0           3           3
9         4           0           3           3
10        4           0           3           3
11        4           0           3           0
12        4           0           3           1
13        4           0           3           2
14        4           5           3           3
15        4           5           3           3
16        4           5           3           1
17        4           5           3           1
18        4           5           3           2
19        4           5           3           2
```

The sample contains predictions and true labels, each in [0, 1, ..., 4], for the instances of each trip's segments.

I would like to generate a summary of the segments' predictions based on a simple majority:

- A segment's predicted value is the label in [0, 1, ..., 4] predicted for the largest number of its instances (the simple majority).
- Where there is a tie for the majority, the value matching the `true_label` is taken as the segment's prediction.
- If there is a tie for the majority and none of the tied values matches the `true_label`, the tied value that appears first in the `df` is taken as the segment's predicted value.
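The three rules can be sketched for a single segment in plain Python. This is only an illustration: the helper name `segment_prediction` and the third test list are hypothetical and do not come from the question, and the sketch relies on `collections.Counter` keeping keys in order of first appearance (Python 3.7+).

```python
from collections import Counter

def segment_prediction(predictions, true_label):
    """Pick one label for a segment from its per-instance predictions."""
    counts = Counter(predictions)
    top = max(counts.values())
    # Labels sharing the highest count, in order of first appearance
    # (Counter preserves insertion order).
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]       # rule 1: a single label has the simple majority
    if true_label in tied:
        return true_label    # rule 2: tie resolved in favour of the true label
    return tied[0]           # rule 3: tie resolved by first appearance

print(segment_prediction([3, 3, 3, 1, 2, 4, 0, 0], 3))  # trip 8, segment 1 -> 3
print(segment_prediction([3, 3, 1, 1, 2, 2], 3))        # trip 4, segment 5 -> 3
print(segment_prediction([1, 1, 2, 2, 0], 3))           # hypothetical tie without the true label -> 1
```

Applying such a helper per `(trip-id, segment-id)` group would give one predicted label per segment, which can then be compared with the segment's `true_label`.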

Currently I can do this:

```python
segments_summary = (
    df['true_label'].eq(df['prediction'])
      .groupby([df['true_label'], df['trip-id'], df['segment-id']]).mean()
      .ge(0.5)
      .groupby(level='true_label').agg(['size', 'sum'])
      .rename(columns={'size': 'total-segments', 'sum': 'correctly-predicted'})
      .assign(recall=lambda x: round(x['correctly-predicted'] / x['total-segments'], 2))
      .reindex(range(5), fill_value='-')
      .reset_index())
```

Which produces:

```
   true_label total-segments correctly-predicted recall
0           0              -                   -      -
1           1              -                   -      -
2           2              -                   -      -
3           3              3                   1   0.33
4           4              -                   -      -
```

But this is not exactly what I wanted. Going by the conditions I described above, all 3 segments should have been predicted correctly:

- trip 8, segment 1: 3 has the simple majority, so that segment should be considered as predicted 3.
- trip 4, segment 0: 3 has the simple majority, so that segment is predicted as 3.
- trip 4, segment 5: there is a tie, so the prediction matching true_label should be the segment's prediction -> 3.
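One way to see why the current attempt counts only one segment as correct is to print the share of matching instances per segment. The sketch below recomputes the intermediate value that `.ge(0.5)` is applied to; the variable name `share` is introduced here for illustration. The `>= 0.5` test asks whether at least half of a segment's instances match the true label, which is a stricter condition than having the most frequent label.

```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3] * 20,
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})

# Fraction of instances in each segment whose prediction equals the true label.
share = (
    df['true_label'].eq(df['prediction'])
      .groupby([df['trip-id'], df['segment-id']])
      .mean()
)
print(share.round(3))

# Only segments where at least half of the instances match pass this test.
print((share >= 0.5).sum())
```

For the sample data, label 3 is the most frequent prediction in trip 8, segment 1 but covers only 3 of 8 instances, so it fails the `>= 0.5` test even though it wins under the simple-majority rule.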

Expected result:

```
   true_label total-segments correctly-predicted recall
0           0              -                   -      -
1           1              -                   -      -
2           2              -                   -      -
3           3              3                   3    1.0
4           4              -                   -      -
```

Answer 1

Score: 2


I would use:

```python
out = (df
   # get the top prediction
   .value_counts(sort=False).reset_index(name='count')
   .assign(flag=lambda d: d['true_label'].eq(d['prediction']))
   .sort_values(by=['trip-id', 'segment-id', 'count', 'flag'],
                ascending=[True, True, False, False],
                kind='stable'
               )
   .groupby(['trip-id', 'segment-id']).first()
   # check if correctly predicted
   .assign(**{'correctly-predicted': lambda d: d['true_label'].eq(d['prediction'])})
   # aggregate per prediction
   .groupby('prediction')
   .agg(**{'total-segments': ('prediction', 'count'),
           'correctly-predicted': ('correctly-predicted', 'sum')
          })
   .assign(**{'recall': lambda d: d['correctly-predicted'].div(d['total-segments'])})
   .reindex(range(5), fill_value='-')
   .reset_index()
)
```

Output:

```
   prediction total-segments correctly-predicted recall
0           0              -                   -      -
1           1              -                   -      -
2           2              -                   -      -
3           3              3                   3    1.0
4           4              -                   -      -
```
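As a variation on the approach above, the sketch below makes the "first appearing in the df" tie-break explicit by recording each row's position, instead of relying on the row order returned by `value_counts(sort=False)`, which I am not certain matches order of first appearance in every pandas version. It also groups the final summary by `true_label`, as in the question's expected table, rather than by `prediction`. The names `pos`, `first_pos`, `match`, `best` and `summary` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3] * 20,
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})

keys = ['trip-id', 'segment-id']

# Count each predicted label per segment and remember where it first appears.
counts = (
    df.assign(pos=range(len(df)))
      .groupby(keys + ['true_label', 'prediction'], as_index=False)
      .agg(count=('pos', 'size'), first_pos=('pos', 'min'))
)
counts['match'] = counts['true_label'].eq(counts['prediction'])

# Per segment: highest count first, then the label matching true_label,
# then the label that appears earliest in df.
best = (
    counts.sort_values(keys + ['count', 'match', 'first_pos'],
                       ascending=[True, True, False, False, True])
          .drop_duplicates(keys)
)

# One row per true label: number of segments and how many were predicted correctly.
summary = (
    best.groupby('true_label')
        .agg(**{'total-segments': ('match', 'size'),
                'correctly-predicted': ('match', 'sum')})
)
summary['recall'] = summary['correctly-predicted'] / summary['total-segments']
print(summary)
```

If the placeholder rows for labels without segments are wanted, `.reindex(range(5), fill_value='-').reset_index()` can be appended to `summary` as in the code above.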
