dataframe column's aggregate based on simple majority

Question

I have a `dataframe` of my model's predictions, similar to the one below:
```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})
```

Displaying `df` gives:

```
    trip-id  segment-id  true_label  prediction
0         8           1           3           3
1         8           1           3           3
2         8           1           3           3
3         8           1           3           1
4         8           1           3           2
5         8           1           3           4
6         8           1           3           0
7         8           1           3           0
8         4           0           3           3
9         4           0           3           3
10        4           0           3           3
11        4           0           3           0
12        4           0           3           1
13        4           0           3           2
14        4           5           3           3
15        4           5           3           3
16        4           5           3           1
17        4           5           3           1
18        4           5           3           2
19        4           5           3           2
```

The sample contains the predictions and true labels (values in [0, 1, ..., 4]) for the instances of each trip's segments.

I would like to generate a summary of the segments' predictions based on a simple majority:

  • A segment's predicted value is the value in [0, 1, ..., 4] that is predicted most often among the segment's instances.
  • If several values tie for the majority, the value matching the true_label is taken as the segment's prediction.
  • If there is a tie for the majority and none of the tied values matches the true_label, the tied value that appears first in the df is taken as the segment's prediction.
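The three rules above can be sketched as a small per-segment function. This is only an illustration, not code from the original post: the helper name `segment_prediction` and the variable `segment_pred` are hypothetical, and it assumes every row of a segment carries the same `true_label`.

```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3] * 20,
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})


def segment_prediction(group):
    """Hypothetical helper: pick one prediction for a single segment."""
    counts = group['prediction'].value_counts()
    # All predicted values that share the highest count (one value if no tie).
    tied = counts[counts == counts.max()].index
    # Assumption: the true label is constant within a segment.
    true_label = group['true_label'].iloc[0]
    # Rule 2 (and rule 1 when the clear winner is the true label).
    if true_label in tied:
        return int(true_label)
    # Rule 3 (and rule 1 otherwise): first top-count value in row order.
    in_tie = group['prediction'].isin(tied)
    return int(group.loc[in_tie, 'prediction'].iloc[0])


# One predicted value per (trip-id, segment-id), in order of first appearance.
segment_pred = {
    key: segment_prediction(group)
    for key, group in df.groupby(['trip-id', 'segment-id'], sort=False)
}
print(segment_pred)
```

A plain Python loop over groups is slower than a vectorized chain on large data, but it keeps each rule visible as its own branch.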

Currently I can do this:

```python
segments_summary = (
    # flag each row where the prediction equals the true label
    df['true_label'].eq(df['prediction'])
      # fraction of correct rows per (true_label, trip, segment)
      .groupby([df['true_label'], df['trip-id'], df['segment-id']]).mean()
      # a segment counts as correct only if at least half of its rows are correct
      .ge(0.5)
      .groupby(level='true_label').agg(['size', 'sum'])
      .rename(columns={'size': 'total-segments', 'sum': 'correctly-predicted'})
      .assign(recall=lambda x: round(x['correctly-predicted'] / x['total-segments'], 2))
      .reindex(range(5), fill_value='-')
      .reset_index())
```

Which produces:

segments_summary
   true_label total-segments correctly-predicted recall
0           0              -                   -      -
1           1              -                   -      -
2           2              -                   -      -
3           3              3                   1   0.33
4           4              -                   -      -

But this is not exactly what I wanted. Going by the conditions above, all 3 segments should have been predicted correctly.

  • trip 8, segment 1: 3 has the simple majority, so that segment should be considered as predicted 3.
  • trip 4, segment 0: 3 has the simple majority, so that segment is predicted as 3.
  • trip 4, segment 5: there is a tie, so the prediction matching true_label should be the segment's prediction -> 3.
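To see why the attempt above counts only one segment as correct, it may help to look at the intermediate per-segment values. The sketch below prints the fraction of correct rows that `.mean().ge(0.5)` thresholds, next to the raw count of each predicted value. The names `keys`, `frac_correct`, and `counts` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3] * 20,
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})

keys = ['trip-id', 'segment-id']

# Fraction of rows in each segment whose prediction equals the true label.
frac_correct = (
    df['prediction'].eq(df['true_label'])
      .groupby([df[k] for k in keys])
      .mean()
)
print(frac_correct.round(3))

# How often each value is predicted within each segment.
counts = df.groupby(keys)['prediction'].value_counts()
print(counts)
```

Only trip 4, segment 0 reaches the 0.5 threshold, yet the value 3 has the highest count, or ties for it, in every segment. The threshold therefore tests for an absolute majority of correct rows rather than for the most frequent prediction.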

Expected result:

   true_label total-segments correctly-predicted recall
0           0              -                   -      -
1           1              -                   -      -
2           2              -                   -      -
3           3              3                   3    1.0
4           4              -                   -      -

Answer 1

Score: 2

I would use:

```python
out = (df
    # get the top prediction per segment: count each unique row, then sort so
    # the highest count (and, on ties, the row matching true_label) comes first
    .value_counts(sort=False).reset_index(name='count')
    .assign(flag=lambda d: d['true_label'].eq(d['prediction']))
    .sort_values(by=['trip-id', 'segment-id', 'count', 'flag'],
                 ascending=[True, True, False, False],
                 kind='stable')
    .groupby(['trip-id', 'segment-id']).first()
    # check if correctly predicted
    .assign(**{'correctly-predicted': lambda d: d['true_label'].eq(d['prediction'])})
    # aggregate per prediction
    .groupby('prediction')
    .agg(**{'total-segments': ('prediction', 'count'),
            'correctly-predicted': ('correctly-predicted', 'sum')})
    .assign(**{'recall': lambda d: d['correctly-predicted'].div(d['total-segments'])})
    # show all five classes, filling the absent ones with '-'
    .reindex(range(5), fill_value='-')
    .reset_index()
)
```

Output:

   prediction total-segments correctly-predicted recall
0           0              -                   -      -
1           1              -                   -      -
2           2              -                   -      -
3           3              3                   3    1.0
4           4              -                   -      -
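The answer aggregates per `prediction`, and its tie order among non-matching values depends on how `value_counts` orders the rows. As a hedged variant, the sketch below groups by `true_label` as in the question and makes the "first in the df" rule explicit with a row-position column. The column names `count`, `match`, and `order` and the variables `ranked`, `per_segment`, and `summary` are assumptions introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'trip-id': [8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4],
    'segment-id': [1,1,1,1,1,1,1,1,0,0,0,0,0,0,5,5,5,5,5,5],
    'true_label': [3] * 20,
    'prediction': [3, 3, 3, 1, 2, 4, 0, 0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 1, 2, 2]})

keys = ['trip-id', 'segment-id']

ranked = (
    df.assign(
        # how many rows of the same segment share this row's prediction
        count=df.groupby(keys + ['prediction'])['prediction'].transform('size'),
        # whether this row's prediction equals the true label
        match=df['prediction'].eq(df['true_label']),
        # original row position, used as the last tie-breaker
        order=range(len(df)),
    )
    # highest count first, then rows matching true_label, then earliest row
    .sort_values(['count', 'match', 'order'], ascending=[False, False, True])
)

# The first remaining row of each segment carries its chosen prediction.
per_segment = ranked.drop_duplicates(subset=keys)

summary = (
    per_segment
    .assign(correct=per_segment['prediction'].eq(per_segment['true_label']))
    .groupby('true_label')
    .agg(**{'total-segments': ('correct', 'size'),
            'correctly-predicted': ('correct', 'sum')})
)
summary['recall'] = summary['correctly-predicted'] / summary['total-segments']
print(summary)
```

The `.reindex(range(5), fill_value='-')` step from the answer can be appended if the rows for absent labels are wanted; it is left out here so the columns stay numeric.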

huangapple
  • Published on May 24, 2023, 22:52:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76324830.html