Is there a method to find each maximum value before changing to another index value?

Question


I want to get the maximum of column_b for each run of index_a, i.e. before index_a changes to another value, as shown in the example DataFrame below:

```
index_a  column_b  all_max
0        10        -
0        20        -
0        30        30
1        50        50
1        30        -
1        20        -
1        10        -
0        70        70
0        60        -
0        40        -
```

... (and so on)
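To make the snippets in this thread reproducible, the sample data from the table can be built like this (a minimal sketch: the column names come from the question, the values are the ten rows shown, and the expected all_max column is left out):

```python
import pandas as pd

# Sample data from the question: index_a alternates in runs of varying length
df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})
print(df)
```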

But instead, with the code shown below, I get results like this:

```
index_a  column_b  all_max
0        10        70
0        20        70
0        30        70
1        50        70
1        30        70
1        20        70
1        10        70
0        70        70
0        60        70
0        40        70
```

... (and so on)

The runs in index_a do not repeat with a fixed length; some runs contain more 1s or 0s than others.
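Because the runs have no fixed length, the task amounts to grouping *consecutive* equal values. As a plain-Python illustration of that idea (not taken from any of the answers), the standard library's `itertools.groupby` groups only adjacent equal keys, so each run of 0s or 1s becomes its own group whatever its length:

```python
from itertools import groupby

index_a = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
column_b = [10, 20, 30, 50, 30, 20, 10, 70, 60, 40]

# groupby() starts a new group whenever the key changes, so the two
# separate runs of 0s are kept apart instead of being merged
run_maxima = [
    (key, max(b for _, b in run))
    for key, run in groupby(zip(index_a, column_b), key=lambda pair: pair[0])
]
print(run_maxima)  # [(0, 30), (1, 50), (0, 70)]
```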

I have tried using the `.max()` function, but it only gives me the maximum value of the whole column_b:

```python
for i, row in df.iloc[:-1].iterrows():
    if df['index_a'][i] == 0:
        df['all_max'][i] = df['column_b'][i].max()
    else:
        df['all_max'][i] = df['column_b'][i].max()
```
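For comparison, a loop in the spirit of this attempt can work if it tracks where each run starts and, when index_a changes (or the data ends), marks the row holding that run's maximum. This is a hypothetical sketch, not code from the answers; it marks only the first maximum in a run and will be much slower than the vectorised answers on large frames:

```python
import pandas as pd

df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

keys = df["index_a"].tolist()
values = df["column_b"].tolist()
all_max = ["-"] * len(df)

start = 0  # position where the current run begins
for pos in range(1, len(df) + 1):
    # A run ends at the last row or just before index_a changes
    if pos == len(df) or keys[pos] != keys[start]:
        # max() returns the first position holding the run's maximum
        best = max(range(start, pos), key=lambda p: values[p])
        all_max[best] = values[best]
        start = pos

df["all_max"] = all_max
print(df)
```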

Answer 1

Score: 1

Use `groupby.transform` on groups of consecutive values to get the max per group as a broadcast Series, then identify the rows equal to their group's max and assign the value with `where`, else a `-`:

```python
group = df['index_a'].ne(df['index_a'].shift()).cumsum()
m = df.groupby(group)['column_b'].transform('max').eq(df['column_b'])
df['all_max'] = df['column_b'].where(m, '-')
```

Output:

```
   index_a  column_b all_max
0        0        10       -
1        0        20       -
2        0        30      30
3        1        50      50
4        1        30       -
5        1        20       -
6        1        10       -
7        0        70      70
8        0        60       -
9        0        40       -
```
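To see what the grouping key looks like, here is the same approach run on the sample data with the intermediate Series inspected (the DataFrame literal is an assumption reproducing the question's table). `ne(shift())` is True wherever a new run starts, and `cumsum()` turns those flags into a run number:

```python
import pandas as pd

df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

# True where index_a differs from the previous row (row 0 is compared with NaN)
starts = df["index_a"].ne(df["index_a"].shift())
group = starts.cumsum()  # run number for every row
print(group.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# Per-run maximum broadcast back to every row of its run
run_max = df.groupby(group)["column_b"].transform("max")
print(run_max.tolist())  # [30, 30, 30, 50, 50, 50, 50, 70, 70, 70]

# Keep the value only where it equals the run maximum
m = run_max.eq(df["column_b"])
df["all_max"] = df["column_b"].where(m, "-")
print(df["all_max"].tolist())
```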

Answer 2

Score: 0


If you need all maximum values per group of consecutive values, use `GroupBy.transform` with groups built by comparing the shifted values and taking the cumulative sum, compare the result with the original column_b, and assign `-` where there is no match with `Series.where`:

```python
m = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
       .transform('max')
       .eq(df['column_b']))
df['all_max'] = df['column_b'].where(m, '-')
print(df)
```
```
   index_a  column_b all_max
0        0        10       -
1        0        20       -
2        0        30      30
3        1        50      50
4        1        30       -
5        1        20       -
6        1        10       -
7        0        70      70
8        0        60       -
9        0        40       -
```

But if you need to mark only the first maximum value per group, use:

```python
i = df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b'].idxmax()
df['all_max'] = '-'
df.loc[i, 'all_max'] = df['column_b']
print(df)
```
```
   index_a  column_b all_max
0        0        10       -
1        0        20       -
2        0        30      30
3        1        50      50
4        1        30       -
5        1        20       -
6        1        10       -
7        0        70      70
8        0        60       -
9        0        40       -
```

You can see the difference on modified data, where the first group of 0s contains two equal maximums:

```python
print(df)
```
```
   index_a  column_b
0        0        10
1        0        30   <- 2 maximums in the first group of 0s
2        0        30
3        1        50
4        1        30
5        1        20
6        1        10
7        0        70
8        0        60
9        0        40
```
```python
m = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
       .transform('max')
       .eq(df['column_b']))
df['all_max1'] = df['column_b'].where(m, '-')

i = df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b'].idxmax()
df['all_max2'] = '-'
df.loc[i, 'all_max2'] = df['column_b']
```

```python
print(df)
```
```
   index_a  column_b all_max1 all_max2
0        0        10        -        -
1        0        30       30       30
2        0        30       30        -
3        1        50       50       50
4        1        30        -        -
5        1        20        -        -
6        1        10        -        -
7        0        70       70       70
8        0        60        -        -
9        0        40        -        -
```
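A self-contained version of that comparison, using the modified data with a tie in the first run (the DataFrame literal reproduces the printed data above). `transform('max')` marks every row equal to the run maximum, while `idxmax` returns only the label of the first such row in each run:

```python
import pandas as pd

df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 30, 30, 50, 30, 20, 10, 70, 60, 40],
})
group = df["index_a"].ne(df["index_a"].shift()).cumsum()

# All rows that equal their run's maximum
m = df.groupby(group)["column_b"].transform("max").eq(df["column_b"])
df["all_max1"] = df["column_b"].where(m, "-")

# Only the first row holding each run's maximum (labels 1, 3, 7 here)
i = df.groupby(group)["column_b"].idxmax()
df["all_max2"] = "-"
df.loc[i, "all_max2"] = df["column_b"]  # aligned on the index labels in i
print(df)
```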

Answer 3

Score: 0

Create virtual groups to distinguish repeated runs of the same identifier in the index_a column (this broadcasts each run's maximum to every row of the run):

```python
df['all_max'] = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
                   .transform('max'))
print(df)
```
```
   index_a  column_b  all_max
0        0        10       30
1        0        20       30
2        0        30       30
3        1        50       50
4        1        30       50
5        1        20       50
6        1        10       50
7        0        70       70
8        0        60       70
9        0        40       70
```
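If only one row per run is needed rather than a value on every row, the same virtual groups can be aggregated instead of transformed. A sketch using named aggregation (available in pandas 0.25 and later); the run key name `run` and the output column name `max_b` are illustrative choices, not from the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})
# Rename the key so the result index does not clash with the index_a column
group = df["index_a"].ne(df["index_a"].shift()).cumsum().rename("run")

# One row per run: the run's index_a value and its maximum column_b
runs = df.groupby(group).agg(index_a=("index_a", "first"), max_b=("column_b", "max"))
print(runs)
```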

huangapple
  • Published on 8 February 2023 at 18:43:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75384622.html