Is there a method to find each maximum value before changing to another index value?

Question


I want to get the maximum of each run of identical index values before the index changes to another value, as shown in the example DataFrame below:

index_a column_b all_max
0 10 -
0 20 -
0 30 30
1 50 50
1 30 -
1 20 -
1 10 -
0 70 70
0 60 -
0 40 -

... (so on)

Instead, the code shown below gives me results like this:

index_a column_b all_max
0 10 70
0 20 70
0 30 70
1 50 70
1 30 70
1 20 70
1 10 70
0 70 70
0 60 70
0 40 70

... (so on)

The runs in the index column do not repeat in a fixed pattern; some runs have more 1s or 0s than others.

I have tried the .max() function, but it only gives me the maximum of the whole column_b:

for i, row in df.iloc[:-1].iterrows():
   if df['index_a'][i] == 0:
     df['all_max'][i] = df['column_b'][i].max()
   else:
     df['all_max'][i] = df['column_b'][i].max()

Answer 1

Score: 1


Use groupby.transform on runs of consecutive values to get the max per group as a broadcast Series, then identify the rows holding the max and assign the value with where, using - elsewhere:

group = df['index_a'].ne(df['index_a'].shift()).cumsum()
m = df.groupby(group)['column_b'].transform('max').eq(df['column_b'])

df['all_max'] = df['column_b'].where(m, '-')

Output:

   index_a  column_b all_max
0        0        10       -
1        0        20       -
2        0        30      30
3        1        50      50
4        1        30       -
5        1        20       -
6        1        10       -
7        0        70      70
8        0        60       -
9        0        40       -
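
To see why the grouping key works, it can help to print the intermediate labels. The sketch below rebuilds the question's sample data (the DataFrame construction is an assumption, since the question only shows the printed table) and shows that each run of equal index_a values gets its own label, so the second run of 0s is not merged with the first.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame({
    'index_a':  [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    'column_b': [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

# True wherever index_a differs from the previous row (always True on row 0,
# because the shifted value there is NaN).
changed = df['index_a'].ne(df['index_a'].shift())

# The cumulative sum turns each change point into a new run label.
group = changed.cumsum()
print(group.tolist())    # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# Per-run maximum, broadcast back to every row of its run.
run_max = df.groupby(group)['column_b'].transform('max')
print(run_max.tolist())  # [30, 30, 30, 50, 50, 50, 50, 70, 70, 70]
```

Grouping by index_a directly would put rows 0-2 and 7-9 in the same group, which is why the run labels are needed.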

Answer 2

Score: 0


If you need all maximum values per consecutive group, build the groups by comparing index_a with its shifted values and taking the cumulative sum, use GroupBy.transform to get each group's maximum, compare it with the original column_b, and assign - where there is no match using Series.where:

m = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
       .transform('max')
       .eq(df['column_b']))

df['all_max'] = df['column_b'].where(m, '-')

print (df)
   index_a  column_b all_max
0        0        10       -
1        0        20       -
2        0        30      30
3        1        50      50
4        1        30       -
5        1        20       -
6        1        10       -
7        0        70      70
8        0        60       -
9        0        40       -

But if you need to match only the first maximum in each group, use:

i = df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b'].idxmax()

df['all_max'] = '-'
df.loc[i, 'all_max'] = df['column_b']

print (df)
   index_a  column_b all_max
0        0        10       -
1        0        20       -
2        0        30      30
3        1        50      50
4        1        30       -
5        1        20       -
6        1        10       -
7        0        70      70
8        0        60       -
9        0        40       -

You can see the difference with modified data:

print (df)
   index_a  column_b
0        0        10
1        0        30 <- 2 maximums in the first 0 group
2        0        30
3        1        50
4        1        30
5        1        20
6        1        10
7        0        70
8        0        60
9        0        40

m = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
       .transform('max')
       .eq(df['column_b']))

df['all_max1'] = df['column_b'].where(m, '-')

i = df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b'].idxmax()

df['all_max2'] = '-'
df.loc[i, 'all_max2'] = df['column_b']

print (df)
   index_a  column_b all_max1 all_max2
0        0        10        -        -
1        0        30       30       30
2        0        30       30        -
3        1        50       50       50
4        1        30        -        -
5        1        20        -        -
6        1        10        -        -
7        0        70       70       70
8        0        60        -        -
9        0        40        -        -
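
The same distinction can be illustrated without pandas. This is only a plain-Python sketch of the idea, not the answer's method: itertools.groupby splits the rows into consecutive runs, and the helper name mark_run_max and its first_only flag are made up for this example.

```python
from itertools import groupby

def mark_run_max(keys, values, first_only=False):
    """Return values with everything except each run's maximum replaced by '-'.

    A "run" is a stretch of consecutive equal keys. With first_only=True,
    only the first occurrence of a tied maximum is kept.
    """
    out = []
    # groupby only merges adjacent equal keys, which is exactly a run.
    for _, run in groupby(zip(keys, values), key=lambda pair: pair[0]):
        run_values = [v for _, v in run]
        top = max(run_values)
        seen = False  # becomes True once the run's maximum has been emitted
        for v in run_values:
            keep = v == top and not (first_only and seen)
            out.append(v if keep else '-')
            seen = seen or v == top
    return out

index_a  = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
column_b = [10, 30, 30, 50, 30, 20, 10, 70, 60, 40]

print(mark_run_max(index_a, column_b))
# ['-', 30, 30, 50, '-', '-', '-', 70, '-', '-']
print(mark_run_max(index_a, column_b, first_only=True))
# ['-', 30, '-', 50, '-', '-', '-', 70, '-', '-']
```

The two printed lists correspond to the all_max1 and all_max2 columns: they differ only on the row that holds the second, tied 30.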

Answer 3

Score: 0


Create virtual groups to distinguish repeated identifiers in the index_a column:

df['all_max'] = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
                   .transform('max'))
print(df)

# Output
   index_a  column_b  all_max
0        0        10       30
1        0        20       30
2        0        30       30
3        1        50       50
4        1        30       50
5        1        20       50
6        1        10       50
7        0        70       70
8        0        60       70
9        0        40       70
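
If the column has to stay numeric for later calculations, one option is to start from this broadcast maximum and blank out the non-maximum rows with NaN rather than the string '-'. This is a variation under that assumption, not something the question asked for; the column name all_max_num is made up for the sketch, and the DataFrame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame({
    'index_a':  [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    'column_b': [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

# Label each run of consecutive equal index_a values, then broadcast its max.
group = df['index_a'].ne(df['index_a'].shift()).cumsum()
run_max = df.groupby(group)['column_b'].transform('max')

# where() without a replacement value fills non-matching rows with NaN,
# so the result is float64 instead of a mixed object column.
df['all_max_num'] = df['column_b'].where(df['column_b'].eq(run_max))

print(df['all_max_num'].dtype)              # float64
print(df['all_max_num'].dropna().tolist())  # [30.0, 50.0, 70.0]
```

The trade-off is that the integers become floats, whereas filling with '-' keeps the values readable but turns the column into object dtype.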

huangapple
  • Posted on February 8, 2023 at 18:43:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75384622.html