Is there a method to find each maximum value before changing to another index value?
Question
I want to get the max of `column_b` for each run of `index_a` values, before it changes to another value, as shown in the example dataframe below:
index_a | column_b | all_max |
---|---|---|
0 | 10 | - |
0 | 20 | - |
0 | 30 | 30 |
1 | 50 | 50 |
1 | 30 | - |
1 | 20 | - |
1 | 10 | - |
0 | 70 | 70 |
0 | 60 | - |
0 | 40 | - |
... (so on)
Instead, I get results like this using the code shown further below:
index_a | column_b | all_max |
---|---|---|
0 | 10 | 70 |
0 | 20 | 70 |
0 | 30 | 70 |
1 | 50 | 70 |
1 | 30 | 70 |
1 | 20 | 70 |
1 | 10 | 70 |
0 | 70 | 70 |
0 | 60 | 70 |
0 | 40 | 70 |
... (so on)
The `index_a` values do not repeat in a fixed pattern; some runs have more 1s or 0s than others.
I have tried using the `.max()` function, but it only gives me the max value of the whole `column_b`:
for i, row in df.iloc[:-1].iterrows():
    if df['index_a'][i] == 0:
        df['all_max'][i] = df['column_b'][i].max()
    else:
        df['all_max'][i] = df['column_b'][i].max()
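For anyone reproducing the problem, the sample frame from the tables above can be built as follows. This is a minimal sketch: the values are copied from the question, and `overall_max` is only an illustrative name. It also shows why a column-wide maximum yields 70 on every row.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

# A max over the whole column ignores where index_a changes,
# so it returns one value for the entire frame.
overall_max = df["column_b"].max()
print(overall_max)  # 70
```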
Answer 1
Score: 1
Use `groupby.transform` on groups of consecutive values to get the max per group as a broadcast Series, then identify the rows that hold the max and assign their value with `where`, falling back to `-` elsewhere:
group = df['index_a'].ne(df['index_a'].shift()).cumsum()
m = df.groupby(group)['column_b'].transform('max').eq(df['column_b'])
df['all_max'] = df['column_b'].where(m, '-')
Output:
index_a column_b all_max
0 0 10 -
1 0 20 -
2 0 30 30
3 1 50 50
4 1 30 -
5 1 20 -
6 1 10 -
7 0 70 70
8 0 60 -
9 0 40 -
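The grouping key is the least obvious part, so here is a step-by-step sketch of what it produces on the question's data. The intermediate names `changed` and `run_max` are introduced only for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

# True wherever index_a differs from the previous row. The first row is
# always True because shift() leaves NaN there.
changed = df["index_a"].ne(df["index_a"].shift())

# Running count of changes: each run of equal values gets its own label.
group = changed.cumsum()
print(group.tolist())    # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# Max of each run, repeated on every row of that run.
run_max = df.groupby(group)["column_b"].transform("max")
print(run_max.tolist())  # [30, 30, 30, 50, 50, 50, 50, 70, 70, 70]
```

Because the label increases at every change, the two separate runs of 0 end up in different groups (1 and 3), which a plain `groupby('index_a')` would not do.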
Answer 2
Score: 0
If you need all maximum values per consecutive group, use `GroupBy.transform` on groups built by comparing `index_a` with its shifted values and taking the cumulative sum, then compare the result with the original `column_b` and assign `-` where there is no match using `Series.where`:
m = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
.transform('max')
.eq(df['column_b']))
df['all_max'] = df['column_b'].where(m, '-')
print (df)
index_a column_b all_max
0 0 10 -
1 0 20 -
2 0 30 30
3 1 50 50
4 1 30 -
5 1 20 -
6 1 10 -
7 0 70 70
8 0 60 -
9 0 40 -
But if you need to match only the first maximum in each group, use:
i = df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b'].idxmax()
df['all_max'] = '-'
df.loc[i, 'all_max'] = df['column_b']
print (df)
index_a column_b all_max
0 0 10 -
1 0 20 -
2 0 30 30
3 1 50 50
4 1 30 -
5 1 20 -
6 1 10 -
7 0 70 70
8 0 60 -
9 0 40 -
You can see the difference with changed data, where one group contains a tie:
print (df)
index_a column_b
0 0 10
1 0 30 <- 2 maximums per first 0 group
2 0 30
3 1 50
4 1 30
5 1 20
6 1 10
7 0 70
8 0 60
9 0 40
m = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
.transform('max')
.eq(df['column_b']))
df['all_max1'] = df['column_b'].where(m, '-')
i = df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b'].idxmax()
df['all_max2'] = '-'
df.loc[i, 'all_max2'] = df['column_b']
print (df)
index_a column_b all_max1 all_max2
0 0 10 - -
1 0 30 30 30
2 0 30 30 -
3 1 50 50 50
4 1 30 - -
5 1 20 - -
6 1 10 - -
7 0 70 70 70
8 0 60 - -
9 0 40 - -
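One design point worth noting: filling with the string `-` mixes strings and numbers, so `all_max` becomes an object column. If the result needs to stay numeric, a possible variation (an assumption about what the reader may want, not something the question asked for) is to call `where` without a replacement value, which leaves `NaN` in the non-matching rows. The names `run_id` and `is_max` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 20, 30, 50, 30, 20, 10, 70, 60, 40],
})

run_id = df["index_a"].ne(df["index_a"].shift()).cumsum()
is_max = df.groupby(run_id)["column_b"].transform("max").eq(df["column_b"])

# Without a second argument, where() fills non-matching rows with NaN,
# so the column stays numeric (float64) instead of becoming object dtype.
df["all_max"] = df["column_b"].where(is_max)

print(df["all_max"].dtype)              # float64
print(df["all_max"].dropna().tolist())  # [30.0, 50.0, 70.0]
```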
Answer 3
Score: 0
Create virtual groups to distinguish separate runs that share the same identifier in the `index_a` column:
df['all_max'] = (df.groupby(df['index_a'].ne(df['index_a'].shift()).cumsum())['column_b']
.transform('max'))
print(df)
# Output
index_a column_b all_max
0 0 10 30
1 0 20 30
2 0 30 30
3 1 50 50
4 1 30 50
5 1 20 50
6 1 10 50
7 0 70 70
8 0 60 70
9 0 40 70
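The approaches in the answers can be folded into one small helper. This is a sketch only: `mark_run_max` and its `first_only` flag are hypothetical names that do not appear in any of the answers. It returns `NaN` rather than `-` for non-matching rows, and the `first_only` branch assumes the frame has a unique index.

```python
import pandas as pd


def mark_run_max(df, key, value, first_only=False):
    """Return `value` where it is the max of its run of equal `key` values, NaN elsewhere."""
    # Label each run of consecutive equal key values.
    run_id = df[key].ne(df[key].shift()).cumsum()
    grouped = df.groupby(run_id)[value]
    if first_only:
        # idxmax gives the index label of the first max in each run.
        keep = df.index.isin(grouped.idxmax())
    else:
        # Keep every row that ties for the max of its run.
        keep = grouped.transform("max").eq(df[value])
    return df[value].where(keep)


# Data with a tie in the first run, as in the second answer.
df = pd.DataFrame({
    "index_a": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "column_b": [10, 30, 30, 50, 30, 20, 10, 70, 60, 40],
})

all_ties = mark_run_max(df, "index_a", "column_b")
first = mark_run_max(df, "index_a", "column_b", first_only=True)

print(all_ties.dropna().index.tolist())  # [1, 2, 3, 7]
print(first.dropna().index.tolist())     # [1, 3, 7]
```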