Compare row wise elements of a single column. If there are 2 continuous L then select lowest from High column and ignore other. Conversly if 2 L

huangapple go评论72阅读模式
英文:

Compare row wise elements of a single column. If there are 2 continuous L then select lowest from High column and ignore other. Conversly if 2 L

问题

    High	D_HIGH	D_HIGH_H
33	46.57	0	0L
1	86.44	68	68H
34	56.58	83	83L

2	117.91	158	158H
36	94.51	186	186L

4	123.28	254	254H
37	83.20	286	286L
英文:
    High	D_HIGH	D_HIGH_H
33	46.57	0	0L
0	69.93	42	42H
1	86.44	68	68H
34	56.58	83	83L
35	67.12	125	125L
2	117.91	158	158H
36	94.51	186	186L
3	120.45	245	245H
4	123.28	254	254H
37	83.20	286	286L

In column D_HIGH_H there is L & H at end.
If there are two continuous H then the one having highest value in High column has to be selected and other has to be ignored(deleted).
If there are two continuous L then the one having lowest value in High column has to be selected and other has to be ignored(deleted).
If the sequence is H,L,H,L then no changes to be made.

Output I want is as follows:

    High	D_HIGH	D_HIGH_H
33	46.57	0	0L
1	86.44	68	68H
34	56.58	83	83L

2	117.91	158	158H
36	94.51	186	186L

4	123.28	254	254H
37	83.20	286	286L

I tried various options using list map but did not work out.Also tried with groupby but no logical conclusion.

答案1

得分: 3

以下是代码的翻译部分:

g = ((l := df['D_HIGH_H'].str[-1]) != l.shift()).cumsum()

def f(x):
    if (x['D_HIGH_H'].str[-1] == 'H').any():
        return x.nlargest(1, 'D_HIGH')
    return x.nsmallest(1, 'D_HIGH')

df.groupby(g, as_index=False).apply(f)

Output:

            High  D_HIGH D_HIGH_H
0 33   46.57       0       0L
1 1    86.44      68      68H
2 34   56.58      83      83L
3 2   117.91     158     158H
4 36   94.51     186     186L
5 4   123.28     254     254H
6 37   83.20     286     286L
英文:

Here's one way:

g = ((l := df['D_HIGH_H'].str[-1]) != l.shift()).cumsum()

def f(x):
    if (x['D_HIGH_H'].str[-1] == 'H').any():
        return x.nlargest(1, 'D_HIGH')
    return x.nsmallest(1, 'D_HIGH')

df.groupby(g, as_index=False).apply(f)

Output:

        High  D_HIGH D_HIGH_H
0 33   46.57       0       0L
1 1    86.44      68      68H
2 34   56.58      83      83L
3 2   117.91     158     158H
4 36   94.51     186     186L
5 4   123.28     254     254H
6 37   83.20     286     286L

答案2

得分: 2

你可以使用 extract 来获取字母,然后计算一个自定义组,并使用依赖于该字母的函数进行 groupby.apply

# 提取字母
s = df['D_HIGH_H'].str.extract('(\D)$', expand=False)

# 按连续字母分组
# 根据字母类型获取 idxmin/idxmax
keep = (df['High']
           .groupby([s, s.ne(s.shift()).cumsum()], sort=False)
           .apply(lambda x: x.idxmin() if x.name[0] == 'L' else x.idxmax())
           .tolist()
        )

out = df.loc[keep]

输出:

      High  D_HIGH D_HIGH_H
33   46.57       0       0L
1    86.44      68      68H
34   56.58      83      83L
2   117.91     158     158H
36   94.51     186     186L
4   123.28     254     254H
37   83.20     286     286L
英文:

You can use extract to get the letter, then compute a custom group and groupby.apply with a function that depends on the letter:

# extract letter
s = df['D_HIGH_H'].str.extract('(\D)$', expand=False)

# group by successive letters
# get the idxmin/idxmax depending on the type of letter
keep = (df['High']
           .groupby([s, s.ne(s.shift()).cumsum()], sort=False)
           .apply(lambda x: x.idxmin() if x.name[0] == 'L' else x.idxmax())
           .tolist()
        )

out = df.loc[keep]

Output:

      High  D_HIGH D_HIGH_H
33   46.57       0       0L
1    86.44      68      68H
34   56.58      83      83L
2   117.91     158     158H
36   94.51     186     186L
4   123.28     254     254H
37   83.20     286     286L

huangapple
  • 本文由 发表于 2023年2月18日 15:22:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/75491810.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定