Compare row wise elements of a single column. If there are 2 continuous L then select lowest from High column and ignore other. Conversly if 2 L

huangapple go评论93阅读模式
英文:

Compare row wise elements of a single column. If there are 2 continuous L then select lowest from High column and ignore other. Conversly if 2 L

问题

  1. High D_HIGH D_HIGH_H
  2. 33 46.57 0 0L
  3. 1 86.44 68 68H
  4. 34 56.58 83 83L
  5. 2 117.91 158 158H
  6. 36 94.51 186 186L
  7. 4 123.28 254 254H
  8. 37 83.20 286 286L
英文:
  1. High D_HIGH D_HIGH_H
  2. 33 46.57 0 0L
  3. 0 69.93 42 42H
  4. 1 86.44 68 68H
  5. 34 56.58 83 83L
  6. 35 67.12 125 125L
  7. 2 117.91 158 158H
  8. 36 94.51 186 186L
  9. 3 120.45 245 245H
  10. 4 123.28 254 254H
  11. 37 83.20 286 286L

In column D_HIGH_H there is L & H at end.
If there are two continuous H then the one having highest value in High column has to be selected and other has to be ignored(deleted).
If there are two continuous L then the one having lowest value in High column has to be selected and other has to be ignored(deleted).
If the sequence is H,L,H,L then no changes to be made.

Output I want is as follows:

  1. High D_HIGH D_HIGH_H
  2. 33 46.57 0 0L
  3. 1 86.44 68 68H
  4. 34 56.58 83 83L
  5. 2 117.91 158 158H
  6. 36 94.51 186 186L
  7. 4 123.28 254 254H
  8. 37 83.20 286 286L

I tried various options using list map but did not work out.Also tried with groupby but no logical conclusion.

答案1

得分: 3

以下是代码的翻译部分:

  1. g = ((l := df['D_HIGH_H'].str[-1]) != l.shift()).cumsum()
  2. def f(x):
  3. if (x['D_HIGH_H'].str[-1] == 'H').any():
  4. return x.nlargest(1, 'D_HIGH')
  5. return x.nsmallest(1, 'D_HIGH')
  6. df.groupby(g, as_index=False).apply(f)

Output:

  1. High D_HIGH D_HIGH_H
  2. 0 33 46.57 0 0L
  3. 1 1 86.44 68 68H
  4. 2 34 56.58 83 83L
  5. 3 2 117.91 158 158H
  6. 4 36 94.51 186 186L
  7. 5 4 123.28 254 254H
  8. 6 37 83.20 286 286L
英文:

Here's one way:

  1. g = ((l := df['D_HIGH_H'].str[-1]) != l.shift()).cumsum()
  2. def f(x):
  3. if (x['D_HIGH_H'].str[-1] == 'H').any():
  4. return x.nlargest(1, 'D_HIGH')
  5. return x.nsmallest(1, 'D_HIGH')
  6. df.groupby(g, as_index=False).apply(f)

Output:

  1. High D_HIGH D_HIGH_H
  2. 0 33 46.57 0 0L
  3. 1 1 86.44 68 68H
  4. 2 34 56.58 83 83L
  5. 3 2 117.91 158 158H
  6. 4 36 94.51 186 186L
  7. 5 4 123.28 254 254H
  8. 6 37 83.20 286 286L

答案2

得分: 2

你可以使用 extract 来获取字母,然后计算一个自定义组,并使用依赖于该字母的函数进行 groupby.apply

  1. # 提取字母
  2. s = df['D_HIGH_H'].str.extract('(\D)$', expand=False)
  3. # 按连续字母分组
  4. # 根据字母类型获取 idxmin/idxmax
  5. keep = (df['High']
  6. .groupby([s, s.ne(s.shift()).cumsum()], sort=False)
  7. .apply(lambda x: x.idxmin() if x.name[0] == 'L' else x.idxmax())
  8. .tolist()
  9. )
  10. out = df.loc[keep]

输出:

  1. High D_HIGH D_HIGH_H
  2. 33 46.57 0 0L
  3. 1 86.44 68 68H
  4. 34 56.58 83 83L
  5. 2 117.91 158 158H
  6. 36 94.51 186 186L
  7. 4 123.28 254 254H
  8. 37 83.20 286 286L
英文:

You can use extract to get the letter, then compute a custom group and groupby.apply with a function that depends on the letter:

  1. # extract letter
  2. s = df['D_HIGH_H'].str.extract('(\D)$', expand=False)
  3. # group by successive letters
  4. # get the idxmin/idxmax depending on the type of letter
  5. keep = (df['High']
  6. .groupby([s, s.ne(s.shift()).cumsum()], sort=False)
  7. .apply(lambda x: x.idxmin() if x.name[0] == 'L' else x.idxmax())
  8. .tolist()
  9. )
  10. out = df.loc[keep]

Output:

  1. High D_HIGH D_HIGH_H
  2. 33 46.57 0 0L
  3. 1 86.44 68 68H
  4. 34 56.58 83 83L
  5. 2 117.91 158 158H
  6. 36 94.51 186 186L
  7. 4 123.28 254 254H
  8. 37 83.20 286 286L

huangapple
  • 本文由 发表于 2023年2月18日 15:22:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/75491810.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定