从数据框逐行或按块选择最大/最小值

huangapple go评论87阅读模式
英文:

Select largest/smallest value from data frame line by line or chunk wise

问题

  1. 我们有以下数据框。
  2. | Val1 | Val2 | H | L |
  3. | ---- | ---- | --- | ------ |
  4. | 244.2 | 233.05 | H | |
  5. | 261.2 | 252.8 | H | |
  6. | 242.35 | 235.8 | H | |
  7. | 251.1 | 244.55 | H | |
  8. | 244.5 | 239.55 | | L |
  9. | 271.25 | 262.55 | H | |
  10. | 274 | 270.45 | H | |
  11. | 245.05 | 236.65 | | L |
  12. | 243.35 | 237.55 | | L |
  13. | 255.7 | 251.55 | H | |
  14. | 248.8 | 243.2 | | L |
  15. | 283.75 | 272.15 | H | |
  16. | 269.45 | 262.7 | | L |
  17. | 260.6 | 254.6 | H | |
  18. | 285 | 275 | | L |
  19. | 272.5 | 265.35 | | L |
  20. | 267 | 254.85 | | L |
  21. | 256.65 | 253.5 | | L |
  22. | 268 | 257.75 | H | |
  23. | 272.1 | 266.75 | H | |
  24. | 281.45 | 272.85 | | L |
  25. | 288.75 | 282.05 | H | |
  26. | 268.75 | 264.5 | | L |
  27. | 283 | 275.75 | | L |
  28. | 285.95 | 278.15 | H | |
  29. | 271.35 | 265.2 | | L |
  30. | 308.15 | 300 | H | |
  31. | 312.85 | 304.7 | H | |
  32. | 311.55 | 302.2 | | L |
  33. | 327.4 | 318.65 | H | |
  34. | 327 | 319.5 | H | |
  35. | 318.7 | 313.9 | | L |
  36. | 293.1 | 283.45 | | L |
  37. | 306 | 301.5 | H | |
  38. | 303.9 | 296.2 | | L |
  39. | 293.4 | 261.55 | | L |
  40. | 302.8 | 298.1 | H | |
  41. | 293.7 | 289.05 | | L |
  42. | 267.95 | 264.05 | | L |
  43. | 276.55 | 273.2 | H | |
  44. | 268.15 | 255 | | L |
  45. | 314 | 303.9 | | L |
  46. | 324.2 | 311.05 | H | |
  47. | 332.35 | 323.8 | H | |
  48. | 312.35 | 306.3 | H | |
  49. | 308.4 | 300.2 | | L |
  50. | 312.5 | 308.55 | | L |
  51. | 365.7 | 355.6 | H | |
  52. | 334 | 318.8 | H | |
  53. | 319.3 | 313.95 | | L |
  54. | 327.8 | 323.15 | H | |
  55. 我想要一个数据框,在 H L 值之间交替。但是如框架中所示,一些 H 值在以下时间序列数据中形成了集群。我希望在 H 字段中选择集群中的最高值,在 L 字段中选择集群中的最低值。
  56. 输出应如下所示。
  57. | Val1 | Val2 | H | L |
  58. | ------ | ------ | ------ | ------ |
  59. | 261.2 | 252.8 | H | |
  60. | 244.5 | 239.55 | | L |
  61. | 274 | 270.45 | H | |
  62. | 245.05 | 236.65 | | L |
  63. | 255.7 | 251.55 | H | |
  64. | 248.8 | 243.2 | | L |
  65. | 283.75 | 272.15 | H | |
  66. | 269.45 | 262.7 | | L |
  67. | 260.6 | 254.6 | H | |
  68. | 256.65 | 253.5 | | L |
  69. | 272.1 | 266.75 | H | |
  70. | 281.45 | 272.85 | | L |
  71. | 288.75 | 282.05 | H | |
  72. | 268.75 | 264.5 | | L |
  73. | 285.95 | 278.15 | H | |
  74. | 271.35 | 265.2 | | L |
  75. | 312.85 | 304.7 | H | |
  76. | 311.55 | 302.2 | | L |
  77. | 327.4 | 318.65 | H | |
  78. | 293.1 | 283.45 | | L |
  79. | 306 | 301.5 | H | |
  80. | 293.4 | 261.55 | | L |
  81. | 302.8 | 298.1 | H | |
  82. | 267.95 | 264.05 | | L |
  83. | 276.55 | 273.2 | H | |
  84. | 268.15 | 255 | | L |
  85. | 332.35 | 323.8 | H | |
  86. | 308.4 | 300.2 | | L |
  87. | 365.7 | 355.6 | H | |
  88. | 319.3 | 313.95 | | L |
  89. | 327.8 | 323.15 | H | |
英文:

We have the following data frame.

Val1 Val2 H L
244.2 233.05 H None
261.2 252.8 H None
242.35 235.8 H None
251.1 244.55 H None
244.5 239.55 None L
271.25 262.55 H None
274 270.45 H None
245.05 236.65 None L
243.35 237.55 None L
255.7 251.55 H None
248.8 243.2 None L
283.75 272.15 H None
269.45 262.7 None L
260.6 254.6 H None
285 275 None L
272.5 265.35 None L
267 254.85 None L
256.65 253.5 None L
268 257.75 H None
272.1 266.75 H None
281.45 272.85 None L
288.75 282.05 H None
268.75 264.5 None L
283 275.75 None L
285.95 278.15 H None
271.35 265.2 None L
308.15 300 H None
312.85 304.7 H None
311.55 302.2 None L
327.4 318.65 H None
327 319.5 H None
318.7 313.9 None L
293.1 283.45 None L
306 301.5 H None
303.9 296.2 None L
293.4 261.55 None L
302.8 298.1 H None
293.7 289.05 None L
267.95 264.05 None L
276.55 273.2 H None
268.15 255 None L
314 303.9 None L
324.2 311.05 H None
332.35 323.8 H None
312.35 306.3 H None
308.4 300.2 None L
312.5 308.55 None L
365.7 355.6 H None
334 318.8 H None
319.3 313.95 None L
327.8 323.15 H None

I want a data frame which alternates between H and L values. However as seen in the frame, few H values have formed clusters in the following time series data. I want to select the highest value amongst the cluster for H fields and lowest value amongst the cluster for L fields.

The output should look as follows.

Val1 Val2 H L
261.2 252.8 H None
244.5 239.55 None L
274 270.45 H None
245.05 236.65 None L
255.7 251.55 H None
248.8 243.2 None L
283.75 272.15 H None
269.45 262.7 None L
260.6 254.6 H None
256.65 253.5 None L
272.1 266.75 H None
281.45 272.85 None L
288.75 282.05 H None
268.75 264.5 None L
285.95 278.15 H None
271.35 265.2 None L
312.85 304.7 H None
311.55 302.2 None L
327.4 318.65 H None
293.1 283.45 None L
306 301.5 H None
293.4 261.55 None L
302.8 298.1 H None
267.95 264.05 None L
276.55 273.2 H None
268.15 255 None L
332.35 323.8 H None
308.4 300.2 None L
365.7 355.6 H None
319.3 313.95 None L
327.8 323.15 H None

I tried a variety of functions from pandas but cannot get such desired result.

答案1

得分: 1

以下是代码的中文翻译:

  1. # 确保将字符串 'None' 和 NaN 转换为真正的 None
  2. df = df.replace('None', None)
  3. # 标识空值
  4. m1 = df['H'].isna()
  5. m2 = df['L'].isna()
  6. # 创建连续非空值的分组
  7. # 获取每个 H 组的最大 Val1 的行
  8. # 和每个 L 组的最小 Val1 的行
  9. idx = (set(df['Val1'].groupby(m1.cumsum()[~m1]).idxmax())
  10. | set(df['Val1'].groupby(m2.cumsum()[~m2]).idxmin())
  11. )
  12. # 索引这些行
  13. out = df.loc[sorted(idx)]

输出结果不需要翻译,因为它已经是中文的代码输出。

英文:

You can use:

  1. # ensure real None/NaN
  2. df = df.replace('None', None)
  3. # identify null values
  4. m1 = df['H'].isna()
  5. m2 = df['L'].isna()
  6. # make groups of consecutive non-null
  7. # get rows with max Val1 per group of H
  8. # and rows with min Val1 per group of L
  9. idx = (set(df['Val1'].groupby(m1.cumsum()[~m1]).idxmax())
  10. |set(df['Val1'].groupby(m2.cumsum()[~m2]).idxmin())
  11. )
  12. # index those rows
  13. out = df.loc[sorted(idx)]

Output:

  1. Val1 Val2 H L
  2. 1 261.20 252.80 H None
  3. 4 244.50 239.55 None L
  4. 6 274.00 270.45 H None
  5. 8 243.35 237.55 None L
  6. 9 255.70 251.55 H None
  7. 10 248.80 243.20 None L
  8. 11 283.75 272.15 H None
  9. 12 269.45 262.70 None L
  10. 13 260.60 254.60 H None
  11. 17 256.65 253.50 None L
  12. 19 272.10 266.75 H None
  13. 20 281.45 272.85 None L
  14. 21 288.75 282.05 H None
  15. 22 268.75 264.50 None L
  16. 24 285.95 278.15 H None
  17. 25 271.35 265.20 None L
  18. 27 312.85 304.70 H None
  19. 28 311.55 302.20 None L
  20. 29 327.40 318.65 H None
  21. 32 293.10 283.45 None L
  22. 33 306.00 301.50 H None
  23. 35 293.40 261.55 None L
  24. 36 302.80 298.10 H None
  25. 38 267.95 264.05 None L
  26. 39 276.55 273.20 H None
  27. 40 268.15 255.00 None L
  28. 43 332.35 323.80 H None
  29. 45 308.40 300.20 None L
  30. 47 365.70 355.60 H None
  31. 49 319.30 313.95 None L
  32. 50 327.80 323.15 H None

huangapple
  • 本文由 发表于 2023年2月19日 19:11:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/75499699.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定