pandas函数,用于查找第一个未来实例的索引,其中列的值小于每行的值。

huangapple go评论79阅读模式
英文:

pandas function to find index of the first future instance where column is less than each row's value

问题

我是新来的Stack Overflow用户,我有一个关于在pandas中解决问题的问题。我想要创建一个函数,该函数返回第一个未来实例的索引,其中某一列小于该列的每一行的值。

例如,考虑以下数据框:

  1. import numpy as np
  2. import pandas as pd
  3. df = pd.DataFrame({'Val': [1, 2, 3, 4, 0, 1, -1, -2, -3]}, index=np.arange(0, 9))
  4. df

我在寻找的输出是:

  1. | Index | F(Val) |
  2. | -------- | -------- |
  3. | 0 | 4 |
  4. | 1 | 4 |
  5. | 2 | 4 |
  6. | 3 | 4 |
  7. | 4 | 6 |
  8. | 5 | 6 |
  9. | 6 | 7 |
  10. | 7 | 8 |
  11. | 8 | NaN |

或等效于F(Val)的系列/数组。

我已经能够很容易地使用for循环解决这个问题,但显然在我正在处理的大型数据集上非常慢,并且不是非常优雅或最佳的解决方案。我希望解决方案是一个使用向量化的高效pandas函数。

此外,作为一个奖励问题(如果有人能提供帮助),如何使用向量化计算每行的索引和F(Val)索引之间的最大值?输出应如下所示:

  1. | Index | G(Val) |
  2. | -------- | -------- |
  3. | 0 | 4 |
  4. | 1 | 4 |
  5. | 2 | 4 |
  6. | 3 | 4 |
  7. | 4 | 1 |
  8. | 5 | 1 |
  9. | 6 | -1 |
  10. | 7 | -2 |
  11. | 8 | NaN |

谢谢!

英文:

I'm new to Stack Overflow, and I just have a question about solving a problem in pandas. I am looking to create a function that returns the index of the first future instance where a column is less than each row's value for that column.

For example, consider the dataframe:

  1. import numpy as np
  2. import pandas as pd
  3. df = pd.DataFrame({'Val': [1, 2, 3, 4, 0, 1, -1, -2, -3]}, index = np.arange(0,9))
  4. df
Index Val
0 1
1 2
2 3
3 4
4 0
5 1
6 -1
7 -2
8 -3

I am looking for the output:

Index F(Val)
0 4
1 4
2 4
3 4
4 6
5 6
6 7
7 8
8 NaN

Or the series/array equivalent of F(Val).

I've been able to solve this quite easily using for loops, but obviously this is extremely slow on the large dataset I am working with an not a very elegant or optimal solution. My hope is that the solution is an efficient pandas function that employs vectorization.

Also, as a bonus question (if anyone can assist), how might the maximum value between each row's index and the F(Val) index be computed using vectorization? The output should look like:

Index G(Val)
0 4
1 4
2 4
3 4
4 1
5 1
6 -1
7 -2
8 NaN

Thanks!

答案1

得分: 1

你可以使用以下代码:

  1. grp = df['Val'].lt(df['Val'].shift()).shift(fill_value=0).cumsum()
  2. df['F(Val)'] = df.groupby(grp).transform(lambda x: x.index[-1]).shift(-1)
  3. print(df)
  4. # 输出结果
  5. Val F(Val)
  6. 0 1 4.0
  7. 1 2 4.0
  8. 2 3 4.0
  9. 3 4 4.0
  10. 4 0 6.0
  11. 5 1 6.0
  12. 6 -1 7.0
  13. 7 -2 8.0
  14. 8 -3 NaN

注意:这是代码的翻译部分。

英文:

You can use:

  1. grp = df['Val'].lt(df['Val'].shift()).shift(fill_value=0).cumsum()
  2. df['F(Val)'] = df.groupby(grp).transform(lambda x: x.index[-1]).shift(-1)
  3. print(df)
  4. # Output
  5. Val F(Val)
  6. 0 1 4.0
  7. 1 2 4.0
  8. 2 3 4.0
  9. 3 4 4.0
  10. 4 0 6.0
  11. 5 1 6.0
  12. 6 -1 7.0
  13. 7 -2 8.0
  14. 8 -3 NaN

答案2

得分: 1

  1. # 使用 [tag:numpy] 广播和下三角矩阵:
  2. a = df['Val'].to_numpy()
  3. m = np.tril(a[:, None] <= a, k=-1)
  4. df['F(Val)'] = np.where(m.any(0), m.argmax(0), np.nan)
  5. # 与 [`expanding`]() 的相同逻辑:
  6. df['F(Val)'] = (df.loc[::-1, 'Val'].expanding()
  7. .apply(lambda x: s.idxmax() if len(s:=(x.iloc[-2::-1] <= x.iloc[-1]))
  8. else np.nan)
  9. )
  10. # 输出(与提供的输出有差异):
  11. Val F(Val)
  12. 0 1 5.0 # 这里下一个是5
  13. 1 2 4.0
  14. 2 3 4.0
  15. 3 4 4.0
  16. 4 2 5.0
  17. 5 -2 7.0
  18. 6 -1 7.0
  19. 7 -2 8.0
  20. 8 -3 NaN
英文:

Using [tag:numpy] broadcasting and the lower triangle:

  1. a = df[&#39;Val&#39;].to_numpy()
  2. m = np.tril(a[:,None]&lt;=a, k=-1)
  3. df[&#39;F(Val)&#39;] = np.where(m.any(0), m.argmax(0), np.nan)

Same logic with expanding:

  1. df[&#39;F(Val)&#39;] = (df.loc[::-1, &#39;Val&#39;].expanding()
  2. .apply(lambda x: s.idxmax() if len(s:=(x.iloc[-2::-1]&lt;=x.iloc[-1]))
  3. else np.nan)
  4. )

Output (with a difference to the provided one):

  1. Val F(Val)
  2. 0 1 5.0 # here the next is 5
  3. 1 2 4.0
  4. 2 3 4.0
  5. 3 4 4.0
  6. 4 2 5.0
  7. 5 -2 7.0
  8. 6 -1 7.0
  9. 7 -2 8.0
  10. 8 -3 NaN

huangapple
  • 本文由 发表于 2023年2月16日 04:31:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/75465168.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定