pandas函数,用于查找第一个未来实例的索引,其中列的值小于每行的值。

huangapple go评论47阅读模式
英文:

pandas function to find index of the first future instance where column is less than each row's value

问题

我是新来的Stack Overflow用户,我有一个关于在pandas中解决问题的问题。我想要创建一个函数,该函数返回第一个未来实例的索引,其中某一列小于该列的每一行的值。

例如,考虑以下数据框:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Val': [1, 2, 3, 4, 0, 1, -1, -2, -3]}, index=np.arange(0, 9))
df

我在寻找的输出是:

| Index    | F(Val)   |
| -------- | -------- |
| 0        | 4        |
| 1        | 4        |
| 2        | 4        |
| 3        | 4        |
| 4        | 6        |
| 5        | 6        |
| 6        | 7        |
| 7        | 8        |
| 8        | NaN      |

或等效于F(Val)的系列/数组。

我已经能够很容易地使用for循环解决这个问题,但显然在我正在处理的大型数据集上非常慢,并且不是非常优雅或最佳的解决方案。我希望解决方案是一个使用向量化的高效pandas函数。

此外,作为一个奖励问题(如果有人能提供帮助),如何使用向量化计算每行的索引和F(Val)索引之间的最大值?输出应如下所示:

| Index    | G(Val)   |
| -------- | -------- |
| 0        | 4        |
| 1        | 4        |
| 2        | 4        |
| 3        | 4        |
| 4        | 1        |
| 5        | 1        |
| 6        | -1       |
| 7        | -2       |
| 8        | NaN      |

谢谢!

英文:

I'm new to Stack Overflow, and I just have a question about solving a problem in pandas. I am looking to create a function that returns the index of the first future instance where a column is less than each row's value for that column.

For example, consider the dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Val': [1, 2, 3, 4, 0, 1, -1, -2, -3]}, index = np.arange(0,9))
df
Index Val
0 1
1 2
2 3
3 4
4 0
5 1
6 -1
7 -2
8 -3

I am looking for the output:

Index F(Val)
0 4
1 4
2 4
3 4
4 6
5 6
6 7
7 8
8 NaN

Or the series/array equivalent of F(Val).

I've been able to solve this quite easily using for loops, but obviously this is extremely slow on the large dataset I am working with an not a very elegant or optimal solution. My hope is that the solution is an efficient pandas function that employs vectorization.

Also, as a bonus question (if anyone can assist), how might the maximum value between each row's index and the F(Val) index be computed using vectorization? The output should look like:

Index G(Val)
0 4
1 4
2 4
3 4
4 1
5 1
6 -1
7 -2
8 NaN

Thanks!

答案1

得分: 1

你可以使用以下代码:

grp = df['Val'].lt(df['Val'].shift()).shift(fill_value=0).cumsum()
df['F(Val)'] = df.groupby(grp).transform(lambda x: x.index[-1]).shift(-1)
print(df)

# 输出结果
   Val  F(Val)
0    1     4.0
1    2     4.0
2    3     4.0
3    4     4.0
4    0     6.0
5    1     6.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

注意:这是代码的翻译部分。

英文:

You can use:

grp = df['Val'].lt(df['Val'].shift()).shift(fill_value=0).cumsum()
df['F(Val)'] = df.groupby(grp).transform(lambda x: x.index[-1]).shift(-1)
print(df)

# Output
   Val  F(Val)
0    1     4.0
1    2     4.0
2    3     4.0
3    4     4.0
4    0     6.0
5    1     6.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

答案2

得分: 1

# 使用 [tag:numpy] 广播和下三角矩阵:
a = df['Val'].to_numpy()
m = np.tril(a[:, None] <= a, k=-1)
df['F(Val)'] = np.where(m.any(0), m.argmax(0), np.nan)

# 与 [`expanding`]() 的相同逻辑:
df['F(Val)'] = (df.loc[::-1, 'Val'].expanding()
                  .apply(lambda x: s.idxmax() if len(s:=(x.iloc[-2::-1] <= x.iloc[-1]))
                         else np.nan)
               )

# 输出(与提供的输出有差异):
   Val  F(Val)
0    1     5.0  # 这里下一个是5
1    2     4.0
2    3     4.0
3    4     4.0
4    2     5.0
5   -2     7.0
6   -1     7.0
7   -2     8.0
8   -3     NaN
英文:

Using [tag:numpy] broadcasting and the lower triangle:

a = df[&#39;Val&#39;].to_numpy()
m = np.tril(a[:,None]&lt;=a, k=-1)
df[&#39;F(Val)&#39;] = np.where(m.any(0), m.argmax(0), np.nan)

Same logic with expanding:

df[&#39;F(Val)&#39;] = (df.loc[::-1, &#39;Val&#39;].expanding()
                  .apply(lambda x: s.idxmax() if len(s:=(x.iloc[-2::-1]&lt;=x.iloc[-1]))
                         else np.nan)
               )

Output (with a difference to the provided one):

   Val  F(Val)
0    1     5.0  # here the next is 5
1    2     4.0
2    3     4.0
3    4     4.0
4    2     5.0
5   -2     7.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

huangapple
  • 本文由 发表于 2023年2月16日 04:31:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/75465168.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定