2023年2月16日 04:31:39go评论79阅读模式

英文:

pandas function to find index of the first future instance where column is less than each row's value

问题

我是新来的Stack Overflow用户，我有一个关于在pandas中解决问题的问题。我想要创建一个函数，该函数返回第一个未来实例的索引，其中某一列小于该列的每一行的值。

例如，考虑以下数据框：

import numpy as np
import pandas as pd
df = pd.DataFrame({'Val': [1, 2, 3, 4, 0, 1, -1, -2, -3]}, index=np.arange(0, 9))
df

我在寻找的输出是：

| Index    | F(Val)   |
| -------- | -------- |
| 0        | 4        |
| 1        | 4        |
| 2        | 4        |
| 3        | 4        |
| 4        | 6        |
| 5        | 6        |
| 6        | 7        |
| 7        | 8        |
| 8        | NaN      |

或等效于F(Val)的系列/数组。

我已经能够很容易地使用for循环解决这个问题，但显然在我正在处理的大型数据集上非常慢，并且不是非常优雅或最佳的解决方案。我希望解决方案是一个使用向量化的高效pandas函数。

此外，作为一个奖励问题（如果有人能提供帮助），如何使用向量化计算每行的索引和F(Val)索引之间的最大值？输出应如下所示：

| Index    | G(Val)   |
| -------- | -------- |
| 0        | 4        |
| 1        | 4        |
| 2        | 4        |
| 3        | 4        |
| 4        | 1        |
| 5        | 1        |
| 6        | -1       |
| 7        | -2       |
| 8        | NaN      |

谢谢！

英文:

I'm new to Stack Overflow, and I just have a question about solving a problem in pandas. I am looking to create a function that returns the index of the first future instance where a column is less than each row's value for that column.

For example, consider the dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({&#39;Val&#39;: [1, 2, 3, 4, 0, 1, -1, -2, -3]}, index = np.arange(0,9))
df

Index	Val
0	1
1	2
2	3
3	4
4	0
5	1
6	-1
7	-2
8	-3

I am looking for the output:

Index	F(Val)
0	4
1	4
2	4
3	4
4	6
5	6
6	7
7	8
8	NaN

Or the series/array equivalent of F(Val).

I've been able to solve this quite easily using for loops, but obviously this is extremely slow on the large dataset I am working with an not a very elegant or optimal solution. My hope is that the solution is an efficient pandas function that employs vectorization.

Also, as a bonus question (if anyone can assist), how might the maximum value between each row's index and the F(Val) index be computed using vectorization? The output should look like:

Index	G(Val)
0	4
1	4
2	4
3	4
4	1
5	1
6	-1
7	-2
8	NaN

Thanks!

答案1

得分: 1

你可以使用以下代码：

grp = df['Val'].lt(df['Val'].shift()).shift(fill_value=0).cumsum()
df['F(Val)'] = df.groupby(grp).transform(lambda x: x.index[-1]).shift(-1)
print(df)
# 输出结果
   Val  F(Val)
0    1     4.0
1    2     4.0
2    3     4.0
3    4     4.0
4    0     6.0
5    1     6.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

注意：这是代码的翻译部分。

英文:

You can use:

grp = df[&#39;Val&#39;].lt(df[&#39;Val&#39;].shift()).shift(fill_value=0).cumsum()
df[&#39;F(Val)&#39;] = df.groupby(grp).transform(lambda x: x.index[-1]).shift(-1)
print(df)
# Output
   Val  F(Val)
0    1     4.0
1    2     4.0
2    3     4.0
3    4     4.0
4    0     6.0
5    1     6.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

答案2

得分: 1

# 使用 [tag:numpy] 广播和下三角矩阵：
a = df['Val'].to_numpy()
m = np.tril(a[:, None] <= a, k=-1)
df['F(Val)'] = np.where(m.any(0), m.argmax(0), np.nan)
# 与 [`expanding`]() 的相同逻辑：
df['F(Val)'] = (df.loc[::-1, 'Val'].expanding()
                  .apply(lambda x: s.idxmax() if len(s:=(x.iloc[-2::-1] <= x.iloc[-1]))
                         else np.nan)
               )
# 输出（与提供的输出有差异）：
   Val  F(Val)
0    1     5.0  # 这里下一个是5
1    2     4.0
2    3     4.0
3    4     4.0
4    2     5.0
5   -2     7.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

英文:

Using [tag:numpy] broadcasting and the lower triangle:

a = df[&#39;Val&#39;].to_numpy()
m = np.tril(a[:,None]&lt;=a, k=-1)
df[&#39;F(Val)&#39;] = np.where(m.any(0), m.argmax(0), np.nan)

Same logic with expanding:

df[&#39;F(Val)&#39;] = (df.loc[::-1, &#39;Val&#39;].expanding()
                  .apply(lambda x: s.idxmax() if len(s:=(x.iloc[-2::-1]&lt;=x.iloc[-1]))
                         else np.nan)
               )

Output (with a difference to the provided one):

   Val  F(Val)
0    1     5.0  # here the next is 5
1    2     4.0
2    3     4.0
3    4     4.0
4    2     5.0
5   -2     7.0
6   -1     7.0
7   -2     8.0
8   -3     NaN

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

pandas函数，用于查找第一个未来实例的索引，其中列的值小于每行的值。

问题

答案1

答案2

将 .csv 文件导入本地 SQL 服务器，使用预先创建的表，使用 Python。

在Pandas crosstable中对索引进行分组。

如何在Plotly中构建按值排序的多值列？

将Pivot Like数据转换为JSON使用Python或Pandas

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。