January 6, 2020, 23:46:45

pandas rolling apply on a custom function

Question

I would like to apply pandas.rank on a rolling basis.
I tried to use pandas.rolling.apply, but unfortunately rolling doesn't work with 'rank'.

Is there a way around this?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3))
def my_rank(x):
   return x.rank(pct=True)
# This attempt raises an error
df.rolling(3).apply(my_rank)

Answer 1

Score: 2

Code:

def my_rank(x):
   # Wrap the window in a Series, rank it, and keep only the last value's rank
   return pd.Series(x).rank(pct=True).iloc[-1]
df.rolling(3).apply(my_rank)

Output:

          0         1         2
0       NaN       NaN       NaN
1       NaN       NaN       NaN
2  0.666667  0.333333  0.666667
3  1.000000  0.333333  1.000000
4  0.666667  1.000000  0.333333
5  0.333333  0.666667  0.666667
6  1.000000  0.333333  0.666667
7  0.333333  0.333333  1.000000
8  1.000000  0.666667  1.000000
9  0.666667  1.000000  0.666667

Explanation:

Your code (great minimal reproducible example, by the way!) threw the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'rank'
This meant that the x in your my_rank function was being passed as a NumPy array, not a pandas Series. So first I changed return x.rank(...) to return pd.Series(x).rank(...).
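If you want to check what the function actually receives, the raw argument of rolling.apply controls it: raw=True passes each window as a NumPy array, and raw=False passes it as a Series. Which of the two is the default has differed between pandas versions, as far as I know, so setting it explicitly is the safer choice. Here is a small sketch; record_raw, record_series and the two lists are hypothetical names used only for illustration.

```python
import pandas as pd

seen_raw = []     # type names observed with raw=True
seen_series = []  # type names observed with raw=False

def record_raw(x):
    # Record the type of the window object; return a dummy float
    seen_raw.append(type(x).__name__)
    return 0.0

def record_series(x):
    seen_series.append(type(x).__name__)
    return 0.0

s = pd.Series([1.0, 2.0, 3.0, 4.0])
s.rolling(2).apply(record_raw, raw=True)
s.rolling(2).apply(record_series, raw=False)

print(set(seen_raw))     # type seen when raw=True
print(set(seen_series))  # type seen when raw=False
```

Wrapping the window in pd.Series(x), as in the answer's code, works in both cases, because pd.Series accepts either a NumPy array or an existing Series.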

Then I got the following error:
TypeError: cannot convert the series to <class 'float'>
This makes sense, because pd.Series.rank takes a Series of n numbers and returns a (ranked) Series of n numbers. But since we're calling rank not once on a whole Series but repeatedly on each rolling window, we want only one number as output per window. Hence the iloc[-1], which keeps the rank of the last value in each window.
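To make the per-window result easier to follow than with random data, here is a minimal sketch on a fixed Series. The name last_pct_rank and the sample values are made up for illustration; the logic is the same as my_rank.

```python
import pandas as pd

def last_pct_rank(window):
    # Percentile rank of the window's last value among the window's values
    return pd.Series(window).rank(pct=True).iloc[-1]

s = pd.Series([10.0, 30.0, 20.0, 40.0, 5.0])
result = s.rolling(3).apply(last_pct_rank, raw=True)
print(result)

# Windows and the rank of their last value:
#   [10, 30, 20] -> 20 is 2nd smallest of 3 -> 2/3
#   [30, 20, 40] -> 40 is the largest       -> 1.0
#   [20, 40,  5] ->  5 is the smallest      -> 1/3
# The first two rows are NaN because the window is not yet full.
```

Each output row therefore says where the current value sits relative to the two values before it.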
