February 18, 2023, 15:39:34

Rolling and Mode function to get the majority of voting for rows in pandas Dataframe

Question

I have a pandas DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Close': np.random.uniform(0, 100, size=10)})
lbound, ubound = 0, 1
change = df["Close"].diff()
df["Change"] = change
df["Result"] = np.select([np.isclose(change, 1) | np.isclose(change, 0) | np.isclose(change, -1),
                          # The other conditions
                          (change > 0) & (change > ubound),
                          (change < 0) & (change < lbound),
                          change.between(lbound, ubound)], [0, 1, -1, 0])

       Close     Change  Result
0  54.881350        NaN       0
1  71.518937  16.637586       1
2  60.276338 -11.242599      -1
3  54.488318  -5.788019      -1
4  42.365480 -12.122838      -1
5  64.589411  22.223931       1
6  43.758721 -20.830690      -1
7  89.177300  45.418579       1
8  96.366276   7.188976       1
9  38.344152 -58.022124      -1

Problem statement: I want the majority vote of the Result values at indexes 1, 2, 3, 4 assigned to index 0, the majority vote of indexes 2, 3, 4, 5 assigned to index 1, and so on for all subsequent indexes.

I tried:

df['Voting'] = df['Result'].rolling(window=4, min_periods=1).apply(lambda x: x.mode()[0]).shift()

But this doesn't give the result I intend. It applies the mode function to a window of the 4 preceding rows rather than the 4 following rows.

       Close     Change  Result  Voting
0  54.881350        NaN       0     NaN
1  71.518937  16.637586       1     0.0
2  60.276338 -11.242599      -1     0.0
3  54.488318  -5.788019      -1    -1.0
4  42.365480 -12.122838      -1    -1.0
5  64.589411  22.223931       1    -1.0
6  43.758721 -20.830690      -1    -1.0
7  89.177300  45.418579       1    -1.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     1.0

Intended result: a rolling window of 4 (indexes 1, 2, 3, 4) should be taken, the mode function applied, and the result assigned to index 0; then the next rolling window (indexes 2, 3, 4, 5) should be taken and its result assigned to index 1, and so on.
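
To make the intended windows concrete, here is a minimal plain-Python sketch of the target behaviour, not pandas code and not part of the original post. The helper name forward_vote is hypothetical. Two things are assumptions: ties go to the smallest value (mirroring x.mode()[0] from the attempt above), and windows near the end are allowed to be shorter than 4 (mirroring min_periods=1).

```python
from statistics import multimode

# Result column from the example above, as a plain list.
result = [0, 1, -1, -1, -1, 1, -1, 1, 1, -1]
window = 4  # number of following rows that vote


def forward_vote(values, size):
    """For each position i, vote over values[i + 1 : i + 1 + size]."""
    votes = []
    for i in range(len(values)):
        following = values[i + 1 : i + 1 + size]
        if not following:
            # The last row has no following rows, so it gets no vote.
            votes.append(None)
        else:
            # Assumption: ties go to the smallest value, like x.mode()[0].
            votes.append(min(multimode(following)))
    return votes


voting = forward_vote(result, window)
for i, vote in enumerate(voting):
    # Show the index, the rows that voted, and the outcome.
    print(i, result[i + 1 : i + 1 + window], vote)
```

Index 0 votes over [1, -1, -1, -1] (indexes 1 to 4) and index 1 over [-1, -1, -1, 1] (indexes 2 to 5), which is the forward-looking layout described above.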

Answer 1

Score: 1

You have to reverse the DataFrame before rolling, then shift by 1 (because you don't want the current index included in the result):

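# Majority vote: the single mode of the window, or 0 when several values tie
majority = lambda x: 0 if len((m := x.mode())) > 1 else m[0]
# Rolling over the reversed frame makes each window cover the current row and the rows after it;
# shift() then drops the current row, and the assignment realigns on the original index
df['Voting'] = (df[::-1].rolling(4, min_periods=1)['Result']
                        .apply(majority).shift())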
print(df)
# Output
       Close     Change  Result  Voting
0  54.881350        NaN       0    -1.0
1  71.518937  16.637586       1    -1.0
2  60.276338 -11.242599      -1    -1.0
3  54.488318  -5.788019      -1     0.0
4  42.365480 -12.122838      -1     1.0
5  64.589411  22.223931       1     0.0
6  43.758721 -20.830690      -1     1.0
7  89.177300  45.418579       1     0.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     NaN
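
As an alternative to reversing the frame, a forward-looking window can be requested directly. The following is only a sketch, not part of the original answer: it assumes a pandas version that ships pandas.api.indexers.FixedForwardWindowIndexer (1.1 or later, as far as I know), hard-codes the Result column from the question, and keeps the answer's rule that ties become 0.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Result column from the question, hard-coded to keep the sketch short.
df = pd.DataFrame({"Result": [0, 1, -1, -1, -1, 1, -1, 1, 1, -1]})


def majority(x):
    """Return the single mode of x, or 0 when several values tie."""
    m = x.mode()
    return 0 if len(m) > 1 else m.iloc[0]


# Each window covers the current row and the 3 rows after it.
indexer = FixedForwardWindowIndexer(window_size=4)
forward = df["Result"].rolling(indexer, min_periods=1).apply(majority)

# shift(-1) moves each vote up one row, so row i gets the vote of rows i+1..i+4.
df["Voting"] = forward.shift(-1)
print(df)
```

On this data the Voting column should match the output of the reversed-rolling answer, including the NaN in the last row, which has no following rows to vote.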
