Rolling and mode function to get the majority vote for rows in a pandas DataFrame

Question

I have a pandas DataFrame:

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame({'Close': np.random.uniform(0, 100, size=10)})
    lbound, ubound = 0, 1
    change = df["Close"].diff()
    df["Change"] = change
    df["Result"] = np.select(
        [
            np.isclose(change, 1) | np.isclose(change, 0) | np.isclose(change, -1),
            # The other conditions
            (change > 0) & (change > ubound),
            (change < 0) & (change < lbound),
            change.between(lbound, ubound),
        ],
        [0, 1, -1, 0],
    )

           Close      Change  Result
    0  54.881350         NaN       0
    1  71.518937   16.637586       1
    2  60.276338  -11.242599      -1
    3  54.488318   -5.788019      -1
    4  42.365480  -12.122838      -1
    5  64.589411   22.223931       1
    6  43.758721  -20.830690      -1
    7  89.177300   45.418579       1
    8  96.366276    7.188976       1
    9  38.344152  -58.022124      -1

Problem statement - Now, I want the majority vote of the Result column over indexes 1,2,3,4 assigned to index 0, the majority vote over indexes 2,3,4,5 assigned to index 1, and so on for all subsequent indexes.

I tried:

    df['Voting'] = df['Result'].rolling(window=4, min_periods=1).apply(lambda x: x.mode()[0]).shift()

But this doesn't give the result I intend. It applies the mode function to a window of the previous 4 rows instead of the next 4.

           Close      Change  Result  Voting
    0  54.881350         NaN       0     NaN
    1  71.518937   16.637586       1     0.0
    2  60.276338  -11.242599      -1     0.0
    3  54.488318   -5.788019      -1    -1.0
    4  42.365480  -12.122838      -1    -1.0
    5  64.589411   22.223931       1    -1.0
    6  43.758721  -20.830690      -1    -1.0
    7  89.177300   45.418579       1    -1.0
    8  96.366276    7.188976       1    -1.0
    9  38.344152  -58.022124      -1     1.0

Result I intend - A rolling window of 4 (indexes 1,2,3,4) should be taken, the mode function applied, and the result assigned to index 0. Then the next rolling window (indexes 2,3,4,5) should be taken and its result assigned to index 1, and so on.
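To see why the attempt looks backward, it may help to check which row labels each default rolling window covers. The following is only a small diagnostic sketch on a toy Series (the names `s`, `first_label`, and `last_label` are made up for illustration): it records the first and last label that pandas passes to the function for each row.

```python
import pandas as pd

# Toy Series whose index is 0..9, like the DataFrame in the question.
s = pd.Series(range(10))

# With raw=False, apply() receives each window as a Series,
# so its index tells us which rows the window covers.
roller = s.rolling(window=4, min_periods=1)
first_label = roller.apply(lambda w: w.index[0], raw=False)
last_label = roller.apply(lambda w: w.index[-1], raw=False)

print(pd.DataFrame({"first": first_label, "last": last_label}))
```

Each window ends at the current row and reaches back up to three rows, so shifting the result by one gives the mode of the rows before each index, not the rows after it.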

Answer 1

Score: 1

You have to reverse the DataFrame before rolling so that each window looks forward, then shift by 1 (because you don't want the current index in the window). The assignment aligns on the index, so the reversed result lands on the right rows:

    # Treat a tie (more than one mode) as 0; otherwise take the single mode.
    # The walrus operator (:=) requires Python 3.8+.
    majority = lambda x: 0 if len((m := x.mode())) > 1 else m[0]
    df['Voting'] = (df[::-1].rolling(4, min_periods=1)['Result']
                    .apply(majority).shift())
    print(df)

    # Output
           Close      Change  Result  Voting
    0  54.881350         NaN       0    -1.0
    1  71.518937   16.637586       1    -1.0
    2  60.276338  -11.242599      -1    -1.0
    3  54.488318   -5.788019      -1     0.0
    4  42.365480  -12.122838      -1     1.0
    5  64.589411   22.223931       1     0.0
    6  43.758721  -20.830690      -1     1.0
    7  89.177300   45.418579       1     0.0
    8  96.366276    7.188976       1    -1.0
    9  38.344152  -58.022124      -1     NaN
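As a possible alternative that avoids reversing the frame, pandas (1.1 or later, as far as I know) provides `FixedForwardWindowIndexer` for forward-looking windows. The sketch below is not part of the answer above. It assumes the same tie-breaking rule (a tie counts as 0) and hard-codes the Result values from the question so that it runs on its own. The names `result`, `forward`, and `voting` are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer


def majority(window: pd.Series) -> float:
    """Return the single mode of the window, or 0 when there is a tie."""
    modes = window.mode()
    return 0 if len(modes) > 1 else modes.iloc[0]


# Result column copied from the question's DataFrame.
result = pd.Series([0, 1, -1, -1, -1, 1, -1, 1, 1, -1], name="Result")

# A forward window at row i covers rows i .. i+3.
indexer = FixedForwardWindowIndexer(window_size=4)
forward = result.rolling(indexer, min_periods=1).apply(majority)

# Shift up by one so that row i receives the vote of rows i+1 .. i+4.
voting = forward.shift(-1)
print(voting)
```

On this data it should give the same Voting column as the reversed rolling approach, including the NaN in the last row, where no later rows exist.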

huangapple
  • Posted on 2023-02-18 15:39:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75491884.html