Rolling and Mode function to get the majority of voting for rows in pandas Dataframe


Question

I have a pandas DataFrame:


import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Close': np.random.uniform(0, 100, size=10)})

lbound, ubound = 0, 1
change = df["Close"].diff()

df["Change"] = change
df["Result"] = np.select([ np.isclose(change, 1) | np.isclose(change, 0) | np.isclose(change, -1),
                        # The other conditions
                        (change > 0) & (change > ubound),
                        (change < 0) & (change < lbound),
                         change.between(lbound, ubound)],[0, 1, -1, 0])

       Close     Change  Result
0  54.881350        NaN       0
1  71.518937  16.637586       1
2  60.276338 -11.242599      -1
3  54.488318  -5.788019      -1
4  42.365480 -12.122838      -1
5  64.589411  22.223931       1
6  43.758721 -20.830690      -1
7  89.177300  45.418579       1
8  96.366276   7.188976       1
9  38.344152 -58.022124      -1
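
For reference, np.select checks the conditions in order and uses the choice of the first one that matches, falling back to its default of 0 when none matches (which is why the NaN in row 0 is labelled 0). A minimal sketch on a few hand-picked change values; the values and the names sample and labels are illustrative and not part of the original data:

```python
import numpy as np
import pandas as pd

lbound, ubound = 0, 1
# Hypothetical change values, chosen to exercise each branch
sample = pd.Series([np.nan, 1.0, 0.5, 2.0, -1.0, -3.0])

labels = np.select(
    [
        # Checked first, so a change of exactly -1 is labelled 0, not -1
        np.isclose(sample, 1) | np.isclose(sample, 0) | np.isclose(sample, -1),
        (sample > 0) & (sample > ubound),
        (sample < 0) & (sample < lbound),
        sample.between(lbound, ubound),
    ],
    [0, 1, -1, 0],
)
print(labels.tolist())  # [0, 0, 0, 1, 0, -1]
```

Because the order of the conditions decides the label, moving the np.isclose check further down the list would change the result for a change of exactly -1.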


Problem statement - For each row, I want the majority vote (the mode) of the next four values in the Result column: the vote over indexes 1, 2, 3, 4 is assigned to index 0, the vote over indexes 2, 3, 4, 5 is assigned to index 1, and so on for all subsequent indexes.

I tried:

df['Voting'] = df['Result'].rolling(window=4, min_periods=1).apply(lambda x: x.mode()[0]).shift()

But this doesn't give the result I intend. It applies the mode function to a window of the preceding rows instead of the following ones.

       Close     Change  Result  Voting
0  54.881350        NaN       0     NaN
1  71.518937  16.637586       1     0.0
2  60.276338 -11.242599      -1     0.0
3  54.488318  -5.788019      -1    -1.0
4  42.365480 -12.122838      -1    -1.0
5  64.589411  22.223931       1    -1.0
6  43.758721 -20.830690      -1    -1.0
7  89.177300  45.418579       1    -1.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     1.0
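
To see which rows each window covers in the attempt above, one can record the first and last row label of every window. This is only a diagnostic sketch; the series s and the names first, last and coverage are made up for illustration:

```python
import pandas as pd

s = pd.Series(range(10))

# First and last row label covered by each trailing window of up to 4 rows
first = s.rolling(4, min_periods=1).apply(lambda w: w.index[0])
last = s.rolling(4, min_periods=1).apply(lambda w: w.index[-1])

# shift() moves each window's result down one row, as in the attempt above
coverage = pd.DataFrame({"first": first.shift(), "last": last.shift()})
print(coverage)
```

In this sketch the window over rows 1 to 4 lands on row 5 rather than row 0, so every row receives a vote over the rows before it.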

Intended result - A rolling window of 4 (indexes 1, 2, 3, 4) should be formed, the mode function applied, and the result assigned to index 0; then the next rolling window (indexes 2, 3, 4, 5) should have its result assigned to index 1, and so on.

Answer 1

Score: 1

You have to reverse the DataFrame before rolling, then shift by 1 (because you don't want the current index in the window):

# Mode of the window; a tie between several modes counts as no majority (0)
majority = lambda x: 0 if len((m := x.mode())) > 1 else m[0]
# Roll over the reversed frame so each window looks at later rows, then
# shift by one so the current row is excluded from its own window
df['Voting'] = (df[::-1].rolling(4, min_periods=1)['Result']
                        .apply(majority).shift())
print(df)

# Output
       Close     Change  Result  Voting
0  54.881350        NaN       0    -1.0
1  71.518937  16.637586       1    -1.0
2  60.276338 -11.242599      -1    -1.0
3  54.488318  -5.788019      -1     0.0
4  42.365480 -12.122838      -1     1.0
5  64.589411  22.223931       1     0.0
6  43.758721 -20.830690      -1     1.0
7  89.177300  45.418579       1     0.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     NaN
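
A possible alternative that avoids reversing the frame, assuming a pandas version that provides pandas.api.indexers.FixedForwardWindowIndexer, is a forward-looking window followed by shift(-1). The sketch below hard-codes the Result values from the question and cross-checks the rolling result against a plain loop; the names result, voting and expected are illustrative, and the tie rule (0 when several modes tie) is carried over from the answer rather than stated in the question:

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Result column from the question, hard-coded so the sketch is self-contained
result = pd.Series([0, 1, -1, -1, -1, 1, -1, 1, 1, -1])

def majority(window: pd.Series) -> float:
    """Return the single mode, or 0 when several values tie."""
    modes = window.mode()
    return 0.0 if len(modes) > 1 else float(modes.iloc[0])

# The window at row i covers rows i..i+3; shift(-1) excludes row i itself
indexer = FixedForwardWindowIndexer(window_size=4)
voting = result.rolling(indexer, min_periods=1).apply(majority).shift(-1)

# Plain-loop cross-check: vote over rows i+1..i+4, NaN when none remain
expected = [
    majority(result.iloc[i + 1:i + 5]) if i + 1 < len(result) else np.nan
    for i in range(len(result))
]
print(voting.tolist())
```

On this data the sketch gives the same Voting values as the reversed-frame version, with NaN in the last row because no later rows remain to vote.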
