Rolling and mode functions to get the majority vote for rows in a pandas DataFrame
Question
I have a pandas DataFrame:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Close': np.random.uniform(0, 100, size=10)})
lbound, ubound = 0, 1
change = df["Close"].diff()
df["Change"] = change
df["Result"] = np.select(
    [np.isclose(change, 1) | np.isclose(change, 0) | np.isclose(change, -1),
     # The other conditions
     (change > 0) & (change > ubound),
     (change < 0) & (change < lbound),
     change.between(lbound, ubound)],
    [0, 1, -1, 0])
       Close     Change  Result
0  54.881350        NaN       0
1  71.518937  16.637586       1
2  60.276338 -11.242599      -1
3  54.488318  -5.788019      -1
4  42.365480 -12.122838      -1
5  64.589411  22.223931       1
6  43.758721 -20.830690      -1
7  89.177300  45.418579       1
8  96.366276   7.188976       1
9  38.344152 -58.022124      -1
Problem statement: I want the majority vote of the Result values at indexes 1, 2, 3, 4 assigned to index 0, the majority vote of indexes 2, 3, 4, 5 assigned to index 1, and so on for all subsequent indexes.
I tried:
df['Voting'] = df['Result'].rolling(window=4, min_periods=1).apply(lambda x: x.mode()[0]).shift()
But this doesn't give the result I intend. It applies the mode function to a window of the preceding rows instead of the following ones.
       Close     Change  Result  Voting
0  54.881350        NaN       0     NaN
1  71.518937  16.637586       1     0.0
2  60.276338 -11.242599      -1     0.0
3  54.488318  -5.788019      -1    -1.0
4  42.365480 -12.122838      -1    -1.0
5  64.589411  22.223931       1    -1.0
6  43.758721 -20.830690      -1    -1.0
7  89.177300  45.418579       1    -1.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     1.0
Intended result: the mode function should be applied to a rolling window of 4 rows (indexes 1, 2, 3, 4) and the result assigned to index 0, then to the next window (indexes 2, 3, 4, 5) with the result assigned to index 1, and so on.
Answer 1
Score: 1
You have to reverse the DataFrame before rolling, then shift by 1 (because you don't want the current index in the window):
# Return 0 on a tie (more than one mode), otherwise the single mode.
# The ":=" operator requires Python 3.8 or later.
majority = lambda x: 0 if len((m := x.mode())) > 1 else m[0]
df['Voting'] = (df[::-1].rolling(4, min_periods=1)['Result']
                .apply(majority).shift())
print(df)
# Output
       Close     Change  Result  Voting
0  54.881350        NaN       0    -1.0
1  71.518937  16.637586       1    -1.0
2  60.276338 -11.242599      -1    -1.0
3  54.488318  -5.788019      -1     0.0
4  42.365480 -12.122838      -1     1.0
5  64.589411  22.223931       1     0.0
6  43.758721 -20.830690      -1     1.0
7  89.177300  45.418579       1     0.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     NaN
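As an alternative to reversing the DataFrame, the same forward-looking vote can probably be expressed with a forward window indexer. The sketch below is not part of the original answer: it assumes a pandas version in which `pandas.api.indexers.FixedForwardWindowIndexer` works with `rolling(...).apply`, it hard-codes the `Result` values from the question instead of rebuilding `df`, and the names `result`, `majority`, `forward` and `voting` are illustrative only.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# 'Result' values copied from the question (assumption: same data as above).
result = pd.Series([0, 1, -1, -1, -1, 1, -1, 1, 1, -1], name="Result")


def majority(window: pd.Series) -> float:
    """Return the single most frequent value, or 0 when several values tie."""
    modes = window.mode()
    return 0 if len(modes) > 1 else modes.iloc[0]


# The window at row i covers rows i .. i+3 (shorter near the end of the Series).
indexer = FixedForwardWindowIndexer(window_size=4)
forward = result.rolling(indexer, min_periods=1).apply(majority)

# Shift up by one so that row i receives the vote of rows i+1 .. i+4.
voting = forward.shift(-1)
print(voting)
```

On this data the sketch should give the same Voting column as the reversed-rolling version, with NaN in the last row because no later rows exist. The tie-breaking rule (0 on a tie) is carried over from the answer; swap in `modes.iloc[0]` unconditionally if the smallest mode is preferred.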