英文:
Pandas: Group by and count days before and after specific date
问题
以下是要添加的两列的翻译:
- days_after - 计算距离“third_friday”之后直到下一个“third_friday”的天数(按“name”列分组)
- days_before - 计算距离“third_friday”之前的天数(按“name”列分组)
期望的输出如下:
date name third_friday days_after days_before
0 2023-01-01 AA 2023-01-20 1 4
1 2023-01-02 AA 2023-01-20 2 3
2 2023-01-05 AA 2023-01-20 3 2
3 2023-01-06 AA 2023-01-20 4 1
4 2023-01-20 AA 2023-01-20 0 0
5 2023-01-28 AA 2023-01-20 1 6
6 2023-01-29 AA 2023-01-20 2 5
7 2023-02-01 AA 2023-02-17 3 4
8 2023-02-02 AA 2023-02-17 4 3
9 2023-02-05 AA 2023-02-17 5 2
10 2023-02-06 AA 2023-02-17 6 1
11 2023-02-20 AA 2023-02-17 1 3
12 2023-02-27 AA 2023-02-17 2 2
13 2023-02-28 AA 2023-02-17 3 1
14 2023-01-01 BB 2023-01-20 1 4
15 2023-01-02 BB 2023-01-20 2 3
16 2023-01-05 BB 2023-01-20 3 2
17 2023-01-06 BB 2023-01-20 4 1
18 2023-01-20 BB 2023-01-20 0 0
19 2023-01-28 BB 2023-01-20 1 6
20 2023-01-29 BB 2023-01-20 2 5
21 2023-02-01 BB 2023-02-17 3 4
22 2023-02-02 BB 2023-02-17 4 3
23 2023-02-05 BB 2023-02-17 5 2
24 2023-02-06 BB 2023-02-17 6 1
25 2023-02-20 BB 2023-02-17 1 3
26 2023-02-27 BB 2023-02-17 2 2
27 2023-02-28 BB 2023-02-17 3 1
英文:
Dataframe to start with:
df = pd.DataFrame([
{'date': date(2023, 1, 1), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 2), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 5), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 6), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 20), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 28), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 29), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 2, 1), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 2), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 5), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 6), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 20), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 27), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 28), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 1, 1), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 2), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 5), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 6), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 20), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 28), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 29), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 2, 1), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 2), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 5), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 6), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 20), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 27), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 28), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
])
I want to add two additional columns:
- days_after - counts days after "third_friday" and until next third friday (grouped by "name" column)
- days_before - counts days before "third_friday" (grouped by "name" column)
Expected output:
date name third_friday days_after days_before
0 2023-01-01 AA 2023-01-20 1 4
1 2023-01-02 AA 2023-01-20 2 3
2 2023-01-05 AA 2023-01-20 3 2
3 2023-01-06 AA 2023-01-20 4 1
4 2023-01-20 AA 2023-01-20 0 0
5 2023-01-28 AA 2023-01-20 1 6
6 2023-01-29 AA 2023-01-20 2 5
7 2023-02-01 AA 2023-02-17 3 4
8 2023-02-02 AA 2023-02-17 4 3
9 2023-02-05 AA 2023-02-17 5 2
10 2023-02-06 AA 2023-02-17 6 1
11 2023-02-20 AA 2023-02-17 1 3
12 2023-02-27 AA 2023-02-17 2 2
13 2023-02-28 AA 2023-02-17 3 1
14 2023-01-01 BB 2023-01-20 1 4
15 2023-01-02 BB 2023-01-20 2 3
16 2023-01-05 BB 2023-01-20 3 2
17 2023-01-06 BB 2023-01-20 4 1
18 2023-01-20 BB 2023-01-20 0 0
19 2023-01-28 BB 2023-01-20 1 6
20 2023-01-29 BB 2023-01-20 2 5
21 2023-02-01 BB 2023-02-17 3 4
22 2023-02-02 BB 2023-02-17 4 3
23 2023-02-05 BB 2023-02-17 5 2
24 2023-02-06 BB 2023-02-17 6 1
25 2023-02-20 BB 2023-02-17 1 3
26 2023-02-27 BB 2023-02-17 2 2
27 2023-02-28 BB 2023-02-17 3 1
答案1
得分: 2
你可以使用以下代码:
# 获取两列之间的差异
s = df['date'].sub(df['third_friday'])
# 是否大于0?
s2 = s.ge('0')
# 是否不等于0?
m = s.ne('0')
# 从每个s2的第一个True开始形成分组
group = (s2&~s2.shift(fill_value=False)).cumsum()
# 设置分组器
g = df[m].groupby(['name', group])
# 每个组的上升和下降计数
df['days_after'] = g.cumcount().add(1).reindex(df.index, fill_value=0)
df['days_before'] = g.cumcount(ascending=False).add(1).reindex(df.index, fill_value=0)
输出如下:
date name third_friday days_after days_before
0 2023-01-01 AA 2023-01-20 1 4
1 2023-01-02 AA 2023-01-20 2 3
2 2023-01-05 AA 2023-01-20 3 2
3 2023-01-06 AA 2023-01-20 4 1
4 2023-01-20 AA 2023-01-20 0 0
5 2023-01-28 AA 2023-01-20 1 6
6 2023-01-29 AA 2023-01-20 2 5
7 2023-02-01 AA 2023-02-17 3 4
8 2023-02-02 AA 2023-02-17 4 3
9 2023-02-05 AA 2023-02-17 5 2
10 2023-02-06 AA 2023-02-17 6 1
11 2023-02-20 AA 2023-02-17 1 3
12 2023-02-27 AA 2023-02-17 2 2
13 2023-02-28 AA 2023-02-17 3 1
14 2023-01-01 BB 2023-01-20 1 4
15 2023-01-02 BB 2023-01-20 2 3
16 2023-01-05 BB 2023-01-20 3 2
17 2023-01-06 BB 2023-01-20 4 1
18 2023-01-20 BB 2023-01-20 0 0
19 2023-01-28 BB 2023-01-20 1 6
20 2023-01-29 BB 2023-01-20 2 5
21 2023-02-01 BB 2023-02-17 3 4
22 2023-02-02 BB 2023-02-17 4 3
23 2023-02-05 BB 2023-02-17 5 2
24 2023-02-06 BB 2023-02-17 6 1
25 2023-02-20 BB 2023-02-17 1 3
26 2023-02-27 BB 2023-02-17 2 2
27 2023-02-28 BB 2023-02-17 3 1
英文:
You can use:
# get difference between 2 columns
s = df['date'].sub(df['third_friday'])
# is it > 0?
s2 = s.ge('0')
# is it not 0?
m = s.ne('0')
# form groups starting on each first True of s2
group = (s2&~s2.shift(fill_value=False)).cumsum()
# set up grouper
g = df[m].groupby(['name', group])
# up and down count per group
df['days_after'] = g.cumcount().add(1).reindex(df.index, fill_value=0)
df['days_before'] = g.cumcount(ascending=False).add(1).reindex(df.index, fill_value=0)
Output:
date name third_friday days_after days_before
0 2023-01-01 AA 2023-01-20 1 4
1 2023-01-02 AA 2023-01-20 2 3
2 2023-01-05 AA 2023-01-20 3 2
3 2023-01-06 AA 2023-01-20 4 1
4 2023-01-20 AA 2023-01-20 0 0
5 2023-01-28 AA 2023-01-20 1 6
6 2023-01-29 AA 2023-01-20 2 5
7 2023-02-01 AA 2023-02-17 3 4
8 2023-02-02 AA 2023-02-17 4 3
9 2023-02-05 AA 2023-02-17 5 2
10 2023-02-06 AA 2023-02-17 6 1
11 2023-02-20 AA 2023-02-17 1 3
12 2023-02-27 AA 2023-02-17 2 2
13 2023-02-28 AA 2023-02-17 3 1
14 2023-01-01 BB 2023-01-20 1 4
15 2023-01-02 BB 2023-01-20 2 3
16 2023-01-05 BB 2023-01-20 3 2
17 2023-01-06 BB 2023-01-20 4 1
18 2023-01-20 BB 2023-01-20 0 0
19 2023-01-28 BB 2023-01-20 1 6
20 2023-01-29 BB 2023-01-20 2 5
21 2023-02-01 BB 2023-02-17 3 4
22 2023-02-02 BB 2023-02-17 4 3
23 2023-02-05 BB 2023-02-17 5 2
24 2023-02-06 BB 2023-02-17 6 1
25 2023-02-20 BB 2023-02-17 1 3
26 2023-02-27 BB 2023-02-17 2 2
27 2023-02-28 BB 2023-02-17 3 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论