Pandas:按特定日期之前和之后的天数分组并计数

huangapple go评论68阅读模式
英文:

Pandas: Group by and count days before and after specific date

问题

以下是要添加的两列的翻译:

  • days_after - 计算距离“third_friday”之后直到下一个“third_friday”的天数(按“name”列分组)
  • days_before - 计算距离“third_friday”之前的天数(按“name”列分组)

期望的输出如下:

          date name third_friday  days_after  days_before
0   2023-01-01   AA   2023-01-20           1            4
1   2023-01-02   AA   2023-01-20           2            3
2   2023-01-05   AA   2023-01-20           3            2
3   2023-01-06   AA   2023-01-20           4            1
4   2023-01-20   AA   2023-01-20           0            0
5   2023-01-28   AA   2023-01-20           1            6
6   2023-01-29   AA   2023-01-20           2            5
7   2023-02-01   AA   2023-02-17           3            4
8   2023-02-02   AA   2023-02-17           4            3
9   2023-02-05   AA   2023-02-17           5            2
10  2023-02-06   AA   2023-02-17           6            1
11  2023-02-20   AA   2023-02-17           1            3
12  2023-02-27   AA   2023-02-17           2            2
13  2023-02-28   AA   2023-02-17           3            1
14  2023-01-01   BB   2023-01-20           1            4
15  2023-01-02   BB   2023-01-20           2            3
16  2023-01-05   BB   2023-01-20           3            2
17  2023-01-06   BB   2023-01-20           4            1
18  2023-01-20   BB   2023-01-20           0            0
19  2023-01-28   BB   2023-01-20           1            6
20  2023-01-29   BB   2023-01-20           2            5
21  2023-02-01   BB   2023-02-17           3            4
22  2023-02-02   BB   2023-02-17           4            3
23  2023-02-05   BB   2023-02-17           5            2
24  2023-02-06   BB   2023-02-17           6            1
25  2023-02-20   BB   2023-02-17           1            3
26  2023-02-27   BB   2023-02-17           2            2
27  2023-02-28   BB   2023-02-17           3            1
英文:

Dataframe to start with:

df = pd.DataFrame([
{'date': date(2023, 1, 1), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 2), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 5), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 6), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 20), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 28), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 29), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 2, 1), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 2), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 5), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 6), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 20), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 27), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 28), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 1, 1), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 2), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 5), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 6), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 20), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 28), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 29), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 2, 1), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 2), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 5), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 6), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 20), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 27), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 28), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
])

I want to add two additional columns:

  • days_after - counts days after "third_friday" and until next third friday (grouped by "name" column)
  • days_before - counts days before "third_friday" (grouped by "name" column)

Expected output:

          date name third_friday  days_after  days_before
0   2023-01-01   AA   2023-01-20           1            4
1   2023-01-02   AA   2023-01-20           2            3
2   2023-01-05   AA   2023-01-20           3            2
3   2023-01-06   AA   2023-01-20           4            1
4   2023-01-20   AA   2023-01-20           0            0
5   2023-01-28   AA   2023-01-20           1            6
6   2023-01-29   AA   2023-01-20           2            5
7   2023-02-01   AA   2023-02-17           3            4
8   2023-02-02   AA   2023-02-17           4            3
9   2023-02-05   AA   2023-02-17           5            2
10  2023-02-06   AA   2023-02-17           6            1
11  2023-02-20   AA   2023-02-17           1            3
12  2023-02-27   AA   2023-02-17           2            2
13  2023-02-28   AA   2023-02-17           3            1
14  2023-01-01   BB   2023-01-20           1            4
15  2023-01-02   BB   2023-01-20           2            3
16  2023-01-05   BB   2023-01-20           3            2
17  2023-01-06   BB   2023-01-20           4            1
18  2023-01-20   BB   2023-01-20           0            0
19  2023-01-28   BB   2023-01-20           1            6
20  2023-01-29   BB   2023-01-20           2            5
21  2023-02-01   BB   2023-02-17           3            4
22  2023-02-02   BB   2023-02-17           4            3
23  2023-02-05   BB   2023-02-17           5            2
24  2023-02-06   BB   2023-02-17           6            1
25  2023-02-20   BB   2023-02-17           1            3
26  2023-02-27   BB   2023-02-17           2            2
27  2023-02-28   BB   2023-02-17           3            1

答案1

得分: 2

你可以使用以下代码:

# 获取两列之间的差异
s = df['date'].sub(df['third_friday'])
# 是否大于0?
s2 = s.ge('0')
# 是否不等于0?
m = s.ne('0')
# 从每个s2的第一个True开始形成分组
group = (s2&~s2.shift(fill_value=False)).cumsum()

# 设置分组器
g = df[m].groupby(['name', group])

# 每个组的上升和下降计数
df['days_after'] = g.cumcount().add(1).reindex(df.index, fill_value=0)
df['days_before'] = g.cumcount(ascending=False).add(1).reindex(df.index, fill_value=0)

输出如下:

         date name third_friday  days_after  days_before
0  2023-01-01   AA   2023-01-20           1            4
1  2023-01-02   AA   2023-01-20           2            3
2  2023-01-05   AA   2023-01-20           3            2
3  2023-01-06   AA   2023-01-20           4            1
4  2023-01-20   AA   2023-01-20           0            0
5  2023-01-28   AA   2023-01-20           1            6
6  2023-01-29   AA   2023-01-20           2            5
7  2023-02-01   AA   2023-02-17           3            4
8  2023-02-02   AA   2023-02-17           4            3
9  2023-02-05   AA   2023-02-17           5            2
10 2023-02-06   AA   2023-02-17           6            1
11 2023-02-20   AA   2023-02-17           1            3
12 2023-02-27   AA   2023-02-17           2            2
13 2023-02-28   AA   2023-02-17           3            1
14 2023-01-01   BB   2023-01-20           1            4
15 2023-01-02   BB   2023-01-20           2            3
16 2023-01-05   BB   2023-01-20           3            2
17 2023-01-06   BB   2023-01-20           4            1
18 2023-01-20   BB   2023-01-20           0            0
19 2023-01-28   BB   2023-01-20           1            6
20 2023-01-29   BB   2023-01-20           2            5
21 2023-02-01   BB   2023-02-17           3            4
22 2023-02-02   BB   2023-02-17           4            3
23 2023-02-05   BB   2023-02-17           5            2
24 2023-02-06   BB   2023-02-17           6            1
25 2023-02-20   BB   2023-02-17           1            3
26 2023-02-27   BB   2023-02-17           2            2
27 2023-02-28   BB   2023-02-17           3            1
英文:

You can use:

# get difference between 2 columns
s = df['date'].sub(df['third_friday'])
# is it > 0?
s2 = s.ge('0')
# is it not 0?
m = s.ne('0')
# form groups starting on each first True of s2
group = (s2&~s2.shift(fill_value=False)).cumsum()
# set up grouper
g = df[m].groupby(['name', group])
# up and down count per group
df['days_after'] = g.cumcount().add(1).reindex(df.index, fill_value=0)
df['days_before'] = g.cumcount(ascending=False).add(1).reindex(df.index, fill_value=0)

Output:

         date name third_friday  days_after  days_before
0  2023-01-01   AA   2023-01-20           1            4
1  2023-01-02   AA   2023-01-20           2            3
2  2023-01-05   AA   2023-01-20           3            2
3  2023-01-06   AA   2023-01-20           4            1
4  2023-01-20   AA   2023-01-20           0            0
5  2023-01-28   AA   2023-01-20           1            6
6  2023-01-29   AA   2023-01-20           2            5
7  2023-02-01   AA   2023-02-17           3            4
8  2023-02-02   AA   2023-02-17           4            3
9  2023-02-05   AA   2023-02-17           5            2
10 2023-02-06   AA   2023-02-17           6            1
11 2023-02-20   AA   2023-02-17           1            3
12 2023-02-27   AA   2023-02-17           2            2
13 2023-02-28   AA   2023-02-17           3            1
14 2023-01-01   BB   2023-01-20           1            4
15 2023-01-02   BB   2023-01-20           2            3
16 2023-01-05   BB   2023-01-20           3            2
17 2023-01-06   BB   2023-01-20           4            1
18 2023-01-20   BB   2023-01-20           0            0
19 2023-01-28   BB   2023-01-20           1            6
20 2023-01-29   BB   2023-01-20           2            5
21 2023-02-01   BB   2023-02-17           3            4
22 2023-02-02   BB   2023-02-17           4            3
23 2023-02-05   BB   2023-02-17           5            2
24 2023-02-06   BB   2023-02-17           6            1
25 2023-02-20   BB   2023-02-17           1            3
26 2023-02-27   BB   2023-02-17           2            2
27 2023-02-28   BB   2023-02-17           3            1

huangapple
  • 本文由 发表于 2023年3月7日 19:41:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/75661532.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定