Why does Pandas not group weeks starting with the specified day?
Question
I want to group data by week (Sunday to Saturday) and sum the values for the days in each week. For some reason, Pandas doesn't group the way I expect. Here is some sample data starting on Sunday, Feb 12, 2023, and ending on Saturday, Feb 25, 2023. I expected to pass in pd.Grouper(freq='W') or pd.Grouper(freq='W-SUN'), but it doesn't use conventional weeks unless I pass in freq='W-SAT'.
What's going on?
$ python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:50:56)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>>
>>> S = pd.Series([1,2,3,70,2,3,4,5,6,10,20,4,9,1],
... pd.DatetimeIndex(['2023-02-%d' % d for d in range(12,26)]))
>>> S
2023-02-12 1
2023-02-13 2
2023-02-14 3
2023-02-15 70
2023-02-16 2
2023-02-17 3
2023-02-18 4
2023-02-19 5
2023-02-20 6
2023-02-21 10
2023-02-22 20
2023-02-23 4
2023-02-24 9
2023-02-25 1
dtype: int64
>>> S.groupby(pd.Grouper(freq='W-SUN',origin='start_day')).sum()
2023-02-12 1
2023-02-19 89
2023-02-26 50
Freq: W-SUN, dtype: int64
>>> S.groupby(pd.Grouper(freq='W-SAT',origin='start_day')).sum()
2023-02-18 85
2023-02-25 55
Freq: W-SAT, dtype: int64
Answer 1
Score: 1
IIUC, the functionality is working as intended. To put the convention simply: the anchor you pass is the day on which you expect each week to end. Therefore, for a Sunday-to-Saturday week, you do need W-SAT. This is specified in the pandas time series documentation on anchored offsets. Even though the entry for weeks only names the day without saying whether the week ends on it, the entries for the following anchors do state the end:
> W-SAT: weekly frequency (Saturdays)
> (B)Q(S)-DEC: quarterly frequency, year ends in December. Same as 'Q'
> (B)Q(S)-JAN: quarterly frequency, year ends in January
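One way to see what the anchor means is to generate dates with it directly. The sketch below (assuming a reasonably recent pandas) shows that an anchored weekly alias only ever produces timestamps on the named weekday, so W-SAT yields Saturdays and W-SUN yields Sundays. to_offset is used only to inspect the offset object behind the alias.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# An anchored weekly alias names the weekday every generated timestamp
# falls on: W-SAT produces Saturdays, W-SUN produces Sundays.
# 2023-02-12 is a Sunday, the first day of the question's data.
sat = pd.date_range("2023-02-12", periods=3, freq="W-SAT")
sun = pd.date_range("2023-02-12", periods=3, freq="W-SUN")

print(sat.day_name().tolist())
print(sun.day_name().tolist())

# The offset behind "W-SAT" is a Week anchored on weekday 5
# (Monday=0, so 5 is Saturday).
offset = to_offset("W-SAT")
print(offset, offset.weekday)
```

Because weekly bins are labelled and closed on their right edge by default, these anchor dates become the last day of each group.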
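Using W-SAT makes each group start on a Sunday and run through the following Saturday, covering all 7 days. For weekly frequencies pandas defaults to closed='right' and label='right', so each bin ends on, and is labelled with, its anchor day. With W-SUN the bins end on Sundays, which is why Sunday, Feb 12 lands alone in the bin labelled 2023-02-12 and each later bin runs from Monday through Sunday.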
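A minimal sketch of the groupings discussed above, using the question's sample data and assuming a reasonably recent pandas. The third variant (W-SUN with closed='left' and label='left') is not from the answer; it is shown only as an option if you would rather label each Sunday-to-Saturday week by the Sunday it starts on.

```python
import pandas as pd

# Sample data from the question: Sunday 2023-02-12 through Saturday 2023-02-25.
S = pd.Series(
    [1, 2, 3, 70, 2, 3, 4, 5, 6, 10, 20, 4, 9, 1],
    index=pd.date_range("2023-02-12", "2023-02-25", freq="D"),
)

# Default W-SUN: bins end on Sundays (closed='right', label='right'),
# so Sunday 2023-02-12 sits alone in the first bin.
default_sun = S.groupby(pd.Grouper(freq="W-SUN")).sum()

# W-SAT: bins end on Saturdays, so each bin is Sunday..Saturday,
# labelled by the Saturday that ends it.
by_week_end = S.groupby(pd.Grouper(freq="W-SAT")).sum()

# Same Sunday..Saturday weeks, labelled by the Sunday that starts them:
# anchor on Sundays but close and label each bin on its left edge.
by_week_start = S.groupby(
    pd.Grouper(freq="W-SUN", closed="left", label="left")
).sum()

print(default_sun)
print(by_week_end)
print(by_week_start)
```

Both W-SAT and the left-closed W-SUN grouping put the same seven days in each bin; they differ only in which edge of the week is used as the label.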