Why does Pandas not group weeks starting with the specified day?

Question

I want to group data by week (Sunday to Saturday) and sum the values for the days in each week. For some reason, Pandas doesn't group the way I expect. Here is some sample data starting on Sunday, Feb 12 2023 and ending on Saturday, Feb 25 2023. I expected to pass in pd.Grouper(freq='W') or pd.Grouper(freq='W-SUN'), but I only get conventional Sunday-to-Saturday weeks if I pass in freq='W-SAT'.

What's going on?

$ python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:50:56) 
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> 
>>> S = pd.Series([1,2,3,70,2,3,4,5,6,10,20,4,9,1],
...               pd.DatetimeIndex(['2023-02-%d' % d for d in range(12,26)]))
>>> S
2023-02-12     1
2023-02-13     2
2023-02-14     3
2023-02-15    70
2023-02-16     2
2023-02-17     3
2023-02-18     4
2023-02-19     5
2023-02-20     6
2023-02-21    10
2023-02-22    20
2023-02-23     4
2023-02-24     9
2023-02-25     1
dtype: int64
>>> S.groupby(pd.Grouper(freq='W-SUN',origin='start_day')).sum()
2023-02-12     1
2023-02-19    89
2023-02-26    50
Freq: W-SUN, dtype: int64
>>> S.groupby(pd.Grouper(freq='W-SAT',origin='start_day')).sum()
2023-02-18    85
2023-02-25    55
Freq: W-SAT, dtype: int64

Answer 1

Score: 1

IIUC, the functionality is working as designed. To put the convention simply, the day you pass is the day on which you want your week to end. Therefore, for a Sunday-to-Saturday week, you do indeed need W-SAT. This is specified in the pandas documentation's list of anchored offsets. The weekly entries there don't spell out the end-of-period rule, but the entries for the other anchors do:

>W-SAT: weekly frequency (Saturdays)

>(B)Q(S)-DEC: quarterly frequency, year ends in December. Same as ‘Q’

>(B)Q(S)-JAN: quarterly frequency, year ends in January
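A quick way to check the "anchor is the last day" reading is to ask pandas which dates each alias lands on. This is only a sketch; the variable names (sun_anchors, sat_anchors, week) are made up for illustration and are not part of the question.

```python
import pandas as pd

# Anchor dates produced by each weekly alias, starting from Sunday 2023-02-12
# (the first date in the question's sample data).
sun_anchors = pd.date_range("2023-02-12", periods=3, freq="W-SUN")
sat_anchors = pd.date_range("2023-02-12", periods=3, freq="W-SAT")

print("W-SUN:", list(sun_anchors.strftime("%Y-%m-%d")), list(sun_anchors.day_name()))
print("W-SAT:", list(sat_anchors.strftime("%Y-%m-%d")), list(sat_anchors.day_name()))

# A weekly Period shows the full span: seven days ending on the anchor day.
week = pd.Period("2023-02-15", freq="W-SAT")
print(week.start_time.day_name(), "to", week.end_time.day_name())
```

With W-SAT, every anchor is a Saturday and the period containing Wednesday Feb 15 runs from Sunday to Saturday, which matches the grouping the question was after.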

Using W-SAT makes each aggregation window start on a Sunday and run through Saturday, covering the full 7 days. The label on each group is the Saturday that closes the week.
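If you would rather see each week labelled by the Sunday it starts on, one possible approach is to group with W-SAT and then shift the labels back six days. The sketch below rebuilds the question's sample data; the names s, by_week_end and by_week_start are illustrative, and origin='start_day' from the question is left out because the sums come out the same without it here.

```python
import pandas as pd

# Same sample data as in the question: Sunday 2023-02-12 through Saturday 2023-02-25.
values = [1, 2, 3, 70, 2, 3, 4, 5, 6, 10, 20, 4, 9, 1]
s = pd.Series(values, index=pd.date_range("2023-02-12", "2023-02-25", freq="D"))

# Weeks ending on Saturday, i.e. Sunday through Saturday.
# Each group is labelled with the Saturday that ends it.
by_week_end = s.groupby(pd.Grouper(freq="W-SAT")).sum()

# Optional relabelling: move each Saturday label back six days,
# to the Sunday that opens the same week.
by_week_start = by_week_end.copy()
by_week_start.index = by_week_end.index - pd.Timedelta(days=6)

print(by_week_end)
print(by_week_start)
```

The sums are unchanged by the relabelling; only the index moves from the week's last day to its first.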

huangapple
  • Posted on June 5, 2023 at 06:49:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76402704.html