按月份在 pandas 中按小时分组

huangapple go评论65阅读模式
英文:

Groupby number of hours in a month in pandas

问题

请问是否可以指导如何按小时为基础的索引进行分组,以找出特定月份中有多少小时的空值?因此,我考虑创建一个具有以月为基础的索引的数据框。

以下是具有时间戳作为索引和另一列偶尔包含空值的数据框。

timestamp rel_humidity
1999-09-27 05:00:00 82.875
1999-09-27 06:00:00 83.5
1999-09-27 07:00:00 83.0
1999-09-27 08:00:00 80.6
1999-09-27 09:00:00 nan
1999-09-27 10:00:00 nan
1999-09-27 11:00:00 nan
1999-09-27 12:00:00 nan

我尝试了以下代码,但得到的数据框不是我预期的结果。

gap_in_month = OG_1998_2022_gaps.groupby(OG_1998_2022_gaps.index.month, OG_1998_2022_gaps.index.year).count()

我经常在使用groupby函数时感到困惑。因此,非常感谢任何帮助。提前感谢!

英文:

Could someone please guide how to groupby no. of hours from hourly based index to find how many hours of null values are there in a specific month? Therefore, I am thinking of having a dataframe with monthly based index.

Below given is the dataframe which has timestamp as index and another column with has occassionally null values.

timestamp rel_humidity
1999-09-27 05:00:00 82.875
1999-09-27 06:00:00 83.5
1999-09-27 07:00:00 83.0
1999-09-27 08:00:00 80.6
1999-09-27 09:00:00 nan
1999-09-27 10:00:00 nan
1999-09-27 11:00:00 nan
1999-09-27 12:00:00 nan

I tried this but the resulting dataframe is not what I expected.

gap_in_month = OG_1998_2022_gaps.groupby(OG_1998_2022_gaps.index.month, OG_1998_2022_gaps.index.year).count()

I always struggle with groupby in function. Therefore, highly appreciate any help. Thanks in advance!

答案1

得分: 2

如果需要,如果每个月没有丢失的值,则通过Series.isna创建掩码,将DatetimeIndex转换为月份周期,使用DatetimeIndex.to_period和聚合sum - 掩码中的True处理为1或使用Grouper替代:

gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                 .groupby(OG_1998_2022_gaps.index.to_period('M')).sum())

gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                 .groupby(pd.Grouper(freq='M')).sum())

如果只需要匹配的行解决方案类似,但首先通过boolean indexing进行筛选,然后通过GroupBy.size进行计数:

gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
                 .groupby(OG_1998_2022_gaps.index.to_period('M')).size())

gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
                 .groupby(pd.Grouper(freq='M')).size())
英文:

If need 0 if no missing value per month create mask by Series.isna, convert DatetimeIndex to month periods by DatetimeIndex.to_period and aggregate sum - Trues of mask are processing like 1 or alternative with Grouper:

gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                 .groupby(OG_1998_2022_gaps.index.to_period('m')).sum())

gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                 .groupby(pd.Grouper(freq='m')).sum())

If need only matched rows solution is similar, but first filter by boolean indexing and then aggregate counts by GroupBy.size:

gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
                 .groupby(OG_1998_2022_gaps.index.to_period('m')).size())

gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
                 .groupby(pd.Grouper(freq='m')).size())

答案2

得分: 2

替代groupby的一个更好的选择(依我个人看来)是使用pd.Series.resample

import pandas as pd

# 具有DatetimeIndex的一些示例数据:
series = pd.Series(
    np.random.choice([1.0, 2.0, 3.0, np.nan], size=2185),
    index=pd.date_range(start="1999-09-26", end="1999-12-26", freq="H")
)

# 解决方案:
series.isna().resample("M").sum()

# 请注意,GroupBy.count和Resampler.count计算非空值的数量,而您似乎正在寻找相反的结果 :)

在您的情况下:

OG_1998_2022_gaps['rel_humidity'].isna().resample("M").sum()
英文:

Alternative to groupby, but (in my opinion) much nicer, is to use pd.Series.resample:

import pandas as pd

# Some sample data with a DatetimeIndex:
series = pd.Series(
    np.random.choice([1.0, 2.0, 3.0, np.nan], size=2185),
    index=pd.date_range(start="1999-09-26", end="1999-12-26", freq="H")
)

# Solution:
series.isna().resample("M").sum()

# Note that GroupBy.count and Resampler.count count the number of non-null values,
# whereas you seem to be looking for the opposite :)

In your case:

OG_1998_2022_gaps['rel_humidity'].isna().resample("M").sum()

huangapple
  • 本文由 发表于 2023年2月16日 17:58:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/75470577.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定