英文:
Groupby number of hours in a month in pandas
问题
请问是否可以指导如何按小时为基础的索引进行分组,以找出特定月份中有多少小时的空值?因此,我考虑创建一个具有以月为基础的索引的数据框。
以下是具有时间戳作为索引和另一列偶尔包含空值的数据框。
timestamp | rel_humidity |
---|---|
1999-09-27 05:00:00 | 82.875 |
1999-09-27 06:00:00 | 83.5 |
1999-09-27 07:00:00 | 83.0 |
1999-09-27 08:00:00 | 80.6 |
1999-09-27 09:00:00 | nan |
1999-09-27 10:00:00 | nan |
1999-09-27 11:00:00 | nan |
1999-09-27 12:00:00 | nan |
我尝试了以下代码,但得到的数据框不是我预期的结果。
gap_in_month = OG_1998_2022_gaps.groupby(OG_1998_2022_gaps.index.month, OG_1998_2022_gaps.index.year).count()
我经常在使用groupby函数时感到困惑。因此,非常感谢任何帮助。提前感谢!
英文:
Could someone please guide how to groupby no. of hours from hourly based index to find how many hours of null values are there in a specific month? Therefore, I am thinking of having a dataframe with monthly based index.
Below given is the dataframe which has timestamp as index and another column with has occassionally null values.
timestamp | rel_humidity |
---|---|
1999-09-27 05:00:00 | 82.875 |
1999-09-27 06:00:00 | 83.5 |
1999-09-27 07:00:00 | 83.0 |
1999-09-27 08:00:00 | 80.6 |
1999-09-27 09:00:00 | nan |
1999-09-27 10:00:00 | nan |
1999-09-27 11:00:00 | nan |
1999-09-27 12:00:00 | nan |
I tried this but the resulting dataframe is not what I expected.
gap_in_month = OG_1998_2022_gaps.groupby(OG_1998_2022_gaps.index.month, OG_1998_2022_gaps.index.year).count()
I always struggle with groupby in function. Therefore, highly appreciate any help. Thanks in advance!
答案1
得分: 2
如果需要,如果每个月没有丢失的值,则通过Series.isna
创建掩码,将DatetimeIndex
转换为月份周期,使用DatetimeIndex.to_period
和聚合sum
- 掩码中的True
处理为1
或使用Grouper
替代:
gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
.groupby(OG_1998_2022_gaps.index.to_period('M')).sum())
gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
.groupby(pd.Grouper(freq='M')).sum())
如果只需要匹配的行解决方案类似,但首先通过boolean indexing
进行筛选,然后通过GroupBy.size
进行计数:
gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
.groupby(OG_1998_2022_gaps.index.to_period('M')).size())
gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
.groupby(pd.Grouper(freq='M')).size())
英文:
If need 0
if no missing value per month create mask by Series.isna
, convert DatetimeIndex
to month periods by DatetimeIndex.to_period
and aggregate sum
- True
s of mask are processing like 1
or alternative with Grouper
:
gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
.groupby(OG_1998_2022_gaps.index.to_period('m')).sum())
gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
.groupby(pd.Grouper(freq='m')).sum())
If need only matched rows solution is similar, but first filter by boolean indexing
and then aggregate counts by GroupBy.size
:
gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
.groupby(OG_1998_2022_gaps.index.to_period('m')).size())
gap_in_month = (OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
.groupby(pd.Grouper(freq='m')).size())
答案2
得分: 2
替代groupby
的一个更好的选择(依我个人看来)是使用pd.Series.resample
:
import pandas as pd
# 具有DatetimeIndex的一些示例数据:
series = pd.Series(
np.random.choice([1.0, 2.0, 3.0, np.nan], size=2185),
index=pd.date_range(start="1999-09-26", end="1999-12-26", freq="H")
)
# 解决方案:
series.isna().resample("M").sum()
# 请注意,GroupBy.count和Resampler.count计算非空值的数量,而您似乎正在寻找相反的结果 :)
在您的情况下:
OG_1998_2022_gaps['rel_humidity'].isna().resample("M").sum()
英文:
Alternative to groupby, but (in my opinion) much nicer, is to use pd.Series.resample
:
import pandas as pd
# Some sample data with a DatetimeIndex:
series = pd.Series(
np.random.choice([1.0, 2.0, 3.0, np.nan], size=2185),
index=pd.date_range(start="1999-09-26", end="1999-12-26", freq="H")
)
# Solution:
series.isna().resample("M").sum()
# Note that GroupBy.count and Resampler.count count the number of non-null values,
# whereas you seem to be looking for the opposite :)
In your case:
OG_1998_2022_gaps['rel_humidity'].isna().resample("M").sum()
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论