2023-02-16 17:58:39

Groupby number of hours in a month in pandas

Question

Could someone please explain how to group an hourly index by month to find how many hours of null values there are in a specific month? I would like the result to be a dataframe with a monthly index.

Below is the dataframe, which has a timestamp as its index and one column that occasionally contains null values.

timestamp	rel_humidity
1999-09-27 05:00:00	82.875
1999-09-27 06:00:00	83.5
1999-09-27 07:00:00	83.0
1999-09-27 08:00:00	80.6
1999-09-27 09:00:00	nan
1999-09-27 10:00:00	nan
1999-09-27 11:00:00	nan
1999-09-27 12:00:00	nan

I tried this but the resulting dataframe is not what I expected.

gap_in_month = OG_1998_2022_gaps.groupby(OG_1998_2022_gaps.index.month, OG_1998_2022_gaps.index.year).count()

I always struggle with the groupby function, so I would highly appreciate any help. Thanks in advance!

Answer 1

Score: 2

If you need a 0 for months with no missing values, create a boolean mask with Series.isna, convert the DatetimeIndex to monthly periods with DatetimeIndex.to_period, and aggregate with sum (each True in the mask counts as 1). Alternatively, group with Grouper:

import pandas as pd

# Option 1: group the mask by monthly periods
gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                 .groupby(OG_1998_2022_gaps.index.to_period('M')).sum())
# Option 2: group with pd.Grouper (pandas 2.2+ prefers freq='ME')
gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                 .groupby(pd.Grouper(freq='M')).sum())

If you only need the matching rows, the solution is similar: first filter by boolean indexing, then count the rows per group with GroupBy.size:

# Keep only the rows where rel_humidity is missing
missing = OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
# Option 1: the grouping key must come from the filtered index so the lengths match
gap_in_month = missing.groupby(missing.index.to_period('M')).size()
# Option 2: group with pd.Grouper (pandas 2.2+ prefers freq='ME')
gap_in_month = missing.groupby(pd.Grouper(freq='M')).size()
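
To make the difference between the two variants concrete, here is a minimal sketch on a tiny frame. The name df is a hypothetical stand-in for OG_1998_2022_gaps, the September rows are copied from the question, and the two October rows are invented for illustration so that one month has no gaps. Only to_period('M') is used, since that alias has the same spelling across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for OG_1998_2022_gaps: the September rows come from
# the question, the two October rows are assumed so one month has no gaps.
index = pd.to_datetime([
    "1999-09-27 05:00:00", "1999-09-27 06:00:00",
    "1999-09-27 07:00:00", "1999-09-27 08:00:00",
    "1999-09-27 09:00:00", "1999-09-27 10:00:00",
    "1999-09-27 11:00:00", "1999-09-27 12:00:00",
    "1999-10-01 00:00:00", "1999-10-01 01:00:00",
])
df = pd.DataFrame(
    {"rel_humidity": [82.875, 83.5, 83.0, 80.6,
                      np.nan, np.nan, np.nan, np.nan,
                      81.0, 79.5]},
    index=index,
)
df.index.name = "timestamp"

# True where the hourly reading is missing
mask = df["rel_humidity"].isna()

# Variant 1: sum the mask per month, so months without gaps show up as 0
gaps_with_zeros = mask.groupby(df.index.to_period("M")).sum()

# Variant 2: filter first, then count rows, so only months with gaps appear
missing = df[mask]
gaps_only = missing.groupby(missing.index.to_period("M")).size()

print(gaps_with_zeros)
print(gaps_only)
```

Variant 1 reports 4 missing hours for 1999-09 and 0 for 1999-10, while variant 2 lists only 1999-09.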

Answer 2

Score: 2

An alternative to groupby that is (in my opinion) much nicer is pd.Series.resample:

import numpy as np
import pandas as pd

# Some sample data with a DatetimeIndex
# (pandas 2.2+ prefers the aliases "h" and "ME" over "H" and "M"):
series = pd.Series(
    np.random.choice([1.0, 2.0, 3.0, np.nan], size=2185),
    index=pd.date_range(start="1999-09-26", end="1999-12-26", freq="H")
)
# Solution:
series.isna().resample("M").sum()
# Note that GroupBy.count and Resampler.count count the number of non-null values,
# whereas you seem to be looking for the opposite :)

In your case:

OG_1998_2022_gaps['rel_humidity'].isna().resample("M").sum()
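
The contrast between counting missing and non-missing hours can be checked on deterministic data. The sketch below is an illustration under assumptions, not part of the original answer: the series, its date range, and the positions of the gaps are all made up. It passes offset objects (pd.Timedelta and pd.offsets.MonthEnd) instead of string aliases, which sidesteps the "H"/"h" and "M"/"ME" spelling change between pandas versions.

```python
import numpy as np
import pandas as pd

# Assumed sample data: 96 hourly readings from 1999-09-29 00:00 to
# 1999-10-02 23:00, i.e. 48 hours in September and 48 in October.
index = pd.date_range("1999-09-29", periods=96, freq=pd.Timedelta(hours=1))
series = pd.Series(np.ones(96), index=index)

# Invented gaps: 5 hours on 29 September and 3 hours on 1 October
series.iloc[5:10] = np.nan
series.iloc[50:53] = np.nan

month_end = pd.offsets.MonthEnd()

# Missing hours per month: sum the boolean mask
missing_per_month = series.isna().resample(month_end).sum()

# Resampler.count counts the non-null values, which is the opposite quantity
present_per_month = series.resample(month_end).count()

print(missing_per_month)
print(present_per_month)
```

Here the missing counts are 5 and 3, the non-null counts are 43 and 45, and each pair adds up to the 48 hours sampled in that month.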
