2023年6月9日 03:31:34go评论75阅读模式

英文:

Polars - Groupby_rolling - How to reset count each day?

问题

我已经从Pandas-Python切换到Polars，我正在尝试弄清楚如何对每一天进行滚动求和。我不只想要每一天的总和，我想要每一天每个时间段（5分钟段）的总和，以"rolling_sum"的方式呈现。以下是我目前的代码：

from datetime import datetime, timedelta

df = pl.DataFrame(
    {
    "date": pl.date_range(
            datetime(1985, 1, 1),
            datetime(1985, 1, 6),
            timedelta(minutes=5),
            time_unit="ns",
            eager=True).alias("date"),
    }
)
df = df.with_columns(
  A=pl.lit(2, dtype=pl.Int32)
)

df = df.lazy().groupby_rolling(
    "date",
    period="1d",
).agg(
    pl.col("A").sum().alias("A_daily"),
  ).collect()

print(df[280:])

这会产生以下结果：

shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date                ┆ A_daily │
│ ---                 ┆ ---     │
│ datetime[ns]        ┆ i32     │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562     │
│ 1985-01-01 23:25:00 ┆ 564     │
│ 1985-01-01 23:30:00 ┆ 566     │
│ 1985-01-01 23:35:00 ┆ 568     │
│ 1985-01-01 23:40:00 ┆ 570     │
│ 1985-01-01 23:45:00 ┆ 572     │
│ 1985-01-01 23:50:00 ┆ 574     │
│ 1985-01-01 23:55:00 ┆ 576     │
│ 1985-01-02 00:00:00 ┆ 576     │
│ 1985-01-02 00:05:00 ┆ 576     │
│ 1985-01-02 00:10:00 ┆ 576     │
│ 1985-01-02 00:15:00 ┆ 576     │
│ 1985-01-02 00:20:00 ┆ 576     │
│ 1985-01-02 00:25:00 ┆ 576     │
│ 1985-01-02 00:30:00 ┆ 576     │
│ 1985-01-02 00:35:00 ┆ 576     │
│ 1985-01-02 00:40:00 ┆ 576     │
│ 1985-01-02 00:45:00 ┆ 576     │
│ 1985-01-02 00:50:00 ┆ 576     │
...
│ 1985-01-05 23:50:00 ┆ 576     │
│ 1985-01-05 23:55:00 ┆ 576     │
│ 1985-01-06 00:00:00 ┆ 576     │
└─────────────────────┴─────────┘

我猜我实际上想要的是每一天，A_Daily重新开始，即如下所示：

shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date                ┆ A_daily │
│ ---                 ┆ ---     │
│ datetime[ns]        ┆ i32     │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562     │
│ 1985-01-01 23:25:00 ┆ 564     │
│ 1985-01-01 23:30:00 ┆ 566     │
│ 1985-01-01 23:35:00 ┆ 568     │
│ 1985-01-01 23:40:00 ┆ 570     │
│ 1985-01-01 23:45:00 ┆ 572     │
│ 1985-01-01 23:50:00 ┆ 574     │
│ 1985-01-01 23:55:00 ┆ 576     │
**│ 1985-01-02 00:00:00 ┆   2     │**
│ 1985-01-02 00:05:00 ┆   4     │
│ 1985-01-02 00:10:00 ┆   6     │
│ 1985-01-02 00:15:00 ┆   8     │
│ 1985-01-02 00:20:00 ┆  10     │
│ 1985-01-02 00:25:00 ┆  12     │
│ 1985-01-02 00:30:00 ┆  14     │
│ 1985-01-02 00:35:00 ┆  18     │
│ 1985-01-02 00:40:00 ┆  20     │
│ 1985-01-02 00:45:00 ┆  22     │
│ 1985-01-02 00:50:00 ┆  24     │

基本上，我只想在每天重新开始滚动总和。在Pandas中，我会像这样做：

tn = outdata.groupby(pd.Grouper(freq='D', key='_daily_reset'))[
    #     '_vp_volume'].transform('cumsum')

谢谢！

英文:

I've switched from Pandas-Python to Polars and I'm trying to figure out how to essentially do a rolling sum for each day. I don't want just the total for each day, I want the totals for each time period (5m period) each day, in a "rolling_sum" fashion. Here is the code I have so far:

from datetime import datetime, timedelta


df = pl.DataFrame(
    {
    &quot;date&quot;: pl.date_range(
            datetime(1985, 1, 1),
            datetime(1985, 1, 6),
            timedelta(minutes=5),
            time_unit=&quot;ns&quot;,
            eager=True).alias(&quot;date&quot;),
    }
)
df = df.with_columns(
  A=pl.lit(2, dtype=pl.Int32)
)


df = df.lazy().groupby_rolling(
    &quot;date&quot;,
    period=&quot;1d&quot;,
).agg(
    pl.col(&quot;A&quot;).sum().alias(&quot;A_daily&quot;),
  ).collect()


print(df[280:])

This produces:

shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date                ┆ A_daily │
│ ---                 ┆ ---     │
│ datetime[ns]        ┆ i32     │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562     │
│ 1985-01-01 23:25:00 ┆ 564     │
│ 1985-01-01 23:30:00 ┆ 566     │
│ 1985-01-01 23:35:00 ┆ 568     │
│ 1985-01-01 23:40:00 ┆ 570     │
│ 1985-01-01 23:45:00 ┆ 572     │
│ 1985-01-01 23:50:00 ┆ 574     │
│ 1985-01-01 23:55:00 ┆ 576     │
│ 1985-01-02 00:00:00 ┆ 576     │
│ 1985-01-02 00:05:00 ┆ 576     │
│ 1985-01-02 00:10:00 ┆ 576     │
│ 1985-01-02 00:15:00 ┆ 576     │
│ 1985-01-02 00:20:00 ┆ 576     │
│ 1985-01-02 00:25:00 ┆ 576     │
│ 1985-01-02 00:30:00 ┆ 576     │
│ 1985-01-02 00:35:00 ┆ 576     │
│ 1985-01-02 00:40:00 ┆ 576     │
│ 1985-01-02 00:45:00 ┆ 576     │
│ 1985-01-02 00:50:00 ┆ 576     │
...
│ 1985-01-05 23:50:00 ┆ 576     │
│ 1985-01-05 23:55:00 ┆ 576     │
│ 1985-01-06 00:00:00 ┆ 576     │
└─────────────────────┴─────────┘

What I guess what I'm actually after is for each day, I want the A_Daily to reset, i.e. to look like:

shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date                ┆ A_daily │
│ ---                 ┆ ---     │
│ datetime[ns]        ┆ i32     │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562     │
│ 1985-01-01 23:25:00 ┆ 564     │
│ 1985-01-01 23:30:00 ┆ 566     │
│ 1985-01-01 23:35:00 ┆ 568     │
│ 1985-01-01 23:40:00 ┆ 570     │
│ 1985-01-01 23:45:00 ┆ 572     │
│ 1985-01-01 23:50:00 ┆ 574     │
│ 1985-01-01 23:55:00 ┆ 576     │
**│ 1985-01-02 00:00:00 ┆   2     │**
│ 1985-01-02 00:05:00 ┆   4     │
│ 1985-01-02 00:10:00 ┆   6     │
│ 1985-01-02 00:15:00 ┆   8     │
│ 1985-01-02 00:20:00 ┆  10     │
│ 1985-01-02 00:25:00 ┆  12     │
│ 1985-01-02 00:30:00 ┆  14     │
│ 1985-01-02 00:35:00 ┆  18     │
│ 1985-01-02 00:40:00 ┆  20     │
│ 1985-01-02 00:45:00 ┆  22     │
│ 1985-01-02 00:50:00 ┆  24     │

Basically, I want to just reset the rolling sum each day. In Pandas, I'd have done it sort of like this:

tn = outdata.groupby(pd.Grouper(freq=&#39;D&#39;, key=&#39;_daily_reset&#39;))[
    #     &#39;_vp_volume&#39;].transform(&#39;cumsum&#39;)

Thanks!

答案1

得分: 1

.dt.truncate() 可以对日期进行分组/桶化，然后您可以使用 .cumsum().over() 进行操作。

df.with_columns(
   pl.col("A").cumsum().over(pl.col("date").dt.truncate("1d"))
)

shape: (1_441, 2)
┌─────────────────────┬─────┐
│ date                ┆ A   │
│ ---                 ┆ --- │
│ datetime[ns]        ┆ i32 │
╞═════════════════════╪═════╡
│ 1985-01-01 00:00:00 ┆ 2   │
│ 1985-01-01 00:05:00 ┆ 4   │
│ 1985-01-01 00:10:00 ┆ 6   │
│ 1985-01-01 00:15:00 ┆ 8   │
│ 1985-01-01 00:20:00 ┆ 10  │
│ 1985-01-01 00:25:00 ┆ 12  │
│ …                   ┆ …   │
│ 1985-01-05 23:35:00 ┆ 568 │
│ 1985-01-05 23:40:00 ┆ 570 │
│ 1985-01-05 23:45:00 ┆ 572 │
│ 1985-01-05 23:50:00 ┆ 574 │
│ 1985-01-05 23:55:00 ┆ 576 │
│ 1985-01-06 00:00:00 ┆ 2   │
└─────────────────────┴─────┘

英文:

.dt.truncate() can group/bucket the dates which you can .cumsum().over()

df.with_columns(
   pl.col(&quot;A&quot;).cumsum().over(pl.col(&quot;date&quot;).dt.truncate(&quot;1d&quot;))
)

shape: (1_441, 2)
┌─────────────────────┬─────┐
│ date                ┆ A   │
│ ---                 ┆ --- │
│ datetime[ns]        ┆ i32 │
╞═════════════════════╪═════╡
│ 1985-01-01 00:00:00 ┆ 2   │
│ 1985-01-01 00:05:00 ┆ 4   │
│ 1985-01-01 00:10:00 ┆ 6   │
│ 1985-01-01 00:15:00 ┆ 8   │
│ 1985-01-01 00:20:00 ┆ 10  │
│ 1985-01-01 00:25:00 ┆ 12  │
│ …                   ┆ …   │
│ 1985-01-05 23:35:00 ┆ 568 │
│ 1985-01-05 23:40:00 ┆ 570 │
│ 1985-01-05 23:45:00 ┆ 572 │
│ 1985-01-05 23:50:00 ┆ 574 │
│ 1985-01-05 23:55:00 ┆ 576 │
│ 1985-01-06 00:00:00 ┆ 2   │
└─────────────────────┴─────┘

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Polars – Groupby_rolling – 如何每天重置计数？

问题

答案1

CancelledError异常处理程序未触发。

美丽汤 Img Src 抓取

生成带有 YYYY 格式的随机年份的 Python 代码是什么？

合并具有相同键的两个数据框，其中包含多行。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论