英文:
Polars - Groupby_rolling - How to reset count each day?
问题
我已经从Pandas-Python切换到Polars,我正在尝试弄清楚如何对每一天进行滚动求和。我不只想要每一天的总和,我想要每一天每个时间段(5分钟段)的总和,以"rolling_sum"的方式呈现。以下是我目前的代码:
from datetime import datetime, timedelta
df = pl.DataFrame(
{
"date": pl.date_range(
datetime(1985, 1, 1),
datetime(1985, 1, 6),
timedelta(minutes=5),
time_unit="ns",
eager=True).alias("date"),
}
)
df = df.with_columns(
A=pl.lit(2, dtype=pl.Int32)
)
df = df.lazy().groupby_rolling(
"date",
period="1d",
).agg(
pl.col("A").sum().alias("A_daily"),
).collect()
print(df[280:])
这会产生以下结果:
shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date ┆ A_daily │
│ --- ┆ --- │
│ datetime[ns] ┆ i32 │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562 │
│ 1985-01-01 23:25:00 ┆ 564 │
│ 1985-01-01 23:30:00 ┆ 566 │
│ 1985-01-01 23:35:00 ┆ 568 │
│ 1985-01-01 23:40:00 ┆ 570 │
│ 1985-01-01 23:45:00 ┆ 572 │
│ 1985-01-01 23:50:00 ┆ 574 │
│ 1985-01-01 23:55:00 ┆ 576 │
│ 1985-01-02 00:00:00 ┆ 576 │
│ 1985-01-02 00:05:00 ┆ 576 │
│ 1985-01-02 00:10:00 ┆ 576 │
│ 1985-01-02 00:15:00 ┆ 576 │
│ 1985-01-02 00:20:00 ┆ 576 │
│ 1985-01-02 00:25:00 ┆ 576 │
│ 1985-01-02 00:30:00 ┆ 576 │
│ 1985-01-02 00:35:00 ┆ 576 │
│ 1985-01-02 00:40:00 ┆ 576 │
│ 1985-01-02 00:45:00 ┆ 576 │
│ 1985-01-02 00:50:00 ┆ 576 │
...
│ 1985-01-05 23:50:00 ┆ 576 │
│ 1985-01-05 23:55:00 ┆ 576 │
│ 1985-01-06 00:00:00 ┆ 576 │
└─────────────────────┴─────────┘
我猜我实际上想要的是每一天,A_Daily
重新开始,即如下所示:
shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date ┆ A_daily │
│ --- ┆ --- │
│ datetime[ns] ┆ i32 │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562 │
│ 1985-01-01 23:25:00 ┆ 564 │
│ 1985-01-01 23:30:00 ┆ 566 │
│ 1985-01-01 23:35:00 ┆ 568 │
│ 1985-01-01 23:40:00 ┆ 570 │
│ 1985-01-01 23:45:00 ┆ 572 │
│ 1985-01-01 23:50:00 ┆ 574 │
│ 1985-01-01 23:55:00 ┆ 576 │
**│ 1985-01-02 00:00:00 ┆ 2 │**
│ 1985-01-02 00:05:00 ┆ 4 │
│ 1985-01-02 00:10:00 ┆ 6 │
│ 1985-01-02 00:15:00 ┆ 8 │
│ 1985-01-02 00:20:00 ┆ 10 │
│ 1985-01-02 00:25:00 ┆ 12 │
│ 1985-01-02 00:30:00 ┆ 14 │
│ 1985-01-02 00:35:00 ┆ 18 │
│ 1985-01-02 00:40:00 ┆ 20 │
│ 1985-01-02 00:45:00 ┆ 22 │
│ 1985-01-02 00:50:00 ┆ 24 │
基本上,我只想在每天重新开始滚动总和。在Pandas中,我会像这样做:
tn = outdata.groupby(pd.Grouper(freq='D', key='_daily_reset'))[
# '_vp_volume'].transform('cumsum')
谢谢!
英文:
I've switched from Pandas-Python to Polars and I'm trying to figure out how to essentially do a rolling sum for each day. I don't want just the total for each day, I want the totals for each time period (5m period) each day, in a "rolling_sum" fashion. Here is the code I have so far:
from datetime import datetime, timedelta
df = pl.DataFrame(
{
"date": pl.date_range(
datetime(1985, 1, 1),
datetime(1985, 1, 6),
timedelta(minutes=5),
time_unit="ns",
eager=True).alias("date"),
}
)
df = df.with_columns(
A=pl.lit(2, dtype=pl.Int32)
)
df = df.lazy().groupby_rolling(
"date",
period="1d",
).agg(
pl.col("A").sum().alias("A_daily"),
).collect()
print(df[280:])
This produces:
shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date ┆ A_daily │
│ --- ┆ --- │
│ datetime[ns] ┆ i32 │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562 │
│ 1985-01-01 23:25:00 ┆ 564 │
│ 1985-01-01 23:30:00 ┆ 566 │
│ 1985-01-01 23:35:00 ┆ 568 │
│ 1985-01-01 23:40:00 ┆ 570 │
│ 1985-01-01 23:45:00 ┆ 572 │
│ 1985-01-01 23:50:00 ┆ 574 │
│ 1985-01-01 23:55:00 ┆ 576 │
│ 1985-01-02 00:00:00 ┆ 576 │
│ 1985-01-02 00:05:00 ┆ 576 │
│ 1985-01-02 00:10:00 ┆ 576 │
│ 1985-01-02 00:15:00 ┆ 576 │
│ 1985-01-02 00:20:00 ┆ 576 │
│ 1985-01-02 00:25:00 ┆ 576 │
│ 1985-01-02 00:30:00 ┆ 576 │
│ 1985-01-02 00:35:00 ┆ 576 │
│ 1985-01-02 00:40:00 ┆ 576 │
│ 1985-01-02 00:45:00 ┆ 576 │
│ 1985-01-02 00:50:00 ┆ 576 │
...
│ 1985-01-05 23:50:00 ┆ 576 │
│ 1985-01-05 23:55:00 ┆ 576 │
│ 1985-01-06 00:00:00 ┆ 576 │
└─────────────────────┴─────────┘
What I guess what I'm actually after is for each day, I want the A_Daily
to reset, i.e. to look like:
shape: (1_161, 2)
┌─────────────────────┬─────────┐
│ date ┆ A_daily │
│ --- ┆ --- │
│ datetime[ns] ┆ i32 │
╞═════════════════════╪═════════╡
│ 1985-01-01 23:20:00 ┆ 562 │
│ 1985-01-01 23:25:00 ┆ 564 │
│ 1985-01-01 23:30:00 ┆ 566 │
│ 1985-01-01 23:35:00 ┆ 568 │
│ 1985-01-01 23:40:00 ┆ 570 │
│ 1985-01-01 23:45:00 ┆ 572 │
│ 1985-01-01 23:50:00 ┆ 574 │
│ 1985-01-01 23:55:00 ┆ 576 │
**│ 1985-01-02 00:00:00 ┆ 2 │**
│ 1985-01-02 00:05:00 ┆ 4 │
│ 1985-01-02 00:10:00 ┆ 6 │
│ 1985-01-02 00:15:00 ┆ 8 │
│ 1985-01-02 00:20:00 ┆ 10 │
│ 1985-01-02 00:25:00 ┆ 12 │
│ 1985-01-02 00:30:00 ┆ 14 │
│ 1985-01-02 00:35:00 ┆ 18 │
│ 1985-01-02 00:40:00 ┆ 20 │
│ 1985-01-02 00:45:00 ┆ 22 │
│ 1985-01-02 00:50:00 ┆ 24 │
Basically, I want to just reset the rolling sum each day. In Pandas, I'd have done it sort of like this:
tn = outdata.groupby(pd.Grouper(freq='D', key='_daily_reset'))[
# '_vp_volume'].transform('cumsum')
Thanks!
答案1
得分: 1
.dt.truncate()
可以对日期进行分组/桶化,然后您可以使用 .cumsum().over()
进行操作。
df.with_columns(
pl.col("A").cumsum().over(pl.col("date").dt.truncate("1d"))
)
shape: (1_441, 2)
┌─────────────────────┬─────┐
│ date ┆ A │
│ --- ┆ --- │
│ datetime[ns] ┆ i32 │
╞═════════════════════╪═════╡
│ 1985-01-01 00:00:00 ┆ 2 │
│ 1985-01-01 00:05:00 ┆ 4 │
│ 1985-01-01 00:10:00 ┆ 6 │
│ 1985-01-01 00:15:00 ┆ 8 │
│ 1985-01-01 00:20:00 ┆ 10 │
│ 1985-01-01 00:25:00 ┆ 12 │
│ … ┆ … │
│ 1985-01-05 23:35:00 ┆ 568 │
│ 1985-01-05 23:40:00 ┆ 570 │
│ 1985-01-05 23:45:00 ┆ 572 │
│ 1985-01-05 23:50:00 ┆ 574 │
│ 1985-01-05 23:55:00 ┆ 576 │
│ 1985-01-06 00:00:00 ┆ 2 │
└─────────────────────┴─────┘
英文:
.dt.truncate()
can group/bucket the dates which you can .cumsum().over()
df.with_columns(
pl.col("A").cumsum().over(pl.col("date").dt.truncate("1d"))
)
shape: (1_441, 2)
┌─────────────────────┬─────┐
│ date ┆ A │
│ --- ┆ --- │
│ datetime[ns] ┆ i32 │
╞═════════════════════╪═════╡
│ 1985-01-01 00:00:00 ┆ 2 │
│ 1985-01-01 00:05:00 ┆ 4 │
│ 1985-01-01 00:10:00 ┆ 6 │
│ 1985-01-01 00:15:00 ┆ 8 │
│ 1985-01-01 00:20:00 ┆ 10 │
│ 1985-01-01 00:25:00 ┆ 12 │
│ … ┆ … │
│ 1985-01-05 23:35:00 ┆ 568 │
│ 1985-01-05 23:40:00 ┆ 570 │
│ 1985-01-05 23:45:00 ┆ 572 │
│ 1985-01-05 23:50:00 ┆ 574 │
│ 1985-01-05 23:55:00 ┆ 576 │
│ 1985-01-06 00:00:00 ┆ 2 │
└─────────────────────┴─────┘
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论