group_rolling per group in polars

huangapple go评论61阅读模式
英文:

group_rolling per group in polars

问题

以下是翻译好的代码部分:

让我们假设我有以下的 DataFrame

example_df = pl.DataFrame(
    {
    "date": pl.date_range(low=date(2022, 1, 1), high=date(2022, 1, 30), interval="1d"),
    "group": ["A", "B", "C"]*10,
    'total': np.random.randint(100, size=30)
    }
)

现在我想在5天的时间窗口内对每个组的总和进行求和类似于将 groupby 与 groupby_rolling 结合在一起

我想知道是否有比以下代码更加优雅和高效的方法来实现

df_list = []

for table_group in example_df['group'].unique():
    temp_df = (
        example_df
        .filter(pl.col('group')==table_group)
        .sort('date')
        .groupby_rolling('date', period='5d')
        .agg(pl.col('total').sum().prefix('roll_'))
    )
    
    temp_df2 = (
        example_df
        .filter(pl.col('group')==table_group)
        .sort('date')
        .select([
            pl.all(),
            temp_df.get_column('roll_total')
            ])
    )
    df_list.append(temp_df2)

pl.concat(df_list)

希望这有所帮助。

英文:

Let's say I have the following df:

example_df = pl.DataFrame(
    {
    "date": pl.date_range(low=date(2022, 1, 1), high=date(2022, 1, 30), interval="1d"),
    "group": ["A", "B", "C"]*10,
    'total': np.random.randint(100, size=30)
    }
)

Now I want to sum the total per group, in a time window of 5 days. Something like combining groupby with groupby_rolling.

I wonder if there more elegant and efficient way to do it than the following code:

df_list = []

for table_group in example_df['group'].unique():
    temp_df = (
        example_df
        .filter(pl.col('group')==table_group)
        .sort('date')
        .groupby_rolling('date', period='5d')
        .agg(pl.col('total').sum().prefix('roll_'))
    )
    
    temp_df2 = (
        example_df
        .filter(pl.col('group')==table_group)
        .sort('date')
        .select([
            pl.all(),
            temp_df.get_column('roll_total')
            ])
    )
    df_list.append(temp_df2)

pl.concat(df_list)

My main issue is that I have millions of rows and over 10,000 groups(and more than 1 column that I want to sum)...
So this code is not so elegant and consumes more memory than I want it to.

答案1

得分: 1

以下是您要翻译的内容:

"It appears that the piece you're missing is the by parameter.

This allows you to specify grouping.

(df.groupby_rolling(index_column="date", by="group", period="5d")
   .agg(
      pl.col("total").last(),
      roll_total = pl.sum("total")))
shape: (30, 4)
┌───────┬────────────┬───────┬────────────┐
│ group ┆ date       ┆ total ┆ roll_total │
│ ---   ┆ ---        ┆ ---   ┆ ---        │
│ str   ┆ date       ┆ i64   ┆ i64        │
╞═══════╪════════════╪═══════╪════════════╡
│ A     ┆ 2022-01-01 ┆ 96    ┆ 96         │
│ A     ┆ 2022-01-04 ┆ 16    ┆ 112        │
│ A     ┆ 2022-01-07 ┆ 17    ┆ 33         │
│ A     ┆ 2022-01-10 ┆ 4     ┆ 21         │
│ …     ┆ …          ┆ …     ┆ …          │
│ C     ┆ 2022-01-21 ┆ 17    ┆ 78         │
│ C     ┆ 2022-01-24 ┆ 84    ┆ 101        │
│ C     ┆ 2022-01-27 ┆ 78    ┆ 162        │
│ C     ┆ 2022-01-30 ┆ 54    ┆ 132        │
└───────┴────────────┴───────┴────────────┘
```"

<details>
<summary>英文:</summary>

It appears that the piece you&#39;re missing is the [`by`](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.groupby_rolling.html) parameter.

This allows you to specify grouping.

(df.groupby_rolling(index_column="date", by="group", period="5d")
.agg(
pl.col("total").last(),
roll_total = pl.sum("total")))

shape: (30, 4)
┌───────┬────────────┬───────┬────────────┐
│ group ┆ date ┆ total ┆ roll_total │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ date ┆ i64 ┆ i64 │
╞═══════╪════════════╪═══════╪════════════╡
│ A ┆ 2022-01-01 ┆ 96 ┆ 96 │
│ A ┆ 2022-01-04 ┆ 16 ┆ 112 │
│ A ┆ 2022-01-07 ┆ 17 ┆ 33 │
│ A ┆ 2022-01-10 ┆ 4 ┆ 21 │
│ … ┆ … ┆ … ┆ … │
│ C ┆ 2022-01-21 ┆ 17 ┆ 78 │
│ C ┆ 2022-01-24 ┆ 84 ┆ 101 │
│ C ┆ 2022-01-27 ┆ 78 ┆ 162 │
│ C ┆ 2022-01-30 ┆ 54 ┆ 132 │
└───────┴────────────┴───────┴────────────┘


</details>



huangapple
  • 本文由 发表于 2023年3月21日 02:25:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/75793994.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定