Question

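You can do this more efficiently with Polars' native expression API, but only after the rows of df1 and df2 have been paired up, because df1 on its own has no idx or values column. One way is a cross join followed by a conditional sum:

import polars as pl

# df1 and df2 are the frames defined in the question below.
# Pair every interval with every point, then keep a value only where
# idx falls inside the half-open interval [start, end).
result = df1.join(df2, how="cross").with_columns(
    pl.when(
        pl.col("idx").is_between(pl.col("start"), pl.col("end"), closed="left")
    )
    .then(pl.col("values"))
    .otherwise(0)
    .alias("sum_values")
)

# Sum the kept values per interval.
expected = result.groupby("start", "end").agg(pl.sum("sum_values"))

# groupby does not guarantee row order; put columns and rows in the
# order of the expected output:
expected = expected.select(["start", "end", "sum_values"]).sort("start")

This code uses when and otherwise to create a new column sum_values that holds values where idx lies in [start, end) and 0 elsewhere, then uses groupby and agg to compute the total for each interval. Because non-matching rows are kept with a value of 0 rather than filtered out, an interval with no matching idx still appears in the result with a sum of 0.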

Original question:

Say I have

import polars as pl

df1 = pl.DataFrame({'start': [1., 2., 4.], 'end': [2, 4., 6]})
df2 = pl.DataFrame({'idx': [1, 1.7, 2.3, 2.5, 3., 4], 'values': [3, 1, 4, 2, 3, 5]})

They look like this:

In [8]: df1
Out[8]:
shape: (3, 2)
┌───────┬─────┐
│ start ┆ end │
│ ---   ┆ --- │
│ f64   ┆ f64 │
╞═══════╪═════╡
│ 1.0   ┆ 2.0 │
│ 2.0   ┆ 4.0 │
│ 4.0   ┆ 6.0 │
└───────┴─────┘

In [9]: df2
Out[9]:
shape: (6, 2)
┌─────┬────────┐
│ idx ┆ values │
│ --- ┆ ---    │
│ f64 ┆ i64    │
╞═════╪════════╡
│ 1.0 ┆ 3      │
│ 1.7 ┆ 1      │
│ 2.3 ┆ 4      │
│ 2.5 ┆ 2      │
│ 3.0 ┆ 3      │
│ 4.0 ┆ 5      │
└─────┴────────┘

I would like to end up with something like this:

In [6]: expected = pl.DataFrame({
   ...:     'start': [1., 2., 4.],
   ...:     'end': [2., 4., 6.],
   ...:     'sum_values': [4, 9, 5]
   ...: })

In [7]: expected
Out[7]:
shape: (3, 3)
┌───────┬─────┬────────────┐
│ start ┆ end ┆ sum_values │
│ ---   ┆ --- ┆ ---        │
│ f64   ┆ f64 ┆ i64        │
╞═══════╪═════╪════════════╡
│ 1.0   ┆ 2.0 ┆ 4          │
│ 2.0   ┆ 4.0 ┆ 9          │
│ 4.0   ┆ 6.0 ┆ 5          │
└───────┴─────┴────────────┘

Here's an inefficient way of doing it I came up with, using apply:

(
    df1.with_columns(
        df1.apply(
            lambda row: df2.filter(
                pl.col("idx").is_between(row[0], row[1], closed="left")
            )["values"].sum()
        )["apply"].alias("sum_values")
    )
)

It gives the correct output, but because it uses apply and a Python lambda function, it's not as performant as it could be.

Is there a way to write this using polars native expressions API?
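
As a reference point for checking any answer, the requested result can be spelled out in plain Python. This is only a sketch of the semantics, not a Polars solution: the names `intervals`, `points`, and `sum_in_interval` are made up for illustration, and the data is copied from df1 and df2 above. Each interval is treated as half-open, [start, end), which is what closed="left" selects in the apply version.

```python
# Data copied from the question: (start, end) rows of df1, (idx, values) rows of df2.
intervals = [(1.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
points = [(1.0, 3), (1.7, 1), (2.3, 4), (2.5, 2), (3.0, 3), (4.0, 5)]


def sum_in_interval(start, end, rows):
    """Sum the values whose idx lies in the half-open interval [start, end)."""
    return sum(value for idx, value in rows if start <= idx < end)


# One sum per interval, in the same order as df1.
sum_values = [sum_in_interval(start, end, points) for start, end in intervals]
print(sum_values)  # [4, 9, 5]
```

Note that idx 4.0 is counted in the interval starting at 4.0 and not in the one ending at 4.0, because only the left edge is closed.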

Answer 1

Score: 2

I'm not sure if there is another way apart from a cross join:

(df1.join(df2, how='cross')
.filter(pl.col('idx').is_between('start', 'end', closed='left'))
.groupby('start', 'end')
.sum()
)

shape: (3, 4)
┌───────┬─────┬─────┬────────┐
│ start ┆ end ┆ idx ┆ values │
│ ---   ┆ --- ┆ --- ┆ ---    │
│ f64   ┆ f64 ┆ f64 ┆ i64    │
╞═══════╪═════╪═════╪════════╡
│ 4.0   ┆ 6.0 ┆ 4.0 ┆ 5      │
│ 1.0   ┆ 2.0 ┆ 2.7 ┆ 4      │
│ 2.0   ┆ 4.0 ┆ 7.8 ┆ 9      │
└───────┴─────┴─────┴────────┘
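
To make it concrete what the cross join does, here is a rough plain-Python model of the three steps (pair every row with every row, filter, group and sum). It is a sketch of the idea, not how Polars implements the join. All names are illustrative, and the fourth interval (6.0, 8.0) is a hypothetical row that is not in the question, added only to show an edge case.

```python
from collections import defaultdict
from itertools import product

# Rows of df1 and df2 from the question; the last interval is hypothetical.
intervals = [(1.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8.0)]
points = [(1.0, 3), (1.7, 1), (2.3, 4), (2.5, 2), (3.0, 3), (4.0, 5)]

# Step 1, cross join: pair every interval with every point.
pairs = list(product(intervals, points))

# Step 2, filter: keep pairs whose idx lies in [start, end), like closed="left".
kept = [
    ((start, end), value)
    for (start, end), (idx, value) in pairs
    if start <= idx < end
]

# Step 3, group by (start, end) and sum the values.
sums = defaultdict(int)
for key, value in kept:
    sums[key] += value

print(len(pairs))  # 24
print(dict(sums))  # {(1.0, 2.0): 4, (2.0, 4.0): 9, (4.0, 6.0): 5}
```

Two things stand out. The intermediate result has one row per (interval, point) pair, so it grows with the product of the two frame lengths. And an interval that matches no idx, such as the hypothetical (6.0, 8.0), produces no group at all, so it is missing from the output instead of showing a sum of 0.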



Filter and aggregate based on another DataFrame
