Polars syntax for Pandas complex queries
Question
I am trying to benchmark Polars but I am stuck on how to replicate the following Pandas expression in Polars.
df['ll_lat'] = (df['lat'] // 0.1 * 0.1).round(1)
df['ll_lon'] = (df['lon'] // 0.1 * 0.1).round(1)
df['temporalBasket'] = df['eventtime'].astype(str).str[:13]
df = df.groupby(['ll_lat', 'll_lon', 'temporalBasket']).agg(strikes=('lat', 'count'))
df
Can someone help me translate and explain how I should be thinking about Polars column creation etc. please?
Here is a df.head() output to make things a little clearer.
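Since the df.head() screenshot is not reproduced here, the Pandas query can be exercised on a tiny made-up frame (the sample values below are assumptions for illustration, not the real data):

```python
import pandas as pd

# Hypothetical sample data; the real dataset is only shown as a screenshot in the post
df = pd.DataFrame({
    'lat': [45.123, 45.155],
    'lon': [12.321, 12.322],
    'eventtime': pd.to_datetime(['2023-04-01 10:20:00', '2023-04-01 10:45:00']),
})

# Floor the coordinates to one decimal place
df['ll_lat'] = (df['lat'] // 0.1 * 0.1).round(1)
df['ll_lon'] = (df['lon'] // 0.1 * 0.1).round(1)

# First 13 characters of the timestamp string, i.e. 'YYYY-MM-DD HH'
df['temporalBasket'] = df['eventtime'].astype(str).str[:13]

# Named aggregation: count rows per (lat, lon, hour) bucket
out = df.groupby(['ll_lat', 'll_lon', 'temporalBasket']).agg(strikes=('lat', 'count'))
```

Both sample rows fall into the same 0.1-degree cell and the same hour, so they collapse into a single group.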
Answer 1
Score: 1
You can do something similar in Polars to what you are doing in Pandas. However, instead of slicing the string, you can use truncate to extract the day and hour. This should be faster, and also easier to read.
For rounding down to the nearest decimal, I did not find a dedicated Polars method, so I kept your logic.
from datetime import datetime

import polars as pl

# Sample data
data = {
    'lat': [45.123, 45.155, 45.171, 45.191, 45.123],
    'lon': [12.321, 12.322, 12.345, 12.366, 12.321],
    'eventtime': [
        datetime(2023, 4, 1, 10, 20),
        datetime(2023, 4, 1, 12, 30),
        datetime(2023, 4, 1, 10, 45),
        datetime(2023, 4, 2, 9, 15),
        datetime(2023, 4, 2, 11, 50),
    ],
}
df_pl = pl.DataFrame(data)

df_pl.groupby(
    (pl.col('lat') // 0.1 * 0.1).alias('ll_lat'),
    (pl.col('lon') // 0.1 * 0.1).alias('ll_lon'),
    pl.col('eventtime').dt.truncate('1h').alias('temporalBasket')
).agg(
    strikes=pl.col('lat').count()
)
Output:
┌────────┬────────┬─────────────────────┬─────────┐
│ ll_lat ┆ ll_lon ┆ temporalBasket ┆ strikes │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ datetime[μs] ┆ u32 │
╞════════╪════════╪═════════════════════╪═════════╡
│ 45.1 ┆ 12.3 ┆ 2023-04-01 12:00:00 ┆ 1 │
│ 45.1 ┆ 12.3 ┆ 2023-04-02 09:00:00 ┆ 1 │
│ 45.1 ┆ 12.3 ┆ 2023-04-01 10:00:00 ┆ 2 │
│ 45.1 ┆ 12.3 ┆ 2023-04-02 11:00:00 ┆ 1 │
└────────┴────────┴─────────────────────┴─────────┘