Offset when filtering timezone-aware datetimes in Python Polars
Question
I have a DataFrame with timezone-aware datetimes in the column "datetime". After parsing, the column's time zone is UTC.
from datetime import datetime
from zoneinfo import ZoneInfo

import polars as pl

(
    pl.DataFrame(
        {
            "datetime": [
                "[01/Aug/2023:00:00:02 +0200]",
                "[01/Aug/2023:02:00:02 +0200]",
                "[03/Aug/2023:01:00:02 +0200]",
            ]
        }
    )
    # parse the strings; %z consumes the +0200 offset
    .with_columns(pl.col("datetime").str.to_datetime("[%d/%b/%Y:%H:%M:%S %z]"))
    .with_columns(pl.col("datetime").dt.convert_time_zone("Europe/Berlin"))
    .filter(
        pl.col("datetime")
        .cast(pl.Date)
        .is_between(
            datetime(2023, 8, 1, tzinfo=ZoneInfo("Europe/Berlin")),
            datetime(2023, 8, 3, tzinfo=ZoneInfo("Europe/Berlin")),
        )
    )
)
The conversion to CEST ("Europe/Berlin") works. However, when I filter on the datetime column, the result is off by 2 hours.
datetime
datetime[μs, Europe/Berlin]
2023-08-01 02:00:02 CEST
2023-08-03 01:00:02 CEST
The first row of the original dataset is not in the result, but it should be. The third row is in the result, but it should not be.
This looks like the difference between UTC and CEST. The result is the same if the Python datetime object is naive (e.g. just datetime(2023, 8, 1)).
How do I get Polars to take the time zone into account when filtering?
Answer 1
Score: 1
Since you correctly convert your datetimes to a time zone, your filter should reflect that: compare the aware column directly against aware datetimes, without casting back to a naive date with .cast(pl.Date):
from datetime import datetime
from zoneinfo import ZoneInfo

import polars as pl

df = (
    pl.DataFrame(
        {
            "datetime": [
                "[01/Aug/2023:00:00:02 +0200]",
                "[01/Aug/2023:02:00:02 +0200]",
                "[03/Aug/2023:01:00:02 +0200]",
            ]
        }
    )
    .with_columns(pl.col("datetime").str.to_datetime("[%d/%b/%Y:%H:%M:%S %z]"))
    .with_columns(pl.col("datetime").dt.convert_time_zone("Europe/Berlin"))
)

# compare aware column values against aware bounds; no cast to pl.Date
print(
    df.filter(
        pl.col("datetime").is_between(
            datetime(2023, 8, 1, tzinfo=ZoneInfo("Europe/Berlin")),
            datetime(2023, 8, 3, tzinfo=ZoneInfo("Europe/Berlin")),
        )
    )
)
┌─────────────────────────────┐
│ datetime                    │
│ ---                         │
│ datetime[μs, Europe/Berlin] │
╞═════════════════════════════╡
│ 2023-08-01 00:00:02 CEST    │
│ 2023-08-01 02:00:02 CEST    │
└─────────────────────────────┘
To observe the effect of casting back to pl.Date (or pl.Datetime, for better illustration), run for example:
df = (
    pl.DataFrame(
        {
            "datetime": [
                "[01/Aug/2023:00:00:02 +0200]",
                "[01/Aug/2023:02:00:02 +0200]",
                "[03/Aug/2023:01:00:02 +0200]",
            ]
        }
    )
    .with_columns(pl.col("datetime").str.to_datetime("[%d/%b/%Y:%H:%M:%S %z]"))
    .with_columns(pl.col("datetime").dt.convert_time_zone("Europe/Berlin"))
    .with_columns(pl.col("datetime").cast(pl.Datetime))
)
print(df["datetime"])
[
2023-07-31 22:00:02
2023-08-01 00:00:02
2023-08-02 23:00:02
]
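As the output shows, the cast drops the time zone and keeps the underlying UTC timestamp: a naive datetime in Polars behaves like UTC here, which is why the values (and the filter in the question) are shifted by 2 hours.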
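The shift can also be reproduced with the standard library alone. The following is a minimal sketch, independent of Polars, that parses the same three strings and compares each value's local calendar date with its UTC calendar date; the variable names are illustrative only.

```python
from datetime import datetime, timezone

RAW = [
    "[01/Aug/2023:00:00:02 +0200]",
    "[01/Aug/2023:02:00:02 +0200]",
    "[03/Aug/2023:01:00:02 +0200]",
]
FMT = "[%d/%b/%Y:%H:%M:%S %z]"

# strptime with %z returns aware datetimes carrying the +02:00 offset.
aware = [datetime.strptime(s, FMT) for s in RAW]

# Calendar date as seen in the local (+02:00) wall-clock time.
local_dates = [dt.date() for dt in aware]

# Calendar date after converting the same instant to UTC.
utc_dates = [dt.astimezone(timezone.utc).date() for dt in aware]

for raw, local, utc in zip(RAW, local_dates, utc_dates):
    print(f"{raw}  local={local}  utc={utc}")
```

The first and third values fall on a different calendar day in UTC than in local time, which matches the rows that were wrongly excluded and included in the question.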