DST temporal feature from timestamp using polars

Question

I'm migrating code from pandas to polars. I have time-series data consisting of a timestamp column and a value column, and I need to compute a number of features. For example:

from datetime import datetime, timedelta

import numpy as np
import polars as pl

df = pl.DataFrame({
    "timestamp": pl.date_range(
        datetime(2017, 1, 1),
        datetime(2018, 1, 1),
        timedelta(minutes=15),
        time_zone="Australia/Sydney",
        time_unit="ms", eager=True),
})
value = np.random.normal(0, 1, len(df))
df = df.with_columns([pl.Series(value).alias("value")])

I need to generate a column indicating whether each timestamp falls in standard time or daylight time. I'm currently using apply because, as far as I can see, there isn't a temporal expression for this. My current code is:

def dst(timestamp: datetime) -> int:
    # 1 if the timestamp is in daylight time, 0 if it is in standard time
    return int(timestamp.dst().total_seconds() != 0)

df = df.with_columns(pl.struct(["timestamp"]).apply(lambda x: dst(**x)).alias("dst"))

(This uses a trick: it checks whether the tzinfo.dst(dt) offset is non-zero.)
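
To make the trick concrete, here is a minimal standard-library sketch, assuming Python 3.9+ with `zoneinfo` and an available time zone database. For an aware datetime, `dst()` returns a zero `timedelta` in standard time and a non-zero one in daylight time. The helper name `dst_flag` and the two sample dates are illustrative assumptions, not part of the original code.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SYDNEY = ZoneInfo("Australia/Sydney")


def dst_flag(timestamp: datetime) -> int:
    """Return 1 if the aware timestamp is in daylight time, else 0."""
    return int(timestamp.dst().total_seconds() != 0)


# Mid-January: southern-hemisphere summer, daylight time in Sydney.
summer = datetime(2017, 1, 15, 12, 0, tzinfo=SYDNEY)
# Mid-July: southern-hemisphere winter, standard time in Sydney.
winter = datetime(2017, 7, 15, 12, 0, tzinfo=SYDNEY)

print(dst_flag(summer), dst_flag(winter))
```

Calling this once per row through apply is what makes the original approach slow, since every timestamp is converted to a Python object first.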

Is there a (fast) way of doing this using polars expressions rather than (slow) apply?

Answer 1

Score: 1

You can exploit strftime for this.

df.with_columns(
    dst=pl.when(pl.col("timestamp").dt.strftime("%Z").str.contains("(DT$)"))
    .then(True)
    .otherwise(False)
)

It relies on the time zone abbreviation ending in "DT" to determine the DST status. That works here and would work for US time zones (e.g. EST/EDT, CST/CDT), but there are many time zones where it would not.
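
The caveat above can be checked with a small standard-library sketch, assuming Python 3.9+ with `zoneinfo` and a time zone database that uses the usual abbreviations. It compares the "ends in DT" heuristic against the actual DST offset at one January and one July instant. The function names and the chosen zones are illustrative assumptions.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def name_says_dst(ts: datetime) -> bool:
    """Heuristic: the zone abbreviation ends in 'DT'."""
    return ts.tzname().endswith("DT")


def offset_says_dst(ts: datetime) -> bool:
    """Ground truth: the DST component of the offset is non-zero."""
    return ts.dst() != timedelta(0)


def heuristic_agrees(zone: str) -> bool:
    """Check the heuristic at a mid-January and a mid-July instant."""
    tz = ZoneInfo(zone)
    samples = [
        datetime(2017, 1, 15, 12, 0, tzinfo=tz),
        datetime(2017, 7, 15, 12, 0, tzinfo=tz),
    ]
    return all(name_says_dst(ts) == offset_says_dst(ts) for ts in samples)


for zone in ["Australia/Sydney", "America/New_York", "Europe/London", "Europe/Berlin"]:
    print(zone, heuristic_agrees(zone))
```

With the usual abbreviations, London (GMT/BST) and Berlin (CET/CEST) are examples where the summer name does not end in "DT", so the heuristic misses their daylight time.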

Alternatively, you could use the UTC offset, but it's a lot more convoluted.

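# Distinct UTC offsets in the data, smallest first.
offsets = (
    df.select(tzoff=pl.col("timestamp").dt.strftime("%z").cast(pl.Int64))
    .unique("tzoff")
    .sort("tzoff")
    # One flag per offset row; a Series keeps this explicit across polars versions.
    .with_columns(dst=pl.Series([False, True]))
)

(
    df.with_columns(tzoff=pl.col("timestamp").dt.strftime("%z").cast(pl.Int64))
    .join(offsets, on="tzoff")
    .drop("tzoff")
)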

This one assumes that the time zone has exactly two offsets in the data, and that the smaller of the two is standard time and the larger is daylight saving time.
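
The two-offset assumption can be illustrated without polars. The following is a rough standard-library sketch, assuming Python 3.9+ with `zoneinfo` and an available time zone database. It collects the distinct UTC offsets a zone uses over a year, sampled once per day, and labels the larger one as daylight time. The helper names and the choice of "Asia/Tokyo" as a zone without DST are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def distinct_offsets(zone: str, year: int) -> list:
    """Return the sorted distinct UTC offsets seen in `zone`, one sample per day."""
    tz = ZoneInfo(zone)
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    offsets = set()
    for day in range(365):
        local = (start + timedelta(days=day)).astimezone(tz)
        offsets.add(local.utcoffset())
    return sorted(offsets)


offsets = distinct_offsets("Australia/Sydney", 2017)
# Assumption from the answer: exactly two offsets, smaller = standard time.
standard, daylight = offsets


def is_dst(ts: datetime) -> bool:
    """Label a timestamp as daylight time if it uses the larger offset."""
    return ts.utcoffset() == daylight


# A zone without DST has a single offset, so the two-offset assumption fails there.
tokyo_offsets = distinct_offsets("Asia/Tokyo", 2017)
print(offsets, tokyo_offsets)
```

For a zone with a single offset, the unpacking into `standard, daylight` would raise an error, which mirrors how the join-based approach breaks when the data does not contain exactly two offsets.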

Answer 2

Score: 1

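With polars>=0.18.5, the following works. dt.dst_offset() returns the daylight saving component of the offset, which is zero during standard time, so comparing with != 0 yields 1 for daylight time, as in the question's dst function:

df = df.with_columns((pl.col("timestamp").dt.dst_offset() != 0).cast(pl.Int32).alias("dst"))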

huangapple
  • Posted on June 29, 2023, 12:57:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76578147.html