strptime() handles DST differently on different systems

Question

I'm in the extremely unfortunate position of having to manually compensate (in a script, not just a one-time cleanup) for some incoming data that doesn't observe Daylight Saving Time conventions properly. I'm trying to find the source of a difference in processing for some dates near the spring DST boundary, and ultimately correct for it.

An hourly sequence of incoming values might look like this:

dts <- c("03/12/2023 01:00:00",
         "03/12/2023 02:00:00",  # <- invalid, "should be" 03:00:00
         "03/12/2023 04:00:00",
         "03/12/2023 05:00:00")

On a macOS system with R 4.2.2, strptime parses those like so:

(x <- strptime(dts, "%m/%d/%Y %H:%M:%S", tz = "America/Los_Angeles"))
# [1] "2023-03-12 01:00:00 PST" "2023-03-12 02:00:00"
# [3] "2023-03-12 04:00:00 PDT" "2023-03-12 05:00:00 PDT"

x$zone
# [1] "PST" ""    "PDT" "PDT"

But on an Ubuntu 18.04 system with R 4.2.2, I get this:

(x <- strptime(dts, "%m/%d/%Y %H:%M:%S", tz = "America/Los_Angeles"))
# [1] "2023-03-12 01:00:00 PST" "2023-03-12 02:00:00 PDT"
# [3] "2023-03-12 04:00:00 PDT" "2023-03-12 05:00:00 PDT"

x$zone
# [1] "PST" "PDT" "PDT" "PDT"

Now, in the former case I can recognize the problem and correct it:

(bad <- is.na(x) & is.na(lubridate::dst(x)))
# [1] FALSE  TRUE FALSE FALSE

But in the latter case nothing is detected as wrong:

(bad <- is.na(x) & is.na(lubridate::dst(x)))
# [1] FALSE FALSE FALSE FALSE

The docs for strptime say:

"Input uses the POSIX function strptime and output the C99 function strftime. However, not all OSes (notably Windows) provided strptime and many issues were found for those which did, so since 2000 R has used a fork of code from 'glibc'."

So I'm not understanding where this issue is coming from, since theoretically strptime should be the same on all recent systems?

Any insight or ideas?


Answer 1

Score: 1

Looks like the consensus is that this shouldn't really be happening, because modern R ships with both its own strptime(3) code from glibc and its own copy of the Olson (tz) database.

I opened a ticket in R's bugzilla about it: https://bugs.r-project.org/show_bug.cgi?id=18581
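
In the meantime, one way to flag the invalid times without relying on how strptime fills in the zone is a round trip: convert each string to an instant, format that instant back with the same format and time zone, and compare it to the input. A formatted instant is always a valid local time, so a string inside the spring-forward gap cannot come back unchanged, whether the platform returns NA or silently shifts it. This is only a sketch; the helper name find_nonexistent is made up for illustration, and it assumes the input strings are zero-padded exactly as the format produces them.

```r
dts <- c("03/12/2023 01:00:00",
         "03/12/2023 02:00:00",  # falls inside the spring-forward gap
         "03/12/2023 04:00:00",
         "03/12/2023 05:00:00")

# Hypothetical helper (not from the original post): TRUE where the string
# does not survive a parse/format round trip in the given time zone.
find_nonexistent <- function(dts, fmt = "%m/%d/%Y %H:%M:%S",
                             tz = "America/Los_Angeles") {
  # Parse to an instant; depending on the platform, a gap time becomes
  # either NA or a shifted instant.
  ct <- as.POSIXct(dts, format = fmt, tz = tz)
  # Format the instant back as local time in the same zone.
  round_trip <- format(ct, format = fmt, tz = tz)
  # NA (unparseable or rejected) and mismatches are both flagged.
  is.na(round_trip) | round_trip != dts
}

bad <- find_nonexistent(dts)
```

Note that this also flags strings that fail to parse at all, and it does not detect the ambiguous repeated hour at the autumn transition, since those local times do exist.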

huangapple
  • Posted on 2023-08-04 02:03:37
  • When reposting, please keep this link: https://go.coder-hub.com/76830603.html