strptime() handles DST differently on different systems
Question
I'm in the extremely unfortunate position of having to manually compensate (in a script, not just a one-time cleanup) for some incoming data that doesn't observe Daylight Saving Time conventions properly. I'm trying to find the source of a difference in how some dates near the spring DST boundary are parsed, and ultimately correct for it.
An hourly sequence of incoming values might look like this:
dts <- c("03/12/2023 01:00:00",
"03/12/2023 02:00:00", # <- invalid, "should be" 03:00:00
"03/12/2023 04:00:00",
"03/12/2023 05:00:00")
On a macOS system with R 4.2.2, strptime parses those like so:
(x <- strptime(dts, "%m/%d/%Y %H:%M:%S", tz = "America/Los_Angeles"))
# [1] "2023-03-12 01:00:00 PST" "2023-03-12 02:00:00"
# [3] "2023-03-12 04:00:00 PDT" "2023-03-12 05:00:00 PDT"
x$zone
# [1] "PST" "" "PDT" "PDT"
But on an Ubuntu 18.04 system with R 4.2.2, I get this:
(x <- strptime(dts, "%m/%d/%Y %H:%M:%S", tz = "America/Los_Angeles"))
# [1] "2023-03-12 01:00:00 PST" "2023-03-12 02:00:00 PDT"
# [3] "2023-03-12 04:00:00 PDT" "2023-03-12 05:00:00 PDT"
x$zone
# [1] "PST" "PDT" "PDT" "PDT"
Now, in the former case I can recognize the problem and correct it:
(bad <- is.na(x) & is.na(lubridate::dst(x)))
# [1] FALSE TRUE FALSE FALSE
But in the latter case nothing is detected as wrong:
(bad <- is.na(x) & is.na(lubridate::dst(x)))
# [1] FALSE FALSE FALSE FALSE
The docs for strptime say:
Input uses the POSIX function strptime and output the C99 function strftime. However, not all OSes (notably Windows) provided strptime and many issues were found for those which did, so since 2000 R has used a fork of code from ‘glibc’.
So I don't understand where this difference is coming from, since in theory strptime should behave the same on all recent systems.
Any insight or ideas?
Answer 1
Score: 1
It looks like the consensus is that this shouldn't really be happening, because modern R ships with both its own strptime(3) code from glibc and its own version of the Olson database.
I opened a ticket in R's bugzilla about it: https://bugs.r-project.org/show_bug.cgi?id=18581
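In the meantime, one way to flag the nonexistent hour without relying on how a given platform fills in x$zone is a parse/format round trip: convert the parsed value to POSIXct, format it back with the same format string, and compare it to the input. This is only a sketch of a workaround, not an explanation of the discrepancy. The helper name is_nonexistent_local_time is made up for illustration, and it assumes the input strings are zero-padded exactly as the format would print them.

```r
# Hypothetical helper (not part of base R or lubridate): returns TRUE for
# strings that do not survive a parse -> POSIXct -> format round trip.
# A nonexistent local time either becomes NA or gets shifted to a real
# instant, so in both cases it no longer matches the original string.
is_nonexistent_local_time <- function(s, fmt, tz) {
  ct <- as.POSIXct(strptime(s, fmt, tz = tz))
  back <- format(ct, fmt, tz = tz)
  # is.na(ct) covers platforms that return NA; the comparison covers
  # platforms that silently move the time to a valid one.
  is.na(ct) | back != s
}

dts <- c("03/12/2023 01:00:00",
         "03/12/2023 02:00:00",
         "03/12/2023 04:00:00",
         "03/12/2023 05:00:00")

bad <- is_nonexistent_local_time(dts, "%m/%d/%Y %H:%M:%S",
                                 "America/Los_Angeles")
print(bad)
```

With a working time zone database, only the second element should be flagged, on either platform. Note that the same check also flags strings that fail to parse at all or that are not formatted exactly as the format string prints them (for example a month without a leading zero), so it is stricter than a pure DST-gap test.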