hablar::dte() Issue in converting a datetime of class POSIXct to a date


Question

In R 4.2.3, I found that applying dte() to date-times of class "POSIXct" shifts the date back by one day. The following issue is copied from https://github.com/davidsjoberg/hablar/issues/17; please see the link for more information.

> Thank you for a package that lets me quickly change the classes of
> variables. I found that applying dte() to date-times of class
> "POSIXct" shifts the date back by one day. Please see the example
> below.
>
> ```r
> library(readxl)
> library(hablar)
> library(tidyselect)
> library(magrittr)
>
> A <- read_excel(
>   readxl_example("deaths.xlsx"),
>   range = "arts!A5:F15",
>   .name_repair = "universal"
> )
> #> New names:
> #> • `Has kids` -> `Has.kids`
> #> • `Date of birth` -> `Date.of.birth`
> #> • `Date of death` -> `Date.of.death`
>
> class(A$Date.of.birth)
> #> [1] "POSIXct" "POSIXt"
>
> A
> #> # A tibble: 10 × 6
> #>    Name            Profe…¹   Age Has.k…² Date.of.birth       Date.of.death
> #>    <chr>           <chr>   <dbl> <lgl>   <dttm>              <dttm>
> #>  1 David Bowie     musici…    69 TRUE    1947-01-08 00:00:00 2016-01-10 00:00:00
> #>  2 Carrie Fisher   actor      60 TRUE    1956-10-21 00:00:00 2016-12-27 00:00:00
> #>  3 Chuck Berry     musici…    90 TRUE    1926-10-18 00:00:00 2017-03-18 00:00:00
> #>  4 Bill Paxton     actor      61 TRUE    1955-05-17 00:00:00 2017-02-25 00:00:00
> #>  5 Prince          musici…    57 TRUE    1958-06-07 00:00:00 2016-04-21 00:00:00
> #>  6 Alan Rickman    actor      69 FALSE   1946-02-21 00:00:00 2016-01-14 00:00:00
> #>  7 Florence Hende… actor      82 TRUE    1934-02-14 00:00:00 2016-11-24 00:00:00
> #>  8 Harper Lee      author     89 FALSE   1926-04-28 00:00:00 2016-02-19 00:00:00
> #>  9 Zsa Zsa Gábor   actor      99 TRUE    1917-02-06 00:00:00 2016-12-18 00:00:00
> #> 10 George Michael  musici…    53 FALSE   1963-06-25 00:00:00 2016-12-25 00:00:00
> #> # … with abbreviated variable names ¹Profession, ²Has.kids
>
> A %>% hablar::convert(dte(starts_with("Date")))
> #> # A tibble: 10 × 6
> #>    Name               Profession   Age Has.kids Date.of.birth Date.of.death
> #>    <chr>              <chr>      <dbl> <lgl>    <date>        <date>
> #>  1 David Bowie        musician      69 TRUE     1947-01-07    2016-01-09
> #>  2 Carrie Fisher      actor         60 TRUE     1956-10-20    2016-12-26
> #>  3 Chuck Berry        musician      90 TRUE     1926-10-17    2017-03-17
> #>  4 Bill Paxton        actor         61 TRUE     1955-05-16    2017-02-24
> #>  5 Prince             musician      57 TRUE     1958-06-06    2016-04-20
> #>  6 Alan Rickman       actor         69 FALSE    1946-02-20    2016-01-13
> #>  7 Florence Henderson actor         82 TRUE     1934-02-13    2016-11-23
> #>  8 Harper Lee         author        89 FALSE    1926-04-27    2016-02-18
> #>  9 Zsa Zsa Gábor      actor         99 TRUE     1917-02-05    2016-12-17
> #> 10 George Michael     musician      53 FALSE    1963-06-24    2016-12-24
> ```
>
> Created on 2023-03-27 with [reprex v2.0.2](https://reprex.tidyverse.org/)

>
> For example, instead of the date of birth for Bowie being
> "1947-01-08", the day becomes "1947-01-07". The same is true for all
> dates of these musicians.
>
> I know that package readxl read the data right as this is the
> excel sheet that the data came from. The dates match identically
> between the excel and the resulting R tibble. <img width="502"
> alt="Annotation 2023-03-27 093609"
> src="https://user-images.githubusercontent.com/17706062/227954203-ae72f540-730a-4f1d-b472-57cdce131c16.png">
>
> The packages and versions used: R version 4.2.3 (2023-03-15 ucrt)
> Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10
> x64 (build 19045), RStudio 2023.3.0.386
>
> Locale: LC_COLLATE=English_United States.utf8
> LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United
> States.utf8 LC_NUMERIC=C
> LC_TIME=English_United States.utf8
>
> Package version: base64enc_0.1.3 bslib_0.4.2 cachem_1.0.7
> callr_3.7.3 cellranger_1.1.0 cli_3.6.0 clipr_0.8.0
> compiler_4.2.3 cpp11_0.4.3 crayon_1.5.2 digest_0.6.31
> dplyr_1.1.0 ellipsis_0.3.2 evaluate_0.20 fansi_1.0.4
> fastmap_1.1.1 fs_1.6.1 generics_0.1.3 glue_1.6.2
> graphics_4.2.3 grDevices_4.2.3 hablar_0.3.2 highr_0.10
> hms_1.1.2 htmltools_0.5.4 jquerylib_0.1.4 jsonlite_1.8.4
> knitr_1.42 lifecycle_1.0.3 lubridate_1.9.2 magrittr_2.0.3
> memoise_2.0.1 methods_4.2.3 mime_0.12 pillar_1.8.1
> pkgconfig_2.0.3 prettyunits_1.1.1 processx_3.8.0 progress_1.2.2
> ps_1.7.3 purrr_1.0.1 R6_2.5.1 rappdirs_0.3.3
> readxl_1.4.2 rematch_1.0.1 reprex_2.0.2 rlang_1.1.0
> rmarkdown_2.20 rstudioapi_0.14 sass_0.4.5 stats_4.2.3
> stringi_1.7.12 stringr_1.5.0 tibble_3.2.1 tidyselect_1.2.0
> timechange_0.2.0 tinytex_0.44 tools_4.2.3 utf8_1.2.3
> utils_4.2.3 vctrs_0.6.0 withr_2.5.0 xfun_0.37
> yaml_2.3.7

Answer 1

Score: -1

Please see https://github.com/davidsjoberg/hablar/issues/17 for a potential answer. The contents are reproduced below in case the linked page becomes unavailable:

> For some reason, strftime removes a day for dates that have times at
> midnight. According to the documentation, changes have been made in R
> versions 4.2.0 and following:
>
> > strftime is a wrapper for format.POSIXlt, and it and format.POSIXct
> > first convert to class "POSIXlt" by calling as.POSIXlt (so they also
> > work for class "Date"). Note that only that conversion depends on the
> > time zone. Since R version 4.2.0, that as.POSIXlt() conversion now
> > treats the non-finite numeric -Inf, Inf, NA and NaN differently (where
> > previously all were treated as NA) and also the format() method for
> > POSIXlt now treats these different non-finite times and dates
> > analogously to type double.
>
> Using as.Date() for variables belonging to the POSIXct class solves
> the problem, so the check for the POSIXct class is not needed. I
> don't have write permission to open a pull request.
>
> ```r
> as_reliable_dte <- function(.x, ...) {
>   if (any(class(.x) == "Date")) {
>     return(.x)
>   }
>   if (is.logical(.x)) {
>     stop("Logical vectors can't be converted to date.")
>   }
>   if (is.factor(.x)) {
>     .x <- as.character(.x)
>   }
>   # if (any(class(.x) == "POSIXct")) {
>   #   .x <- strftime(.x)
>   # }
>   if (TRUE) {
>     return(as.Date(.x, ...))
>   }
> }
> ```

>
> Note to other users: The function as_reliable_dte() is an internal
> function that is called by dte().
>
> ```r
> dte <- function(..., .args = list()) {
>   list(vars = dplyr::quos(...), fun = ~as_reliable_dte(., !!!.args))
> }
>
> A <- read_excel(
>   readxl_example("deaths.xlsx"),
>   range = "arts!A5:F15",
>   .name_repair = "universal"
> )
> A %>%
>   hablar::convert(dte(starts_with("Date")))
> ```
>
> ```
> # A tibble: 10 × 6
>    Name               Profession   Age Has.kids Date.of.birth Date.of.death
>    <chr>              <chr>      <dbl> <lgl>    <date>        <date>
>  1 David Bowie        musician      69 TRUE     1947-01-08    2016-01-10
>  2 Carrie Fisher      actor         60 TRUE     1956-10-21    2016-12-27
>  3 Chuck Berry        musician      90 TRUE     1926-10-18    2017-03-18
>  4 Bill Paxton        actor         61 TRUE     1955-05-17    2017-02-25
>  5 Prince             musician      57 TRUE     1958-06-07    2016-04-21
>  6 Alan Rickman       actor         69 FALSE    1946-02-21    2016-01-14
>  7 Florence Henderson actor         82 TRUE     1934-02-14    2016-11-24
>  8 Harper Lee         author        89 FALSE    1926-04-28    2016-02-19
>  9 Zsa Zsa Gábor      actor         99 TRUE     1917-02-06    2016-12-18
> 10 George Michael     musician      53 FALSE    1963-06-25    2016-12-25
> ```
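
The quoted issue leaves the cause open ("for some reason"). One plausible explanation, offered here as an assumption rather than something the issue confirms, is a time zone mismatch: readxl is understood to return date-times as POSIXct in UTC, while strftime() with its default `tz = ""` formats in the session's local time zone. In a zone west of UTC, midnight UTC falls on the previous calendar day. The base R sketch below reproduces that effect; the zone "America/New_York" is chosen purely for illustration and is not taken from the original report.

```r
# Midnight UTC, mirroring how the birth date appears in the tibble
# (assumption: the column carries the time zone attribute "UTC").
x <- as.POSIXct("1947-01-08 00:00:00", tz = "UTC")

# strftime() formats in the zone given by `tz`. Its default, "", means the
# session's local zone; a US zone is passed explicitly here so the result
# does not depend on the machine running the example.
via_strftime <- as.Date(strftime(x, format = "%Y-%m-%d", tz = "America/New_York"))

# Converting in the zone the value was stored in keeps the calendar day.
via_as_date <- as.Date(x, tz = "UTC")

print(via_strftime)  # [1] "1947-01-07"
print(via_as_date)   # [1] "1947-01-08"
```

If this is what is happening, `attr(A$Date.of.birth, "tzone")` and `Sys.timezone()` should report different zones on the affected machine, and passing the time zone explicitly to as.Date() avoids relying on either default.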

huangapple
  • Posted on July 18, 2023, 04:59:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76708031.html