How to maintain date format after applying a function

Question

I have a data frame with poorly formatted date information.

library(dplyr)  # provides %>%, mutate() and across()

date <- c("18102016", "11102017", "4052017", "18102018", "3102018")
df <- data.frame(date = date, x1 = 1:5, x2 = rep(1, 5))

I have already written the function `fix_date_all()`, which produces the correct format when applied to the vector `df$date`:

fix_date_all <- function(date) {
  fix_date <- function(d) {
    if (nchar(d) != 8) d <- paste0("0", d)

    dd <- d %>% substr(1, 2)
    mm <- d %>% substr(3, 4)
    yyyy <- d %>% substr(5, 8)

    d <- paste0(dd, ".", mm, ".", yyyy) %>% as.Date("%d.%m.%Y")

    d
  }

  lapply(date, fix_date)
}

fix_date_all(df$date)

Now I would like to convert this variable to a proper date format in tidyverse style:

df %>% mutate(across(date, fix_date_all))

However, when used this way, the dates come out as numbers:

   date x1 x2
1 17092  1  1
2 17450  2  1
3 17290  3  1
4 17822  4  1
5 17807  5  1

Answer 1

Score: 3

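`fix_date_all()` returns a list, because it ends with an `lapply` call. Each element holds a single Date, and the numbers in your output are those dates' underlying day counts since 1970-01-01.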

fix_date_all(df$date)
[[1]]
[1] "2016-10-18"

[[2]]
[1] "2017-10-11"

[[3]]
[1] "2017-05-04"

[[4]]
[1] "2018-10-18"

[[5]]
[1] "2018-10-03"

We need to flatten the list into a single Date vector with `c`:

library(dplyr)
df %>%
  mutate(date = fix_date_all(date) %>% do.call(c, .))

Output:

        date x1 x2
1 2016-10-18  1  1
2 2017-10-11  2  1
3 2017-05-04  3  1
4 2018-10-18  4  1
5 2018-10-03  5  1

Or, with purrr 1.0.0 or later, use `list_c`:

library(purrr)
df %>%
  mutate(date = fix_date_all(date) %>% list_c())

        date x1 x2
1 2016-10-18  1  1
2 2017-10-11  2  1
3 2017-05-04  3  1
4 2018-10-18  4  1
5 2018-10-03  5  1
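
As a rough check of the explanation above, the following base R sketch shows that an `lapply`-based helper returns a list, that each element is stored as a day count, and that `do.call(c, ...)` restores a Date vector. The name `fix_date_list` is hypothetical; it is a pipe-free stand-in for the question's function, not the original code.

```r
# Sample input from the question
date <- c("18102016", "11102017", "4052017", "18102018", "3102018")

# Hypothetical stand-in for the question's helper: still returns a list
fix_date_list <- function(x) {
  lapply(x, function(d) {
    if (nchar(d) != 8) d <- paste0("0", d)  # pad 7-character values
    as.Date(d, format = "%d%m%Y")
  })
}

res <- fix_date_list(date)
class(res)         # "list": one Date per element, not a Date vector
unclass(res[[1]])  # 17092: days since 1970-01-01, as seen in the question

# Combining the elements with c() keeps the Date class
flat <- do.call(c, res)
class(flat)        # "Date"
format(flat)
```

Checking `class()` on the result before passing it to `mutate()` is a quick way to spot this kind of problem.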

Answer 2

Score: 3

A second option would be to get rid of `lapply` and rewrite your function, e.g. using `stringr::str_pad`:

``` r
library(dplyr, warn.conflicts = FALSE)

fix_date_all <- function(date) {
  date %>%
    stringr::str_pad(width = 8, pad = "0") %>%
    as.Date(format = "%d%m%Y")
}

fix_date_all(df$date)
#> [1] "2016-10-18" "2017-10-11" "2017-05-04" "2018-10-18" "2018-10-03"

df %>%
  mutate(across(date, fix_date_all))
#>         date x1 x2
#> 1 2016-10-18  1  1
#> 2 2017-10-11  2  1
#> 3 2017-05-04  3  1
#> 4 2018-10-18  4  1
#> 5 2018-10-03  5  1
```

Answer 3

Score: 2

`sprintf` left-pads the value with zeros to eight digits, and `as.Date` then converts the result to a Date. No packages are needed.

as.Date(sprintf("%08d", as.numeric(date)), "%d%m%Y")
## [1] "2016-10-18" "2017-10-11" "2017-05-04" "2018-10-18" "2018-10-03"

Note that this is vectorized, so it works inside `mutate`:

library(dplyr)
data.frame(date) %>%
    mutate(date = as.Date(sprintf("%08d", as.numeric(date)), "%d%m%Y"))
##         date
## 1 2016-10-18
## 2 2017-10-11
## 3 2017-05-04
## 4 2018-10-18
## 5 2018-10-03
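
For reuse across several columns, the same one-liner can be wrapped in a small function. The sketch below is a minimal illustration in base R; the helper name `pad_and_parse` is hypothetical, and it assumes every input is a string of digits representing a whole number.

```r
# Sample data from the question
date <- c("18102016", "11102017", "4052017", "18102018", "3102018")
df <- data.frame(date = date, x1 = 1:5, x2 = rep(1, 5))

# Hypothetical helper name; wraps the sprintf() + as.Date() call shown above
pad_and_parse <- function(x) {
  as.Date(sprintf("%08d", as.numeric(x)), format = "%d%m%Y")
}

# Base R counterpart of the mutate() call: overwrite the column in place
df$date <- pad_and_parse(df$date)
str(df$date)
```

Because the helper takes a vector and returns a Date vector of the same length, it should also be usable as `mutate(across(date, pad_and_parse))`.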

Answer 4

Score: 1

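You could use `sapply` instead of `lapply`, but note that `sapply` simplifies its result to a plain numeric vector and drops the Date class. It is simpler to use the vectorized `ifelse`, as shown below: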

fix_date_all <- function(d) {
    d <- ifelse(nchar(d) != 8, paste0("0", d), d)
    as.Date(d, "%d%m%Y")
}

df %>%
    mutate(date = fix_date_all(date))

        date x1 x2
1 2016-10-18  1  1
2 2017-10-11  2  1
3 2017-05-04  3  1
4 2018-10-18  4  1
5 2018-10-03  5  1
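
To compare `sapply` with the vectorized call, here is a small base R sketch. It reuses the function from this answer; the variable names `via_sapply` and `direct` are only illustrative.

```r
date <- c("18102016", "11102017", "4052017", "18102018", "3102018")

# Vectorized version from the answer above
fix_date_all <- function(d) {
  d <- ifelse(nchar(d) != 8, paste0("0", d), d)
  as.Date(d, "%d%m%Y")
}

# Element-wise call through sapply(): simplification drops the Date class
via_sapply <- sapply(date, fix_date_all, USE.NAMES = FALSE)
class(via_sapply)  # "numeric"

# Direct vectorized call keeps the Date class
direct <- fix_date_all(date)
class(direct)      # "Date"
```

The values are the same in both cases; only the class differs, which is why the vectorized call is the more convenient choice inside `mutate()`.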

Posted by huangapple on February 7, 2023 at 01:11:33
Source: https://go.coder-hub.com/75364471.html