How to maintain date format after applying a function
Question
I have a data frame with poorly formatted date information.

    date <- c("18102016", "11102017", "4052017", "18102018", "3102018")
    df <- data.frame(date = date, x1 = 1:5, x2 = rep(1, 5))

I have already written the function `fix_date_all()`, which does the proper formatting when applied to the vector `df$date`:

    library(dplyr)  # provides %>% and mutate()

    fix_date_all <- function(date) {
      fix_date <- function(d) {
        # pad 7-character values (single-digit day) with a leading zero
        if (nchar(d) != 8) d <- paste0("0", d)
        dd <- d %>% substr(1, 2)
        mm <- d %>% substr(3, 4)
        yyyy <- d %>% substr(5, 8)
        d <- paste0(dd, ".", mm, ".", yyyy) %>% as.Date("%d.%m.%Y")
        d
      }
      lapply(date, fix_date)
    }
    fix_date_all(df$date)

Now I would like to transform this variable to a proper date format in a tidyverse style:

    df %>% mutate(across(date, fix_date_all))

However, when used this way, the dates get garbled:

       date x1 x2
    1 17092  1  1
    2 17450  2  1
    3 17290  3  1
    4 17822  4  1
    5 17807  5  1
Answer 1
Score: 3
The output is a list from the `lapply` call, so `mutate` stores it as a list column and the dates print as their underlying day counts.

    fix_date_all(df$date)
    [[1]]
    [1] "2016-10-18"

    [[2]]
    [1] "2017-10-11"

    [[3]]
    [1] "2017-05-04"

    [[4]]
    [1] "2018-10-18"

    [[5]]
    [1] "2018-10-03"

We need to flatten it with `c`:

    library(dplyr)
    df %>%
      mutate(date = fix_date_all(date) %>%
               do.call(c, .))

Output:

            date x1 x2
    1 2016-10-18  1  1
    2 2017-10-11  2  1
    3 2017-05-04  3  1
    4 2018-10-18  4  1
    5 2018-10-03  5  1

Or, with purrr 1.0.0 or later, use `list_c()`:

    library(purrr)
    df %>%
      mutate(date = fix_date_all(date) %>% list_c())

            date x1 x2
    1 2016-10-18  1  1
    2 2017-10-11  2  1
    3 2017-05-04  3  1
    4 2018-10-18  4  1
    5 2018-10-03  5  1
Answer 2
Score: 3
A second option would be to get rid of `lapply` and rewrite your function using e.g. `stringr::str_pad`:

``` r
library(dplyr, warn.conflicts = FALSE)

fix_date_all <- function(date) {
  date %>%
    stringr::str_pad(width = 8, pad = "0") %>%
    as.Date(format = "%d%m%Y")
}

fix_date_all(df$date)
#> [1] "2016-10-18" "2017-10-11" "2017-05-04" "2018-10-18" "2018-10-03"

df %>%
  mutate(across(date, fix_date_all))
#>         date x1 x2
#> 1 2016-10-18  1  1
#> 2 2017-10-11  2  1
#> 3 2017-05-04  3  1
#> 4 2018-10-18  4  1
#> 5 2018-10-03  5  1
```
Answer 3
Score: 2
`sprintf` prepends a 0 if the value is short, and then we convert to `Date`. No packages are used.

    as.Date(sprintf("%08d", as.numeric(date)), "%d%m%Y")
    ## [1] "2016-10-18" "2017-10-11" "2017-05-04" "2018-10-18" "2018-10-03"

Note that it is vectorized and works within `mutate`:

    library(dplyr)
    data.frame(date) %>%
      mutate(date = as.Date(sprintf("%08d", as.numeric(date)), "%d%m%Y"))
    ##         date
    ## 1 2016-10-18
    ## 2 2017-10-11
    ## 3 2017-05-04
    ## 4 2018-10-18
    ## 5 2018-10-03
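
For reuse, the `sprintf` approach could be wrapped in a small function. The sketch below is one possible way to do that. The name `parse_ddmmyyyy` is hypothetical, and it assumes every input is a string of digits in day-month-year order.

```r
# Hypothetical helper: zero-pad to 8 digits, then parse as day-month-year
parse_ddmmyyyy <- function(x) {
  as.Date(sprintf("%08d", as.numeric(x)), format = "%d%m%Y")
}

date <- c("18102016", "4052017", "3102018")
parsed <- parse_ddmmyyyy(date)
format(parsed)   # "2016-10-18" "2017-05-04" "2018-10-03"

# Digits that do not form a real date come back as NA rather than an error
is.na(parse_ddmmyyyy("32132016"))   # TRUE
```

Because the helper is vectorized, it should drop straight into `mutate(date = parse_ddmmyyyy(date))`.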
Answer 4
Score: 1
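Instead of `lapply` you could use `sapply`, although `sapply` simplifies the result to a plain numeric vector, so the `Date` class would have to be restored afterwards. Simpler still, use the vectorized `ifelse`, as shown below:

    fix_date_all <- function(d) {
      # pad 7-character values with a leading zero, leave the rest unchanged
      d <- ifelse(nchar(d) != 8, paste0("0", d), d)
      as.Date(d, "%d%m%Y")
    }
    df %>%
      mutate(date = fix_date_all(date))

            date x1 x2
    1 2016-10-18  1  1
    2 2017-10-11  2  1
    3 2017-05-04  3  1
    4 2018-10-18  4  1
    5 2018-10-03  5  1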