Apply function to vector of char elementwise
Question
I have a vector of dates that I imported from Excel, and it comes in a very odd format. Some entries are character strings in `dd/mm/yyyy` format, and others are character strings such as `45265`, which is the serial number Excel uses for that date.
I want to apply a function that converts this vector to proper dates in R. The solution I tried returns an error that I cannot understand.
t1=c("14/02/2020", "17/02/2020", "18/02/2020", "19/02/2020", "20/02/2020",
"21/02/2020", "26/02/2020", "27/02/2020", "28/02/2020", "43864",
"43893", "43924", "43954", "43985", "44077")
lapply(t1,function(x) ifelse(grepl("/",x),dmy(x),as.Date(as.numeric(x),origin='1900-01-01')))
Answer 1
Score: 2
Two things:
1. Most of what you want can be done on the whole vector at once; there is no need for `lapply`.
2. `ifelse` is class-unsafe: using it with `Date`-class (or `POSIXt`-class) values, for example, strips the class and returns numbers. See https://stackoverflow.com/q/6668963/3358272.
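A small illustration of the second point, using only base R. The variable names `d`, `stripped`, and `kept` are made up for this sketch:

```r
# Two arbitrary dates to demonstrate with.
d <- as.Date(c("2020-02-14", "2020-02-17"))

# ifelse() builds its result from the logical test, so the Date class is lost
# and the underlying day counts since 1970-01-01 come back instead.
stripped <- ifelse(c(TRUE, FALSE), d, d)
class(stripped)
# [1] "numeric"

# Assigning into a pre-allocated Date vector keeps the class.
kept <- rep(as.Date(NA), length(d))
kept[c(TRUE, FALSE)] <- d[c(TRUE, FALSE)]
class(kept)
# [1] "Date"
```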
I suggest this as an alternative:
```r
# Start with an all-NA Date vector so the result keeps its Date class.
out <- rep(as.Date(NA), length(t1))
# Entries containing "/" are dd/mm/yyyy strings.
out[grepl("/", t1)] <- as.Date(t1[grepl("/", t1)], format = "%d/%m/%Y")
# Whatever is still NA is treated as a serial number.
out[is.na(out)] <- as.Date(as.numeric(t1[is.na(out)]), origin = "1900-01-01")
out
# [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
# [9] "2020-02-28" "2020-02-05" "2020-03-05" "2020-04-05" "2020-05-05" "2020-06-05" "2020-09-05"
```
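One caveat about the origin, which is carried over from the question: for serial numbers written by Excel's default 1900 date system, the examples in R's `?as.Date` help page use `origin = "1899-12-30"` rather than `"1900-01-01"`, because Excel counts 1900-01-01 as day 1 and also treats 1900 as a leap year. If that applies to this data, the serial-number entries would come out two days late with the origin used here. A quick check, assuming the default 1900 date system (the name `serials` is made up for this sketch):

```r
# Serial numbers taken from t1 in the question.
serials <- c(43864, 43893, 43924)

# Origin used in the question.
as.Date(serials, origin = "1900-01-01")
# [1] "2020-02-05" "2020-03-05" "2020-04-05"

# Origin shown in ?as.Date for Excel's 1900 date system.
as.Date(serials, origin = "1899-12-30")
# [1] "2020-02-03" "2020-03-03" "2020-04-03"
```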
If you have more candidate formats, you might consider https://stackoverflow.com/a/52319606/3358272 and https://stackoverflow.com/a/70304571/3358272, which iterate over possible formats (in a similar way) and attempt the conversion until every value is converted (or the formats are exhausted).
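The idea of iterating over candidate formats can be sketched roughly as follows. This is not the code from the linked answers; the function name `parse_mixed_dates` and the list of formats are assumptions made for illustration:

```r
# Hypothetical helper: try each format in turn on the entries still unparsed.
parse_mixed_dates <- function(x, formats = c("%d/%m/%Y", "%Y-%m-%d")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break  # everything converted, stop early
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

res <- parse_mixed_dates(c("14/02/2020", "2020-02-17", "not a date"))
res
# [1] "2020-02-14" "2020-02-17" NA
```

Entries that match none of the formats stay `NA`, which makes them easy to find afterwards with `is.na()`.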
An alternative to `base::ifelse` (which strips the class) is either `dplyr::if_else` or `data.table::fifelse`, which might be simpler if you already use either package for other purposes. Note that both evaluate both conversions on all of `t1`, so you will get warnings with either implementation.
```r
library(dplyr)  # provides if_else()
if_else(grepl("/", t1), lubridate::dmy(t1), as.Date(as.numeric(t1), origin = "1900-01-01"))
# WARN [2023-05-17 09:54:01] {"msg":"uncaught warning","warning":" 6 failed to parse.","where":["ccbr()","if_else(grepl(\"/\", t1), lubridat","lubridate::dmy(t1)"],"pid":"39316"}
# WARN [2023-05-17 09:54:01] {"msg":"uncaught warning","warning":"NAs introduced by coercion","where":["ccbr()","if_else(grepl(\"/\", t1), lubridat","as.Date(as.numeric(t1), origin ="],"pid":"39316"}
# [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
# [9] "2020-02-28" "2020-02-05" "2020-03-05" "2020-04-05" "2020-05-05" "2020-06-05" "2020-09-05"
```

```r
data.table::fifelse(grepl("/", t1), lubridate::dmy(t1), as.Date(as.numeric(t1), origin = "1900-01-01"))
# WARN [2023-05-17 09:54:11] {"msg":"uncaught warning","warning":" 6 failed to parse.","where":["ccbr()","data.table::fifelse(grepl(\"/\", t","lubridate::dmy(t1)"],"pid":"39316"}
# WARN [2023-05-17 09:54:11] {"msg":"uncaught warning","warning":"NAs introduced by coercion","where":["ccbr()","data.table::fifelse(grepl(\"/\", t","as.Date(as.numeric(t1), origin ="],"pid":"39316"}
# [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
# [9] "2020-02-28" "2020-02-05" "2020-03-05" "2020-04-05" "2020-05-05" "2020-06-05" "2020-09-05"
```
These warnings can be suppressed by wrapping the whole `if_else`/`fifelse` call in `suppressWarnings`.
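If silencing a whole call feels too broad, a narrower variant is to wrap only the coercion that is expected to fail, so that unrelated warnings still surface. This is a sketch in base R only, not part of the approach above; the names `t2`, `is_text`, `serial`, and `out2` are made up for the example:

```r
t2 <- c("14/02/2020", "43864")
is_text <- grepl("/", t2)

# as.numeric() warns "NAs introduced by coercion" for "14/02/2020";
# that NA is expected here, so only this call is wrapped.
serial <- suppressWarnings(as.numeric(t2))

# as.Date() with a format returns NA (without a warning) for the serial entry.
out2 <- as.Date(t2, format = "%d/%m/%Y")
# Origin as used in the question.
out2[!is_text] <- as.Date(serial[!is_text], origin = "1900-01-01")
out2
# [1] "2020-02-14" "2020-02-05"
```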