Apply function to vector of char elementwise

Question

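I have a vector of dates that I imported from Excel, and it comes in a very weird format. Some entries are character strings in dd/mm/yyyy form, while others are character strings such as 45265, which is the serial number Excel uses for that date.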

I want to apply a function that converts this vector to proper dates in R. The solution I tried returns an error that I cannot understand.

t1 = c("14/02/2020", "17/02/2020", "18/02/2020", "19/02/2020", "20/02/2020",
     "21/02/2020", "26/02/2020", "27/02/2020", "28/02/2020", "43864",
     "43893", "43924", "43954", "43985", "44077")
lapply(t1, function(x) ifelse(grepl("/", x), dmy(x), as.Date(as.numeric(x), origin='1900-01-01')))

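This code tries to convert each string in the vector to a date: if the string contains "/", it is parsed with lubridate's dmy(); otherwise it is converted to a number and turned into a date using the origin '1900-01-01'.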
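A note on the numeric entries, since the code above assumes an origin of '1900-01-01': in Excel's default (Windows, "1900") date system, serial 1 is 1900-01-01, but Excel also counts a nonexistent 29 February 1900. For any date after February 1900 the serials therefore line up with an R origin of 1899-12-30 (R's own `?as.Date` help page gives this example); workbooks using the Mac "1904" date system would need origin 1904-01-01 instead. The sketch below assumes the 1900 system, and `excel_serial_to_date` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: convert Excel serial day numbers (stored as text) to Date.
# Assumes the workbook uses Excel's default "1900" date system (Windows).
excel_serial_to_date <- function(x) {
  # as.numeric() turns "43864" into 43864; non-numeric strings become NA with a warning
  as.Date(as.numeric(x), origin = "1899-12-30")
}

excel_serial_to_date(c("43864", "45265"))
# [1] "2020-02-03" "2023-12-05"
```

With origin '1900-01-01' every result would come out two days late (43864 would become 2020-02-05).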

Answer 1

Score: 2

<details>
<summary>English:</summary>

Two things:

1. Most of what you want to do can be vectorized, so there is no need for `lapply`;
2. `ifelse` is class-unsafe: using it with `Date`-class (or `POSIXt`-class) values, for example, strips the class and returns plain numbers. See https://stackoverflow.com/q/6668963/3358272.

I suggest this as an alternative:

```r
out <- rep(as.Date(NA), length(t1))
# dd/mm/yyyy strings: parse with an explicit format
out[grepl("/", t1)] <- as.Date(t1[grepl("/", t1)], format = "%d/%m/%Y")
# the remaining entries are Excel serial numbers; in Excel's default (1900) date system
# they line up with an R origin of 1899-12-30, not 1900-01-01
out[is.na(out)] <- as.Date(as.numeric(t1[is.na(out)]), origin = "1899-12-30")
out
#  [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
#  [9] "2020-02-28" "2020-02-03" "2020-03-03" "2020-04-03" "2020-05-03" "2020-06-03" "2020-09-03"
```

If you have more candidate formats, you might consider https://stackoverflow.com/a/52319606/3358272 and https://stackoverflow.com/a/70304571/3358272, both of which iterate over the possible formats (in a similar way), converting whatever they can until every element is parsed (or the formats are exhausted).

An alternative to `base::ifelse` (which strips the class) is `dplyr::if_else` or `data.table::fifelse`; both preserve the `Date` class and may be simpler if you already use either package. Note that both evaluate each branch on *all* of `t1`, so you will get warnings with either implementation: `dmy()` fails on the 6 numeric strings, and `as.numeric()` produces `NA`s for the 9 slash-formatted ones.

```r
library(dplyr)  # provides if_else()
if_else(grepl("/", t1), lubridate::dmy(t1), as.Date(as.numeric(t1), origin = "1899-12-30"))
# WARN [2023-05-17 09:54:01] {"msg":"uncaught warning","warning":" 6 failed to parse.","where":["ccbr()","if_else(grepl(\"/\", t1), lubridat","lubridate::dmy(t1)"],"pid":"39316"}
# WARN [2023-05-17 09:54:01] {"msg":"uncaught warning","warning":"NAs introduced by coercion","where":["ccbr()","if_else(grepl(\"/\", t1), lubridat","as.Date(as.numeric(t1), origin ="],"pid":"39316"}
#  [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
#  [9] "2020-02-28" "2020-02-03" "2020-03-03" "2020-04-03" "2020-05-03" "2020-06-03" "2020-09-03"
data.table::fifelse(grepl("/", t1), lubridate::dmy(t1), as.Date(as.numeric(t1), origin = "1899-12-30"))
# WARN [2023-05-17 09:54:11] {"msg":"uncaught warning","warning":" 6 failed to parse.","where":["ccbr()","data.table::fifelse(grepl(\"/\", t","lubridate::dmy(t1)"],"pid":"39316"}
# WARN [2023-05-17 09:54:11] {"msg":"uncaught warning","warning":"NAs introduced by coercion","where":["ccbr()","data.table::fifelse(grepl(\"/\", t","as.Date(as.numeric(t1), origin ="],"pid":"39316"}
#  [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
#  [9] "2020-02-28" "2020-02-03" "2020-03-03" "2020-04-03" "2020-05-03" "2020-06-03" "2020-09-03"
```

These warnings can be suppressed by wrapping the whole `if_else`/`fifelse` call in `suppressWarnings()`.
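The format-iteration idea from the linked answers (try each candidate format on the elements that are still unparsed, stopping when everything is parsed or the formats run out) can be sketched in base R as below. This is not the code from those answers: `parse_multi_format` is a hypothetical helper name, and the example strings are made up for illustration. It relies on `as.Date()` with an explicit `format` returning `NA` (without a warning) for strings that do not match.

```r
# Hypothetical helper: try each format in turn on the elements not yet parsed.
parse_multi_format <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break  # everything parsed: stop early
    # non-matching strings come back as NA and are retried with the next format
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out  # anything still NA matched none of the formats
}

x <- c("14/02/2020", "2020-02-17", "18.02.2020")
parse_multi_format(x, c("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y"))
# [1] "2020-02-14" "2020-02-17" "2020-02-18"
```

Order matters when formats are ambiguous: with both `"%d/%m/%Y"` and `"%m/%d/%Y"` in the list, a string like `"03/02/2020"` is claimed by whichever format comes first.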

huangapple
  • This article was published on 17 May 2023 at 21:30:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76272671.html