Fastest way to convert 29th February into 1st March in R?

Question

I have a vector of dates ("YYYY-MM-DD") in R, some of which fall on 29 February. To simplify my analysis, I want to convert these dates to 1 March. I am currently doing this with a for loop, but I wonder whether there is a faster way, as I will apply it to tens of thousands of dates.

Here is a reproducible example:

    library(lubridate)
    dates0 <- as.POSIXlt(as.Date(c("1945-04-09 UTC", "1957-06-10 UTC", "1924-02-28 UTC",
                                   "1921-06-22 UTC", "1926-05-16 UTC", "1920-02-29 UTC")))
    dates0[6]
    for (i in seq_along(dates0)) {
      # Braces keep both assignments inside the condition
      if (day(dates0[i]) == 29 && month(dates0[i]) == 2) {
        day(dates0[i]) <- 1
        month(dates0[i]) <- 3
      }
    }
    dates0[6]

Answer 1

Score: 3

Try

    # Flag every 29 February, then move only those dates forward by one day
    tmp <- month(dates0) == 2 & day(dates0) == 29
    dates0[tmp] <- dates0[tmp] + days(1)
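
One thing to watch with a logical mask is missing values: if the vector contains NA, the mask contains NA as well, and the subscripted assignment can then raise an error. A minimal sketch of one way around that, assuming the lubridate package is installed and the dates are stored as Date (the sample vector and variable names are illustrative, not from the question):

```r
library(lubridate)

# Sample dates with a missing value (illustrative)
dates <- as.Date(c("1924-02-28", NA, "1920-02-29"))

# which() keeps only the positions where the test is TRUE, dropping NA
i <- which(month(dates) == 2 & day(dates) == 29)
dates[i] <- dates[i] + days(1)

dates
```

Here the NA entry is left untouched and only the 29 February entry moves to 1 March.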

Answer 2

Score: 3

A more onerous method than the one user2974951 offered, but easier to generalise if the date change is more than a single day step:

    # Flag every 29 February, then set month and day explicitly
    tochange <- day(dates0) == 29 & month(dates0) == 2
    month(dates0[tochange]) <- 3
    day(dates0[tochange]) <- 1
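
To illustrate the generalisation point, here is a rough sketch of a helper that moves any given month/day to another month/day in the same year. It is written in base R rather than lubridate, by editing the `mon` and `mday` fields of a POSIXlt copy. The name `remap_day` and its arguments are hypothetical, and it assumes the target day exists in every affected year.

```r
# Hypothetical helper: move every (from_month, from_day) date to
# (to_month, to_day) in the same year. Returns a Date vector.
remap_day <- function(x, from_month, from_day, to_month, to_day) {
  lt <- as.POSIXlt(x)
  # POSIXlt months are 0-based, days of the month are 1-based
  hit <- which(lt$mon + 1L == from_month & lt$mday == from_day)
  lt$mon[hit] <- as.integer(to_month) - 1L
  lt$mday[hit] <- as.integer(to_day)
  as.Date(lt)
}

x <- as.Date(c("1920-02-29", "1921-06-22"))
remap_day(x, from_month = 2, from_day = 29, to_month = 3, to_day = 1)
```

Because the result is converted with `as.Date()`, any time-of-day information in a POSIXlt input is dropped, which may or may not suit a given analysis.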

In terms of speed:

    library(microbenchmark)

    dates1 <- rep(dates0, times = 1000)

    # Loop from the question
    OP <- \(x) {
      for (i in seq_along(x)) {
        if (day(x[i]) == 29 && month(x[i]) == 2) {
          day(x[i]) <- 1
          month(x[i]) <- 3
        }
      }
      x
    }
    # Set month and day explicitly
    setBoth <- \(x) {
      tochange <- day(x) == 29 & month(x) == 2
      month(x[tochange]) <- 3
      day(x[tochange]) <- 1
      x
    }
    # Add one day (user2974951's answer)
    incrementDay <- \(x) {
      tmp <- month(x) == 2 & day(x) == 29
      x[tmp] <- x[tmp] + days(1)
      x
    }
    microbenchmark(
      OP = OP(dates1),
      setBoth = setBoth(dates1),
      incrementDay = incrementDay(dates1)
    )

    Unit: milliseconds
             expr    min      lq     mean  median     uq      max neval cld
               OP 1.1611 1.38955 1.839490 1.67015 1.8911   8.3134   100   a
          setBoth 1.2863 1.44650 3.456468 1.63335 1.7768 158.1885   100   a
     incrementDay 1.1924 1.41305 1.765843 1.53635 1.7587   7.8440   100   a

There doesn't seem to be much in it, though my way (setBoth) is likely the slowest.

Answer 3

Score: 2

If there is no need to use POSIXlt, using Date is much faster: wherever a 29 February is found, simply add 1 to it. This works in base R.

    i <- which(format(dates1, "%m%d") == "0229")
    `[<-`(dates1, i, value = dates1[i] + 1)  # returns a modified copy; dates1 itself is unchanged

Data

    dates1 <- as.Date(c("1945-04-09 UTC", "1957-06-10 UTC", "1924-02-28 UTC",
                        "1921-06-22 UTC", "1926-05-16 UTC", "1920-02-29 UTC"))
    dates0 <- as.POSIXlt(dates1)

Benchmark

    library(lubridate)  # month(), day(), days()

    incrementDay <- \(x) {  # @user2974951
      tmp <- month(x) == 2 & day(x) == 29
      x[tmp] <- x[tmp] + days(1)
      x
    }
    idLub <- \(dates1) {
      i <- which(month(dates1) == 2 & day(dates1) == 29)
      `[<-`(dates1, i, value = dates1[i] + 1)
    }
    idBase <- \(dates1) {
      i <- which(format(dates1, "%m%d") == "0229")
      `[<-`(dates1, i, value = dates1[i] + 1)
    }
    bench::mark(incrementDay = incrementDay(dates1),
                idLub = idLub(dates1),
                idBase = idBase(dates1))

Result

      expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
      <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
    1 increment  441.3µs 454.7µs     2161.      328B     14.4  1048     7      485ms
    2 idLub       46.2µs  48.8µs    20060.    21.8KB     21.0  9531    10      475ms
    3 idBase      28.8µs  30.9µs    31793.        0B     22.3  9993     7      314ms

Using Date instead of POSIXlt decreases the time in this case by a factor of 10.
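
Since the question mentions tens of thousands of dates, a quick way to try the base R approach at that scale might look like the sketch below. The random test vector and the wrapper name `shift_leap_day` are illustrative assumptions, and no timing is quoted because it depends on the machine.

```r
# Minimal sketch: base R only, applied to 50,000 random dates (illustrative data)
set.seed(42)
big <- as.Date("1901-01-01") + sample(0:36500, 5e4, replace = TRUE)

# Hypothetical wrapper around the format()-based lookup
shift_leap_day <- function(x) {
  i <- which(format(x, "%m%d") == "0229")
  x[i] <- x[i] + 1
  x
}

system.time(out <- shift_leap_day(big))

# No 29 February should remain, and the length is unchanged
stopifnot(!any(format(out, "%m%d") == "0229"), length(out) == length(big))
```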

huangapple
  • Posted on 22 June 2023, 17:13:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76530297.html