What is the fastest way to convert 29 February into 1 March in R?

Question

I have a vector of dates ("YYYY-MM-DD") in R that contains some 29 February entries. To simplify my analysis, I want to convert these dates to 1 March. I am currently doing this with a for loop, but I wonder whether there is a faster way, as I will apply it to tens of thousands of dates.

Here is a reproducible example:

library(lubridate)
dates0 <- as.POSIXlt(as.Date(c("1945-04-09 UTC", "1957-06-10 UTC", "1924-02-28 UTC",
                               "1921-06-22 UTC", "1926-05-16 UTC", "1920-02-29 UTC")))

dates0[6]

for (i in seq_len(length(dates0))) {
  # Braces keep both assignments inside the if; without them, the statement
  # after `;` would run for every element.
  if (day(dates0[i]) == 29 && month(dates0[i]) == 2) {
    day(dates0[i]) <- 1
    month(dates0[i]) <- 3
  }
}

dates0[6]

Answer 1

Score: 3

Try a vectorised logical index instead of the loop:

tmp <- month(dates0) == 2 & day(dates0) == 29
dates0[tmp] <- dates0[tmp] + days(1)
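
For readers who would rather not depend on lubridate, the same logical mask can be sketched in base R. This is an illustrative variant, not part of the original answer: the helper name feb29_to_mar1 is hypothetical, and it assumes the input is a Date vector (or a POSIXlt vector holding whole days, which as.Date() converts).

```r
# Hypothetical helper (not from the original answer): move 29 Feb to 1 Mar
# using only base R. Assumes `x` is a Date, or a POSIXlt with whole days.
feb29_to_mar1 <- function(x) {
  d  <- as.Date(x)          # work at day resolution
  lt <- as.POSIXlt(d)       # exposes month and day as list components
  # `mon` is 0-based, so 1 means February; NA entries are never selected
  hit <- !is.na(d) & lt$mon == 1L & lt$mday == 29L
  d[hit] <- d[hit] + 1      # the day after 29 Feb is always 1 Mar
  d
}

dates <- as.Date(c("1945-04-09", "1957-06-10", "1924-02-28",
                   "1921-06-22", "1926-05-16", "1920-02-29"))
feb29_to_mar1(dates)
```

The function returns a Date vector, so convert back with as.POSIXlt() if the rest of the analysis expects that class.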

Answer 2

Score: 3

This is more laborious than the approach user2974951 offered, but it generalises more easily if the date change is more than a single-day step:

tochange <- day(dates0) == 29 & month(dates0) == 2
month(dates0[tochange]) <- 3
day(dates0[tochange]) <- 1
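
To illustrate what "more easily generalised" could look like, here is a minimal base R sketch that maps any month-day to any other month-day in the same year. The function remap_day and its from/to arguments are hypothetical names introduced for this example, and it assumes a Date vector rather than the POSIXlt used above.

```r
# Hypothetical helper: replace every date whose month-day equals `from`
# with the month-day `to` in the same year. Assumes `x` is a Date vector
# and that `to` is a valid day in every affected year.
remap_day <- function(x, from = "02-29", to = "03-01") {
  hit <- !is.na(x) & format(x, "%m-%d") == from
  if (any(hit)) {
    # Rebuild the matching dates as "<year>-<to>" and parse them again
    x[hit] <- as.Date(sprintf("%s-%s", format(x[hit], "%Y"), to))
  }
  x
}

dates <- as.Date(c("1945-04-09", "1924-02-28", "1920-02-29"))
remap_day(dates)                               # 29 Feb becomes 1 Mar
remap_day(dates, from = "02-29", to = "03-15") # a step of more than one day
```

Because the target is given as a month-day string, the step size does not have to be computed by hand.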

In terms of speed:

library(lubridate)
library(microbenchmark)

dates1 <- rep(dates0, times = 1000)

# The loop from the question
OP <- \(x) {
  for (i in seq_len(length(x))) {
    if (day(x[i]) == 29 && month(x[i]) == 2) {
      day(x[i]) <- 1
      month(x[i]) <- 3
    }
  }
  x
}

# Set month and day explicitly (this answer)
setBoth <- \(x) {
  tochange <- day(x) == 29 & month(x) == 2
  month(x[tochange]) <- 3
  day(x[tochange]) <- 1
  x
}

# Add one day (user2974951's answer)
incrementDay <- \(x) {
  tmp <- month(x) == 2 & day(x) == 29
  x[tmp] <- x[tmp] + days(1)
  x
}

microbenchmark(
  OP           = OP(dates1),
  setBoth      = setBoth(dates1),
  incrementDay = incrementDay(dates1)
)

Unit: milliseconds
         expr    min      lq     mean  median     uq      max neval cld
           OP 1.1611 1.38955 1.839490 1.67015 1.8911   8.3134   100   a
      setBoth 1.2863 1.44650 3.456468 1.63335 1.7768 158.1885   100   a
 incrementDay 1.1924 1.41305 1.765843 1.53635 1.7587   7.8440   100   a

There does not seem to be much in it; my approach is probably the slowest.

Answer 3

Score: 2

If POSIXlt is not required, working with Date is considerably faster: once the 29 February entries are found, simply add 1 to them. This works in base R.

i <- which(format(dates1, "%m%d") == "0229")
`[<-`(dates1, i, value = dates1[i] + 1)

Data

dates1 <- as.Date(c("1945-04-09 UTC", "1957-06-10 UTC", "1924-02-28 UTC",
                    "1921-06-22 UTC", "1926-05-16 UTC", "1920-02-29 UTC"))
dates0 <- as.POSIXlt(dates1)

Benchmark

library(lubridate)  # month(), day(), days()

incrementDay <- \(x) {   # @user2974951
  tmp <- month(x) == 2 & day(x) == 29
  x[tmp] <- x[tmp] + days(1)
  x
}

# lubridate to find the dates, plain + 1 to shift them
idLub <- \(dates1) {
  i <- which(month(dates1) == 2 & day(dates1) == 29)
  `[<-`(dates1, i, value = dates1[i] + 1)
}

# base R only
idBase <- \(dates1) {
  i <- which(format(dates1, "%m%d") == "0229")
  `[<-`(dates1, i, value = dates1[i] + 1)
}

bench::mark(incrementDay = incrementDay(dates1),
            idLub        = idLub(dates1),
            idBase       = idBase(dates1))

Result

  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 increment… 441.3µs 454.7µs     2161.      328B     14.4  1048     7      485ms
2 idLub       46.2µs  48.8µs    20060.    21.8KB     21.0  9531    10      475ms
3 idBase      28.8µs  30.9µs    31793.        0B     22.3  9993     7      314ms

Using Date instead of POSIXlt decreases the time in this case by roughly a factor of 10.
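
To check the Date versus POSIXlt comparison on your own machine without extra packages, something like the following base R sketch could be used. The helper names shift_date and shift_lt, the sample size, and the seed are all assumptions made for this illustration. It makes no claim about which variant wins; timings depend on the R version and the hardware.

```r
set.seed(1)   # arbitrary seed, only so the sample is reproducible
n <- 1e5      # assumed size, in the range the question mentions

# Random days between 1920 and 2019, stored once per class
d_date <- as.Date("1920-01-01") + sample(0:36524, n, replace = TRUE)
d_lt   <- as.POSIXlt(d_date)

# Hypothetical helper: Date in, Date out
shift_date <- function(x) {
  i <- which(format(x, "%m%d") == "0229")
  x[i] <- x[i] + 1          # Date arithmetic is in days
  x
}

# Hypothetical helper: POSIXlt in, POSIXlt out
shift_lt <- function(x) {
  i <- which(x$mon == 1L & x$mday == 29L)   # mon is 0-based
  x[i] <- x[i] + 86400      # date-time arithmetic is in seconds (UTC here)
  x
}

system.time(res_date <- shift_date(d_date))
system.time(res_lt   <- shift_lt(d_lt))

# Both variants should agree on the resulting calendar days
stopifnot(all(as.Date(res_lt) == res_date))
```

For more stable numbers, the two calls can be passed to bench::mark() or microbenchmark() as in the answers above.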

huangapple
  • Posted on 22 June 2023, 17:13:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76530297.html