Fastest way to convert 29th February into 1st March in R?
Question
I have a vector of dates ("YYYY-MM-DD") in R that contains some 29 February dates. To simplify my analysis, I want to convert these dates to 1 March. I am currently doing this with a for loop, but I wonder whether there is a faster way, since I will apply it to tens of thousands of dates.
Here is a reproducible example:
library(lubridate)
dates0 <- as.POSIXlt(as.Date(c("1945-04-09 UTC", "1957-06-10 UTC", "1924-02-28 UTC",
"1921-06-22 UTC","1926-05-16 UTC", "1920-02-29 UTC")))
dates0[6]
for (i in seq_along(dates0)) {
  # braces keep both assignments inside the condition
  if (day(dates0[i]) == 29 && month(dates0[i]) == 2) {
    day(dates0[i]) <- 1
    month(dates0[i]) <- 3
  }
}
dates0[6]
Answer 1
Score: 3
Try
tmp <- month(dates0) == 2 & day(dates0) == 29
dates0[tmp] <- dates0[tmp] + days(1)
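For reference, the same logical-mask idea can be sketched in base R without lubridate. This is only an illustration: it assumes the dates are held as a `Date` vector rather than `POSIXlt`, and the sample vector and the name `is_leap_day` are made up for the example.

```r
# Sketch: vectorised 29 Feb -> 1 Mar on a Date vector, base R only.
dates <- as.Date(c("1945-04-09", "1924-02-28", "1920-02-29", "2000-02-29"))

# as.POSIXlt() exposes the calendar fields: $mon is 0-based (1 = February),
# $mday is the day of the month.
lt <- as.POSIXlt(dates)
is_leap_day <- lt$mon == 1 & lt$mday == 29

# Date arithmetic is in whole days, so adding 1 moves 29 Feb to 1 Mar.
dates[is_leap_day] <- dates[is_leap_day] + 1
dates
# [1] "1945-04-09" "1924-02-28" "1920-03-01" "2000-03-01"
```

The mask is computed once for the whole vector, so no explicit loop is needed.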
Answer 2
Score: 3
This is more laborious than the method user2974951 offered, but it generalises more easily when the date change is more than a single-day step:
tochange <- day(dates0) == 29 & month(dates0) == 2
month(dates0[tochange]) <- 3
day(dates0[tochange]) <- 1
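In terms of speed:
library(microbenchmark)
dates1 <- rep(dates0, times = 1000)
# the OP's loop, with braces so both assignments depend on the condition
OP <- \(x) {
  for (i in seq_along(x)) {
    if (day(x[i]) == 29 && month(x[i]) == 2) {
      day(x[i]) <- 1
      month(x[i]) <- 3
    }
  }
  x
}
# set month and day explicitly on the matching elements
setBoth <- \(x) {
  tochange <- day(x) == 29 & month(x) == 2
  month(x[tochange]) <- 3
  day(x[tochange]) <- 1
  x
}
# user2974951's approach: add one day to the matching elements
incrementDay <- \(x) {
  tmp <- month(x) == 2 & day(x) == 29
  x[tmp] <- x[tmp] + days(1)
  x
}
microbenchmark(
  OP = OP(dates1),
  setBoth = setBoth(dates1),
  incrementDay = incrementDay(dates1)
)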
Unit: milliseconds
         expr    min      lq     mean  median     uq      max neval cld
           OP 1.1611 1.38955 1.839490 1.67015 1.8911   8.3134   100   a
      setBoth 1.2863 1.44650 3.456468 1.63335 1.7768 158.1885   100   a
 incrementDay 1.1924 1.41305 1.765843 1.53635 1.7587   7.8440   100   a
There doesn't seem to be much in it; my way is likely the slowest.
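To illustrate the point about generalising beyond a one-day step, here is a minimal base R sketch. The helper `move_day()` and its arguments are hypothetical (they are not from the answer), and it assumes the input is a `Date` vector and that the target day exists in every affected year.

```r
# Hypothetical helper: move every date falling on from_month/from_day
# to to_month/to_day of the same year. Assumes x is a Date vector.
move_day <- function(x, from_month, from_day, to_month, to_day) {
  lt <- as.POSIXlt(x)
  # $mon is 0-based, so add 1 to compare with a calendar month number
  hit <- which(lt$mon + 1 == from_month & lt$mday == from_day)
  # $year counts from 1900; rebuild the target date as text, then parse it
  target <- sprintf("%d-%02d-%02d", lt$year[hit] + 1900, to_month, to_day)
  x[hit] <- as.Date(target, format = "%Y-%m-%d")
  x
}

d <- as.Date(c("1920-02-29", "1921-06-22", "1924-02-29"))
move_day(d, 2, 29, 3, 1)
# [1] "1920-03-01" "1921-06-22" "1924-03-01"
move_day(d, 2, 29, 3, 15)
# [1] "1920-03-15" "1921-06-22" "1924-03-15"
```

Setting the month and day explicitly, as in the answer above, is what makes an arbitrary target date possible; adding a fixed number of days only works when the offset is the same for every affected date.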
Answer 3
Score: 2
If there is no need to use POSIXlt, using Date is much faster: wherever 29 February is found, simply add 1, which works in base R.
i <- which(format(dates1, "%m%d") == "0229")
`[<-`(dates1, i, value = dates1[i] + 1)
Data
dates1 <- as.Date(c("1945-04-09 UTC", "1957-06-10 UTC", "1924-02-28 UTC",
"1921-06-22 UTC","1926-05-16 UTC", "1920-02-29 UTC"))
dates0 <- as.POSIXlt(dates1)
Benchmark
library(lubridate)
incrementDay <- \(x) { #@user2974951
  tmp <- month(x) == 2 & day(x) == 29
  x[tmp] <- x[tmp] + days(1)
  x
}
idLub <- \(dates1) {
  i <- which(month(dates1) == 2 & day(dates1) == 29)
  `[<-`(dates1, i, value = dates1[i] + 1)
}
idBase <- \(dates1) {
  i <- which(format(dates1, "%m%d") == "0229")
  `[<-`(dates1, i, value = dates1[i] + 1)
}
bench::mark(incrementDay = incrementDay(dates1),
            idLub = idLub(dates1),
            idBase = idBase(dates1))
Result
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 increment… 441.3µs 454.7µs 2161. 328B 14.4 1048 7 485ms
2 idLub 46.2µs 48.8µs 20060. 21.8KB 21.0 9531 10 475ms
3 idBase 28.8µs 30.9µs 31793. 0B 22.3 9993 7 314ms
Using Date instead of POSIXlt decreases the time in this case by roughly a factor of 10.
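As a rough sanity check of the base R approach on a larger input, the following sketch may help. The function name `fix_leap_day()` and the random test vector are assumptions made for illustration; the function uses ordinary subassignment followed by returning `x`, which has the same effect as the `[<-`(...) call form used above.

```r
# Sketch: base R only, on a Date vector of 50,000 random dates.
fix_leap_day <- function(x) {
  i <- which(format(x, "%m%d") == "0229")  # positions of 29 Feb
  x[i] <- x[i] + 1                         # Date + 1 is the next day
  x
}

set.seed(1)
big <- as.Date("1900-01-01") + sample.int(40000, 5e4, replace = TRUE)
out <- fix_leap_day(big)

# No 29 Feb should remain, and every date moves by 0 or 1 day.
any(format(out, "%m%d") == "0229")
all(as.integer(out - big) %in% c(0L, 1L))
```

Using `which()` rather than a bare logical mask also keeps the subassignment safe if the vector contains `NA` dates.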