When filtering with dates "%in%" and "==" are behaving differently
Question
I am trying to filter a data frame to the rows whose date column matches any value in a selected vector of dates. When I filter using "%in%", no rows are returned; however, when I filter with "==" on a single element, rows are returned.
Here is some sample data:
library(lubridate)
library(dplyr)
samp_df = data.frame(epoch = c(1671916189, 1652728555, 1657658906, 1662514742, 1670702851)) %>%
  mutate(dt = as.POSIXct(epoch, origin = '1970-01-01', tz = 'America/Phoenix')) %>%
  mutate(month = floor_date(dt, 'month'))
samp_df
#        epoch                  dt      month
# 1 1671916189 2022-12-24 14:09:49 2022-12-01
# 2 1652728555 2022-05-16 12:15:55 2022-05-01
# 3 1657658906 2022-07-12 13:48:26 2022-07-01
# 4 1662514742 2022-09-06 18:39:02 2022-09-01
# 5 1670702851 2022-12-10 13:07:31 2022-12-01
months_to_check = c(as.Date('2022-12-01', tz = 'America/Phoenix'),
                    as.Date('2022-05-01', tz = 'America/Phoenix'))
months_to_check
# [1] "2022-12-01" "2022-05-01"
The following code returns no rows:
samp_df %>%
  filter(month %in% months_to_check)
However, the following code does return rows:
samp_df %>%
  filter(month == months_to_check[1])
Why is there this discrepancy?
Answer 1
Score: 4
The "month" variable is a "POSIXct" (you created it based on `dt`), while `months_to_check` is a "Date". Even though they print alike, they are not the same.
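The difference can be made visible with a small base R sketch. The names `month_val` and `date_val` are illustrative stand-ins for one element of `samp_df$month` and one element of `months_to_check`; they are not part of the original code.

```r
# A POSIXct month start (like samp_df$month) and the Date it was compared against
month_val <- as.POSIXct("2022-12-01", tz = "America/Phoenix")
date_val  <- as.Date("2022-12-01")

class(month_val)  # "POSIXct" "POSIXt"
class(date_val)   # "Date"

# The stored numbers use different units:
# POSIXct counts seconds since 1970-01-01 UTC, Date counts days since 1970-01-01
secs <- as.numeric(month_val)
days <- as.numeric(date_val)
```

Both objects print as 2022-12-01, yet they belong to different classes and hold different underlying numbers, so they are not interchangeable in every operation.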
However, the "lubridate" package adds a special method for `==` that works around this. We can pull the code of this method with:
sloop::s3_get_method('==.POSIXt')
The code for the method is at the end of this answer, but note that it converts one argument to the class of the other before doing the logical comparison, which is why you get matches with `==`.
`%in%` has no such method; it does not convert between the two classes, so it returns no matches.
The solution is simple: keep the date class consistent across your variables, which you should always do to avoid this type of error.
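As a minimal sketch of that advice, assuming the `samp_df` from the question, either side can be converted so that both are the same class before using `%in%`. The names `months_posix`, `months_date`, `fixed_posix`, and `fixed_date` are illustrative and do not appear in the original post.

```r
library(lubridate)
library(dplyr)

samp_df <- data.frame(epoch = c(1671916189, 1652728555, 1657658906, 1662514742, 1670702851)) %>%
  mutate(dt = as.POSIXct(epoch, origin = "1970-01-01", tz = "America/Phoenix"),
         month = floor_date(dt, "month"))

# Option 1: build the lookup vector as POSIXct in the same time zone as `month`
months_posix <- as.POSIXct(c("2022-12-01", "2022-05-01"), tz = "America/Phoenix")
fixed_posix <- samp_df %>%
  filter(month %in% months_posix)

# Option 2: keep the lookup vector as Date and convert `month` to Date,
# naming the time zone explicitly so the calendar day is taken in Phoenix time
months_date <- as.Date(c("2022-12-01", "2022-05-01"))
fixed_date <- samp_df %>%
  filter(as.Date(month, tz = "America/Phoenix") %in% months_date)
```

Both options should keep the two December rows and the May row. Option 1 leaves the data frame untouched and only changes the lookup vector, which may be the smaller change if `month` is used as POSIXct elsewhere.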
Code for `==.POSIXt`:
function (e1, e2)
{
    if (nargs() == 1) {
        stop(gettextf("unary '%s' not defined for \"%s\" objects",
            .Generic, class(e1)), domain = NA)
    }
    # POSIXlt inputs are converted to POSIXct first
    if (inherits(e1, "POSIXlt")) {
        e1 <- as.POSIXct(e1)
    }
    if (inherits(e2, "POSIXlt")) {
        e2 <- as.POSIXct(e2)
    }
    # character input is parsed to match the class of the other argument
    if (is.character(e1)) {
        e1 <- if (is.Date(e2)) {
            as.Date(e1)
        }
        else {
            as.POSIXct(e1)
        }
    }
    if (is.character(e2)) {
        e2 <- if (is.Date(e1)) {
            as.Date(e2)
        }
        else {
            as.POSIXct(e2)
        }
    }
    # POSIXct vs Date: the Date is converted to POSIXct in the other argument's time zone
    if (is.POSIXct(e1) && is.Date(e2)) {
        e2 <- date_to_posix(e2, tz(e1))
        base::check_tzones(e1, e2)
    }
    else if (is.Date(e1) && is.POSIXct(e2)) {
        e1 <- date_to_posix(e1, tz = tz(e2))
        base::check_tzones(e1, e2)
    }
    NextMethod(.Generic)
}