When filtering with dates, "%in%" and "==" behave differently

Question

I am trying to filter a data frame to the rows whose date column matches any value in a selected vector of dates. When I filter using "%in%", no rows are returned; however, when I filter with "==" on a single element, I do get rows back.

Here is some sample data:

library(lubridate)
library(dplyr)

samp_df = data.frame(epoch = c(1671916189, 1652728555, 1657658906, 1662514742, 1670702851)) %>%
	mutate(dt = as.POSIXct(epoch, origin = '1970-01-01', tz = 'America/Phoenix')) %>%
	mutate(month = floor_date(dt, 'month'))
samp_df

#        epoch                  dt      month
# 1 1671916189 2022-12-24 14:09:49 2022-12-01
# 2 1652728555 2022-05-16 12:15:55 2022-05-01
# 3 1657658906 2022-07-12 13:48:26 2022-07-01
# 4 1662514742 2022-09-06 18:39:02 2022-09-01
# 5 1670702851 2022-12-10 13:07:31 2022-12-01

months_to_check = c(as.Date('2022-12-01', tz = 'America/Phoenix'),
	            as.Date('2022-05-01', tz = 'America/Phoenix'))

months_to_check

# [1] "2022-12-01" "2022-05-01"

The following code returns no rows:

samp_df %>%
	filter(month %in% months_to_check)

However, the following code does return rows:

samp_df %>%
	filter(month == months_to_check[1])

Why is there this discrepancy?

Answer 1

Score: 4

The "month" variable is a "POSIXct" (you created it from dt), while months_to_check is a "Date". Even though they print similarly, they are not the same class.
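
To make the class difference concrete, here is a small base-R sketch (no lubridate involved). The names month_ct and month_date are illustrative and do not appear in the question. It builds the same calendar day once as a POSIXct and once as a Date, then looks at what each one stores underneath.

```r
# Same calendar day, stored as two different classes.
month_ct   <- as.POSIXct("2022-12-01", tz = "America/Phoenix")  # date-time
month_date <- as.Date("2022-12-01")                             # date only

class(month_ct)    # "POSIXct" "POSIXt"
class(month_date)  # "Date"

# Both are plain numbers underneath, but in different units:
as.numeric(month_ct)    # seconds since 1970-01-01 00:00:00 UTC
as.numeric(month_date)  # days since 1970-01-01

# So the stored values differ even though both represent the same day.
as.numeric(month_ct) == as.numeric(month_date)  # FALSE
```

Any comparison that looks only at the stored numbers, without first converting one class to the other, will treat these two values as different.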

However, the "lubridate" package adds a special method for == that works around this. We can pull the code of this method with:

sloop::s3_get_method('==.POSIXt')

The code for the method is at the end of this answer. Note that it converts one operand to the class of the other before doing the logical comparison, which is why you get matches with ==.

%in% has no such method: it does not convert the classes, so it returns no matches.

The solution is simple: keep a consistent date class across your variables. That is good practice in any case, because it avoids this type of error.
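
One way to apply this to the question's data is sketched below: either convert the POSIXct column to Date before matching, or build the lookup vector as POSIXct in the same time zone. This is a sketch, not the only possible fix. The names month_date, months_ct, matched, and matched_ct are illustrative and not from the original post.

```r
library(lubridate)
library(dplyr)

samp_df <- data.frame(epoch = c(1671916189, 1652728555, 1657658906,
                                1662514742, 1670702851)) %>%
    mutate(dt = as.POSIXct(epoch, origin = "1970-01-01", tz = "America/Phoenix")) %>%
    mutate(month = floor_date(dt, "month"))   # month is POSIXct

months_to_check <- as.Date(c("2022-12-01", "2022-05-01"))   # Date

# Option 1: compare Date with Date by converting the column first.
matched <- samp_df %>%
    mutate(month_date = as.Date(month, tz = "America/Phoenix")) %>%
    filter(month_date %in% months_to_check)

# Option 2: compare POSIXct with POSIXct by building the lookup
# vector as date-times in the same time zone as the column.
months_ct <- as.POSIXct(c("2022-12-01", "2022-05-01"), tz = "America/Phoenix")
matched_ct <- samp_df %>%
    filter(month %in% months_ct)

matched$epoch   # epochs of rows 1, 2 and 5 of samp_df
```

Both options keep the December and May rows. Passing the time zone explicitly in option 1 keeps the conversion to Date from depending on the session's default time zone.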

Code for ==.POSIXt:

function (e1, e2) 
{
    if (nargs() == 1) {
        stop(gettextf("unary '%s' not defined for \"%s\" objects", 
            .Generic, class(e1)), domain = NA)
    }
    if (inherits(e1, "POSIXlt")) {
        e1 <- as.POSIXct(e1)
    }
    if (inherits(e2, "POSIXlt")) {
        e2 <- as.POSIXct(e2)
    }
    if (is.character(e1)) { #here
        e1 <- if (is.Date(e2)) {
            as.Date(e1)
        }
        else {
            as.POSIXct(e1)
        }
    }
    if (is.character(e2)) { #here
        e2 <- if (is.Date(e1)) {
            as.Date(e2)
        }
        else {
            as.POSIXct(e2)
        }
    }
    if (is.POSIXct(e1) && is.Date(e2)) {
        e2 <- date_to_posix(e2, tz(e1))
        base::check_tzones(e1, e2)
    }
    else if (is.Date(e1) && is.POSIXct(e2)) {
        e1 <- date_to_posix(e1, tz = tz(e2))
        base::check_tzones(e1, e2)
    }
    NextMethod(.Generic)
}

huangapple
  • Posted on May 18, 2023, 00:30:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76274296.html