How to mutate a new column in a dataframe if a date is between two dates in another column?

Question

I have a dataframe with some dates, and another dataframe with the dates of weather events.
I want to mutate a new column onto the original dataframe: if a date falls between the start and end dates of a weather event in the second dataframe, paste the name of the event; otherwise output 'N'.

However, I'm getting this error:

Error in `mutate()`:
Problem while computing `Event = ifelse(...)`.
x `Event` must be size 1, not 17.
i Did you mean: `Event = list(ifelse(...))` ?
i The error occurred in row 1.
Run `rlang::last_error()` to see where the error occurred.

My Code:

    data.frame(Events)
                          Name  StartDate    EndDate Flood_Storm_EventFlag EscapeofWater_EventFlag
1                       2007-6 2007-06-13 2007-07-02                     Y                       N
2                       2007-7 2007-07-20 2007-07-27                     Y                       N
3         February 2014 Storms 2014-02-03 2014-02-17                     Y                       N
4                      Desmond 2015-12-04 2015-12-07                     Y                       N
5                          Eva 2015-12-25 2015-12-28                     Y                       N
6                        Frank 2015-12-29 2016-01-08                     Y                       N
7                        Ciara 2020-02-08 2020-02-12                     Y                       N
8                       Dennis 2020-02-15 2020-02-28                     Y                       N
9     2021009A (July Floods 1) 2021-07-12 2021-07-19                     Y                       N
10             (July Floods 2) 2021-07-24 2021-08-02                     Y                       N
11 Dudley, Eunice and Franklin 2022-02-16 2022-02-28                     Y                       N
12                      2009-1 2009-01-02 2009-01-11                     N                       Y
13                     2009-12 2009-12-27 2010-01-19                     N                       Y
14                     2010-11 2010-11-28 2010-12-12                     N                       Y
15                     2010-12 2010-12-19 2011-01-08                     N                       Y
16                      2018-2 2018-02-28 2018-03-08                     N                       Y
17                     2022-12 2022-12-08 2022-12-18                     N                       Y


dates <- c("2007-06-13", "2007-05-13", "2007-06-14", "2009-01-04", "2009-01-15")
AccidentDate <- as.Date(dates)
PolicyType <- c("Home", "Home", "Motor", "Home", "Home")

df <- data.frame(AccidentDate, PolicyType)

x <- df %>%
  mutate(Event = ifelse(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate & PolicyType == "Home", Events$Name, 'N'))

I'm not sure why I can't create the new column this way. Any ideas?

Answer 1

Score: 1


Your problem is that you're getting multiple matches both ways. You could:

  • Use any() to get a single logical value instead of one per row in Events.
  • Use .by (or group_by in dplyr < 1.1.0) to pass only one AccidentDate at a time when you extract the name.

library(dplyr)

# dplyr >= 1.1.0

df |>
  mutate(Event = ifelse(any(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate) & PolicyType == "Home",
                        Events$Name[AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate],
                        "N"),
         .by = "AccidentDate")

# dplyr < 1.1.0 (Update 3/3-2022)

df |>
  group_by(AccidentDate) |>
  mutate(Event = ifelse(any(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate) & PolicyType == "Home",
                        Events$Name[AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate],
                        "N")) |>
  ungroup()

I've assumed that there can only be a single match. You might want to check this.

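Output (as posted; note that with the full Events table from the question, 2009-01-04 falls inside event 2009-1, which runs from 2009-01-02 to 2009-01-11, so row 4 would be expected to show 2009-1 rather than N):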

  AccidentDate PolicyType  Event
1   2007-06-13       Home 2007-6
2   2007-05-13       Home      N
3   2007-06-14      Motor      N
4   2009-01-04       Home      N
5   2009-01-15       Home      N
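
The single-match assumption mentioned above can be checked before extracting names. The following is only a rough base-R sketch: the Events table is a three-row stand-in built from rows 1, 2 and 12 of the question's data, and the names n_matches and ambiguous are made up for illustration.

```r
# Stand-in for three rows of the Events table from the question (an assumption).
Events <- data.frame(
  Name      = c("2007-6", "2007-7", "2009-1"),
  StartDate = as.Date(c("2007-06-13", "2007-07-20", "2009-01-02")),
  EndDate   = as.Date(c("2007-07-02", "2007-07-27", "2009-01-11"))
)

df <- data.frame(
  AccidentDate = as.Date(c("2007-06-13", "2007-05-13", "2007-06-14",
                           "2009-01-04", "2009-01-15")),
  PolicyType   = c("Home", "Home", "Motor", "Home", "Home")
)

# For each accident date, count how many event windows contain it.
# Indexing by position keeps the Date class intact inside the function.
n_matches <- vapply(
  seq_along(df$AccidentDate),
  function(i) sum(df$AccidentDate[i] >= Events$StartDate &
                  df$AccidentDate[i] <= Events$EndDate),
  integer(1)
)

# Rows with more than one match would make the extracted name ambiguous.
ambiguous <- df[n_matches > 1, ]
```

If ambiguous has zero rows, every accident date falls in at most one event window and the single-match assumption holds for that data. If it has rows, you would need to decide which event name should win, for example the earliest one.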

Update: You could also consider a join:

library(dplyr)

df |>
  left_join(Events |> select(Event = Name, StartDate, EndDate),
            join_by(between(AccidentDate, StartDate, EndDate))) |>
  select(AccidentDate, PolicyType, Event) |>
  mutate(Event = ifelse(PolicyType == "Home", Event, NA))

To use between() instead of join_by(AccidentDate >= StartDate, AccidentDate <= EndDate), all of the columns involved need to be of Date type:

Events <- mutate(Events, across(StartDate:EndDate, as.Date))
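
The join approach returns NA for rows without an event, whereas the question asked for 'N'. Here is a minimal self-contained sketch of one way to get that, assuming dplyr >= 1.1.0 and R >= 4.1. The three-row Events table is a stand-in for the question's data, and the result name and the if_else() step are additions for illustration rather than part of the original answer.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for join_by()

# Stand-in for three rows of the Events table from the question (an assumption).
Events <- data.frame(
  Name      = c("2007-6", "2007-7", "2009-1"),
  StartDate = c("2007-06-13", "2007-07-20", "2009-01-02"),
  EndDate   = c("2007-07-02", "2007-07-27", "2009-01-11")
)
# Convert the character dates so that between() can be used in join_by().
Events <- mutate(Events, across(StartDate:EndDate, as.Date))

df <- data.frame(
  AccidentDate = as.Date(c("2007-06-13", "2007-05-13", "2007-06-14",
                           "2009-01-04", "2009-01-15")),
  PolicyType   = c("Home", "Home", "Motor", "Home", "Home")
)

result <- df |>
  left_join(Events |> select(Event = Name, StartDate, EndDate),
            join_by(between(AccidentDate, StartDate, EndDate))) |>
  select(AccidentDate, PolicyType, Event) |>
  # Keep the event name only for matched Home policies; everything else gets "N".
  mutate(Event = if_else(PolicyType == "Home" & !is.na(Event), Event, "N"))
```

With this stand-in data, the Home rows dated 2007-06-13 and 2009-01-04 pick up 2007-6 and 2009-1, and the remaining rows get "N". If an accident date fell inside several event windows, the join would return one row per match, so the row count of result is worth checking against nrow(df).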

huangapple
  • Published on March 3, 2023, 20:08:51
  • Source: https://go.coder-hub.com/75626889.html