How to mutate a new column with ifelse() if a date is between two dates in another dataframe?
Question
So I have a dataframe with some dates, and I have another dataframe with some dates of weather events.
I want to mutate a new column onto the original dataframe. If a date falls between the start and end dates of any weather event in the second dataframe, the column should contain the name of that event; otherwise it should contain 'N'.
However, I'm getting this error:
Error in `mutate()`:
Problem while computing `Event = ifelse(...)`.
x `Event` must be size 1, not 17.
i Did you mean: `Event = list(ifelse(...))` ?
i The error occurred in row 1.
Run `rlang::last_error()` to see where the error occurred.
My Code:
data.frame(Events)
                          Name  StartDate    EndDate Flood_Storm_EventFlag EscapeofWater_EventFlag
 1                      2007-6 2007-06-13 2007-07-02                     Y                       N
 2                      2007-7 2007-07-20 2007-07-27                     Y                       N
 3        February 2014 Storms 2014-02-03 2014-02-17                     Y                       N
 4                     Desmond 2015-12-04 2015-12-07                     Y                       N
 5                         Eva 2015-12-25 2015-12-28                     Y                       N
 6                       Frank 2015-12-29 2016-01-08                     Y                       N
 7                       Ciara 2020-02-08 2020-02-12                     Y                       N
 8                      Dennis 2020-02-15 2020-02-28                     Y                       N
 9    2021009A (July Floods 1) 2021-07-12 2021-07-19                     Y                       N
10             (July Floods 2) 2021-07-24 2021-08-02                     Y                       N
11 Dudley, Eunice and Franklin 2022-02-16 2022-02-28                     Y                       N
12                      2009-1 2009-01-02 2009-01-11                     N                       Y
13                     2009-12 2009-12-27 2010-01-19                     N                       Y
14                     2010-11 2010-11-28 2010-12-12                     N                       Y
15                     2010-12 2010-12-19 2011-01-08                     N                       Y
16                      2018-2 2018-02-28 2018-03-08                     N                       Y
17                     2022-12 2022-12-08 2022-12-18                     N                       Y
library(dplyr)
dates <- c("2007-06-13", "2007-05-13", "2007-06-14", "2009-01-04", "2009-01-15")
AccidentDate <- as.Date(dates)
PolicyType <- c("Home", "Home", "Motor", "Home", "Home")
df <- data.frame(AccidentDate, PolicyType)
x <- df %>%
  mutate(Event = ifelse(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate & PolicyType == "Home",
                        Events$Name, 'N'))
I'm not sure why I can't create a new column this way. Any ideas?
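To see where the 17 in the error comes from, here is a small base-R sketch (no dplyr needed). It compares a single accident date against the 17 event windows transcribed from the Events printout above. The names start_date, end_date and accident are only for illustration.

```r
# StartDate and EndDate transcribed from the 17-row Events printout
start_date <- as.Date(c("2007-06-13", "2007-07-20", "2014-02-03", "2015-12-04",
                        "2015-12-25", "2015-12-29", "2020-02-08", "2020-02-15",
                        "2021-07-12", "2021-07-24", "2022-02-16", "2009-01-02",
                        "2009-12-27", "2010-11-28", "2010-12-19", "2018-02-28",
                        "2022-12-08"))
end_date <- as.Date(c("2007-07-02", "2007-07-27", "2014-02-17", "2015-12-07",
                      "2015-12-28", "2016-01-08", "2020-02-12", "2020-02-28",
                      "2021-07-19", "2021-08-02", "2022-02-28", "2009-01-11",
                      "2010-01-19", "2010-12-12", "2011-01-08", "2018-03-08",
                      "2022-12-18"))

accident <- as.Date("2007-06-13")

# Comparing one date with a 17-element vector gives 17 logicals,
# one per event window, rather than a single TRUE/FALSE
in_window <- accident >= start_date & accident <= end_date
print(length(in_window))

# any() collapses them into one value; which() reports the matching window(s)
print(any(in_window))
print(which(in_window))
```

The wording of the error ("must be size 1", "in row 1") suggests mutate() was evaluating one row at a time, for example on a rowwise() data frame. In that case ifelse() returns one value per element of its 17-element condition, where mutate() expects a single value.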
Answer 1
Score: 1
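Your problem is that you're getting multiple matches in both parts of the ifelse(): the condition and the extracted name each return one value per row of Events (17 here) instead of a single value. You could:
- Use any() to get a single logical value instead of one per row in Events.
- Use .by (or group_by() in dplyr < 1.1.0) to pass only one AccidentDate at a time when you extract the name.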
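The same two ideas (one AccidentDate at a time, plus any()) can be sketched in base R without dplyr. This is an alternative, not the answer's dplyr code: it loops over row indices with vapply(), and events is a hypothetical three-row subset of Events with real Date columns. When several windows match, it takes the first one, which is an assumption you may want to change.

```r
# Hypothetical three-row subset of Events, with real Date columns
events <- data.frame(
  Name      = c("2007-6", "2007-7", "2009-1"),
  StartDate = as.Date(c("2007-06-13", "2007-07-20", "2009-01-02")),
  EndDate   = as.Date(c("2007-07-02", "2007-07-27", "2009-01-11")),
  stringsAsFactors = FALSE
)

df <- data.frame(
  AccidentDate = as.Date(c("2007-06-13", "2007-05-13", "2007-06-14",
                           "2009-01-04", "2009-01-15")),
  PolicyType   = c("Home", "Home", "Motor", "Home", "Home"),
  stringsAsFactors = FALSE
)

# Handle one accident at a time, so each call returns exactly one string
df$Event <- vapply(seq_len(nrow(df)), function(i) {
  d <- df$AccidentDate[i]
  hits <- d >= events$StartDate & d <= events$EndDate  # one logical per event
  if (df$PolicyType[i] == "Home" && any(hits)) {
    events$Name[which(hits)[1]]  # first matching event (assumed tie-break)
  } else {
    "N"
  }
}, character(1))

print(df)
```

With this subset, 2009-01-04 (Home) falls inside the 2009-1 window (2009-01-02 to 2009-01-11), so that row gets "2009-1".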
library(dplyr)
# dplyr >= 1.1.0: .by evaluates the expression separately for each AccidentDate
df |>
  mutate(Event = ifelse(any(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate) & PolicyType == "Home",
                        Events$Name[AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate],
                        "N"),
         .by = "AccidentDate")
# dplyr < 1.1.0 (Update 3/3-2022): same idea with group_by()/ungroup()
df |>
  group_by(AccidentDate) |>
  mutate(Event = ifelse(any(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate) & PolicyType == "Home",
                        Events$Name[AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate],
                        "N")) |>
  ungroup()
I've assumed that each AccidentDate matches at most one event. You might want to check this.
Output:
  AccidentDate PolicyType  Event
1   2007-06-13       Home 2007-6
2   2007-05-13       Home      N
3   2007-06-14      Motor      N
4   2009-01-04       Home      N
5   2009-01-15       Home      N
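To check the single-match assumption, you could count how many event windows contain each AccidentDate. A minimal base-R sketch follows. The three windows are hypothetical, with B deliberately overlapping A; the windows in the question's Events table don't overlap, so there this is only a safeguard.

```r
# Hypothetical event windows; "B" deliberately overlaps "A"
events <- data.frame(
  Name      = c("A", "B", "C"),
  StartDate = as.Date(c("2015-12-25", "2015-12-27", "2020-02-08")),
  EndDate   = as.Date(c("2015-12-28", "2016-01-08", "2020-02-12")),
  stringsAsFactors = FALSE
)
accident_dates <- as.Date(c("2015-12-26", "2015-12-27", "2020-02-10", "2021-01-01"))

# Count the windows containing each accident date
n_matches <- vapply(seq_along(accident_dates), function(i) {
  d <- accident_dates[i]
  sum(d >= events$StartDate & d <= events$EndDate)
}, integer(1))

print(n_matches)
# Dates matching more than one event need a tie-breaking rule
print(accident_dates[n_matches > 1])
```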
Update: You could also consider a join:
library(dplyr)  # join_by() needs dplyr >= 1.1.0
df |>
  # accidents with no matching event window get Event = NA
  left_join(Events |> select(Event = Name, StartDate, EndDate),
            join_by(between(AccidentDate, StartDate, EndDate))) |>
  select(AccidentDate, PolicyType, Event) |>
  mutate(Event = ifelse(PolicyType == "Home", Event, NA))
To use between() instead of join_by(AccidentDate >= StartDate, AccidentDate <= EndDate), all the date columns need to be of type Date:
Events <- mutate(Events, across(StartDate:EndDate, as.Date))
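For readers without dplyr 1.1.0, here is a rough base-R counterpart of both steps, offered as a swapped-in technique rather than what join_by() does internally. First, convert the character date columns with lapply(). Then emulate the interval left join: build every accident/event pair with merge(by = NULL), keep the pairs where the date lies in the window, and merge back with all.x = TRUE. The three-row events data and the id column are hypothetical. The cross join builds nrow(df) * nrow(Events) pairs, so this only suits small tables.

```r
# Hypothetical Events subset with dates stored as character (e.g. read from a CSV)
events <- data.frame(
  Event     = c("2007-6", "2007-7", "2009-1"),
  StartDate = c("2007-06-13", "2007-07-20", "2009-01-02"),
  EndDate   = c("2007-07-02", "2007-07-27", "2009-01-11"),
  stringsAsFactors = FALSE
)
# Base-R counterpart of mutate(Events, across(StartDate:EndDate, as.Date))
date_cols <- c("StartDate", "EndDate")
events[date_cols] <- lapply(events[date_cols], as.Date)

df <- data.frame(
  id           = 1:5,  # hypothetical row id, used to restore the original order
  AccidentDate = as.Date(c("2007-06-13", "2007-05-13", "2007-06-14",
                           "2009-01-04", "2009-01-15")),
  PolicyType   = c("Home", "Home", "Motor", "Home", "Home"),
  stringsAsFactors = FALSE
)

# 1. Cross join: every accident paired with every event window
pairs <- merge(df, events, by = NULL)
# 2. Keep the pairs where the accident falls inside the window
inside <- pairs$AccidentDate >= pairs$StartDate & pairs$AccidentDate <= pairs$EndDate
hits <- pairs[inside, c("id", "Event")]
# 3. Left join back: accidents with no matching window get Event = NA
out <- merge(df, hits, by = "id", all.x = TRUE)
out <- out[order(out$id), c("AccidentDate", "PolicyType", "Event")]
# 4. Same PolicyType rule as the dplyr version
out$Event <- ifelse(out$PolicyType == "Home", out$Event, NA)

print(out)
```

As with the dplyr join, an accident that falls inside several windows would appear once per matching event.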