How to mutate a new column in dataframe ifelse date is between 2 dates in another column?

Question

I have a dataframe with some dates, and a second dataframe with the start and end dates of weather events.
I want to mutate a new column onto the original dataframe: if a date falls between the start and end dates of an event in the second dataframe, paste the name of that Event; otherwise output 'N'.

However I'm getting the error:

  Error in `mutate()`:
  Problem while computing `Event = ifelse(...)`.
  x `Event` must be size 1, not 17.
  i Did you mean: `Event = list(ifelse(...))` ?
  i The error occurred in row 1.
  Run `rlang::last_error()` to see where the error occurred.

My Code:

  data.frame(Events)
                            Name  StartDate    EndDate Flood_Storm_EventFlag EscapeofWater_EventFlag
  1                       2007-6 2007-06-13 2007-07-02                     Y                       N
  2                       2007-7 2007-07-20 2007-07-27                     Y                       N
  3         February 2014 Storms 2014-02-03 2014-02-17                     Y                       N
  4                      Desmond 2015-12-04 2015-12-07                     Y                       N
  5                          Eva 2015-12-25 2015-12-28                     Y                       N
  6                        Frank 2015-12-29 2016-01-08                     Y                       N
  7                        Ciara 2020-02-08 2020-02-12                     Y                       N
  8                       Dennis 2020-02-15 2020-02-28                     Y                       N
  9     2021009A (July Floods 1) 2021-07-12 2021-07-19                     Y                       N
  10             (July Floods 2) 2021-07-24 2021-08-02                     Y                       N
  11 Dudley, Eunice and Franklin 2022-02-16 2022-02-28                     Y                       N
  12                      2009-1 2009-01-02 2009-01-11                     N                       Y
  13                     2009-12 2009-12-27 2010-01-19                     N                       Y
  14                     2010-11 2010-11-28 2010-12-12                     N                       Y
  15                     2010-12 2010-12-19 2011-01-08                     N                       Y
  16                      2018-2 2018-02-28 2018-03-08                     N                       Y
  17                     2022-12 2022-12-08 2022-12-18                     N                       Y

  dates <- c("2007-06-13", "2007-05-13", "2007-06-14", "2009-01-04", "2009-01-15")
  AccidentDate <- as.Date(dates)
  PolicyType <- c("Home", "Home", "Motor", "Home", "Home")
  df <- data.frame(AccidentDate, PolicyType)

  x <- df %>%
    mutate(Event = ifelse(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate & PolicyType == "Home", Events$Name, 'N'))

I'm not sure why I can't create a new column this way. Any ideas?

Answer 1

Score: 1

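Your problem is that you're getting multiple matches both ways: comparing AccidentDate with Events$StartDate and Events$EndDate yields one logical value per row of Events (17 in total), not a single value for the date being tested, which is what the error message is reporting. You could: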

  • Use any() to get a single logical value instead of one per row in Events.
  • Use .by (or group_by in dplyr < 1.1.0) to pass only one AccidentDate at a time when you extract the name.
  library(dplyr)

  # dplyr >= 1.1.0: .by evaluates the expression for one AccidentDate at a time
  df |>
    mutate(Event = ifelse(any(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate) & PolicyType == "Home",
                          Events$Name[AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate],
                          "N"),
           .by = "AccidentDate")

  # dplyr < 1.1.0 (Update 3/3-2022): same logic with group_by() / ungroup()
  df |>
    group_by(AccidentDate) |>
    mutate(Event = ifelse(any(AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate) & PolicyType == "Home",
                          Events$Name[AccidentDate >= Events$StartDate & AccidentDate <= Events$EndDate],
                          "N")) |>
    ungroup()

I've assumed that there can only be a single match. You might want to check this.

Output:

    AccidentDate PolicyType  Event
  1   2007-06-13       Home 2007-6
  2   2007-05-13       Home      N
  3   2007-06-14      Motor      N
  4   2009-01-04       Home      N
  5   2009-01-15       Home      N
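
The single-match assumption above can be checked by counting, for each accident date, how many event windows contain it. Here is a minimal base R sketch. The Events table and dates in it are made up for illustration (two of the events overlap on purpose) and are not the asker's data:

```r
# Hypothetical sample data: events "A" and "B" overlap on purpose
Events <- data.frame(
  Name      = c("A", "B", "C"),
  StartDate = as.Date(c("2020-02-08", "2020-02-10", "2021-07-12")),
  EndDate   = as.Date(c("2020-02-12", "2020-02-28", "2021-07-19"))
)
AccidentDate <- as.Date(c("2020-02-09", "2020-02-11", "2021-01-01"))

# For each accident date, count the event windows that contain it
n_matches <- vapply(
  seq_along(AccidentDate),
  function(i) sum(AccidentDate[i] >= Events$StartDate &
                  AccidentDate[i] <= Events$EndDate),
  integer(1)
)

# Dates that fall inside more than one event window
ambiguous <- AccidentDate[n_matches > 1]
```

If `ambiguous` is non-empty, `Events$Name[...]` returns more than one name for those dates, so you would need to decide how to handle them (for example, keep the first match or paste the names together).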

Update: You could also consider a join:

  library(dplyr)  # join_by() requires dplyr >= 1.1.0

  df |>
    left_join(Events |> select(Event = Name, StartDate, EndDate),
              join_by(between(AccidentDate, StartDate, EndDate))) |>
    select(AccidentDate, PolicyType, Event) |>
    mutate(Event = ifelse(PolicyType == "Home", Event, NA))

To use between() instead of join_by(AccidentDate >= StartDate, AccidentDate <= EndDate), all of the columns involved need to be date types:

  Events <- mutate(Events, across(StartDate:EndDate, as.Date))
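
Putting the two snippets above together, a self-contained sketch might look like the following. It uses a small subset of the data from the question and starts with character dates to show the conversion. As an addition that is not part of the answer, it fills unmatched and non-Home rows with "N" (as the question asks) rather than NA, using if_else(). It assumes dplyr >= 1.1.0 and R >= 4.1:

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Subset of the question's data; dates start out as character strings
Events <- data.frame(
  Name      = c("2007-6", "2009-1"),
  StartDate = c("2007-06-13", "2009-01-02"),
  EndDate   = c("2007-07-02", "2009-01-11")
)
df <- data.frame(
  AccidentDate = as.Date(c("2007-06-13", "2007-05-13", "2007-06-14", "2009-01-15")),
  PolicyType   = c("Home", "Home", "Motor", "Home")
)

# Convert the event boundaries to Date so they match AccidentDate's type
Events <- mutate(Events, across(StartDate:EndDate, as.Date))

result <- df |>
  left_join(Events |> select(Event = Name, StartDate, EndDate),
            join_by(between(AccidentDate, StartDate, EndDate))) |>
  # Keep the event name only for Home policies with a match; otherwise "N"
  mutate(Event = if_else(PolicyType == "Home" & !is.na(Event), Event, "N")) |>
  select(AccidentDate, PolicyType, Event)

print(result)
```

Note that a left join returns one row per match, so an accident date that falls inside several event windows would appear several times in `result`.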

huangapple
  • Posted on March 3, 2023 at 20:08:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75626889.html