dplyr if else without the else / conditional mutate in one chunk
Question

I am trying to add a column to my dataframe based on whether a string is detected in another column. I have done this in two chunks of code and then merged them together, but I am trying to streamline my code so that there is less to type out in the future. I also noticed I performed a join incorrectly on a dataset I've been working with for months, so the fewer joins, the better.

Here is what currently works for me, but feels unnecessarily long.

dtc_final2022 <- dtc_final1 %>%
  filter(str_detect(detection_timestamp_utc, "2022")) %>%
  mutate(Year = "2022")

dtc_final2021 <- dtc_final1 %>%
  filter(str_detect(detection_timestamp_utc, "2021")) %>%
  mutate(Year = "2021")

dtc_final2 <- full_join(dtc_final2021, dtc_final2022)

dtc_final1 is a dataset with timestamps from many years. I am only interested in adding a "year" to timestamps that contain 2021 and 2022. In the future, I will add 2023 and 2024.

This is what I would like to do, but in doing so, I replace the previous year with NA. Is there a way to run an ifelse function without the 'else'? Also, please remember that I can't use the other year as the 'else' since in the future, I will have 4 years to deal with, and not just 2.

dtc_final2 <- dtc_final1 %>%
  mutate(Year = ifelse(str_detect(detection_timestamp_utc, "2021"), "2021", NA),
         Year = ifelse(str_detect(detection_timestamp_utc, "2022"), "2022", NA))

I try to do everything in dplyr, but if a for loop does the trick, then I guess I'll buck up.

Thanks in advance!


Answer 1

Score: 1

For multiple sequential/nested ifelses, we can use case_when.

dtc_final2 <- dtc_final1 %>%
    mutate(Year = case_when(str_detect(detection_timestamp_utc, "2021") ~ 2021,
                            str_detect(detection_timestamp_utc, "2022") ~ 2022,
                            TRUE ~ NA_real_))  # every other row gets NA
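The two-step ifelse() in the question loses 2021 because the second assignment to Year overwrites the first. case_when() checks its conditions in order and keeps the first match, so nothing is overwritten. Here is a minimal, self-contained sketch of that behaviour. The three-row dtc_final1 is made-up stand-in data (the real dataset was not shown), and Year is kept as a string, as in the question.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for dtc_final1; the timestamp format is an assumption.
dtc_final1 <- tibble(
  detection_timestamp_utc = c(
    "2020-06-01 12:00:00",
    "2021-03-15 08:30:00",
    "2022-11-02 23:59:59"
  )
)

dtc_final2 <- dtc_final1 %>%
  mutate(Year = case_when(
    str_detect(detection_timestamp_utc, "2021") ~ "2021",
    str_detect(detection_timestamp_utc, "2022") ~ "2022"
    # No fallback branch: rows that match no condition get NA.
  ))

print(dtc_final2)
```

Since case_when() returns NA for rows that match no condition, the explicit fallback branch is optional. Adding 2023 and 2024 later means adding one line per year.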

There are likely better options for this specific case, but it is hard to tell without seeing the data.
If detection_timestamp_utc is a proper date object, the lubridate package likely offers a better way.

dtc_final2 <- dtc_final1 %>%
    mutate(Year = lubridate::year(as.Date(detection_timestamp_utc)))
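Note that extracting the year this way fills in Year for every row, not only for 2021 and 2022. The sketch below shows one possible way to keep only the years of interest. It assumes the timestamps are strings of the form "YYYY-MM-DD HH:MM:SS" (the real format was not shown); the toy data and the years_of_interest vector are hypothetical names introduced for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for dtc_final1; the timestamp format is an assumption.
dtc_final1 <- tibble(
  detection_timestamp_utc = c(
    "2020-06-01 12:00:00",
    "2021-03-15 08:30:00",
    "2022-11-02 23:59:59"
  )
)

# Extend this vector (for example to 2021:2024) when new years arrive.
years_of_interest <- 2021:2022

dtc_final2 <- dtc_final1 %>%
  mutate(
    # Parse the string as a date-time, then pull out the calendar year.
    Year = year(ymd_hms(detection_timestamp_utc)),
    # Blank out years that are not of interest.
    Year = replace(Year, !Year %in% years_of_interest, NA)
  )

print(dtc_final2)
```

With this approach, supporting more years only requires changing years_of_interest, with no extra conditions or joins.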



Answer 2

Score: 1

You can use str_extract() rather than str_detect() here, with a regular expression that matches both years of interest:

```R
mutate(dtc_final1, Year = str_extract(detection_timestamp_utc, "^202[12]"))
```
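Since the question mentions adding 2023 and 2024 later, the pattern can also be built from a vector of years, so only that vector needs to change. This is a sketch under assumptions: the toy data, years_of_interest, and year_pattern are hypothetical names, and the timestamps are assumed to start with the four-digit year.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for dtc_final1; the timestamp format is an assumption.
dtc_final1 <- tibble(
  detection_timestamp_utc = c(
    "2020-06-01 12:00:00",
    "2021-03-15 08:30:00",
    "2022-11-02 23:59:59",
    "2024-01-20 04:10:00"
  )
)

years_of_interest <- c("2021", "2022", "2023", "2024")

# Builds "^(2021|2022|2023|2024)": one of the listed years at the start of the string.
year_pattern <- paste0("^(", paste(years_of_interest, collapse = "|"), ")")

dtc_final2 <- dtc_final1 %>%
  mutate(Year = str_extract(detection_timestamp_utc, year_pattern))

print(dtc_final2)
```

str_extract() returns NA where the pattern does not match, so rows from other years are left as NA without an explicit else branch.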

huangapple
  • Posted on March 21, 2023
  • When reposting, please keep this link: https://go.coder-hub.com/75793841.html