dplyr if else without the else/ conditional mutate in one chunk
Question
I am trying to add a column to my dataframe based on whether a string is detected in another column. I have done this in two chunks of code and then merged them together, but I am trying to streamline my code so that there is less to type out in the future. I also noticed I performed a join incorrectly on a dataset I've been working with for months, so the fewer joins, the better.
Here is what currently works for me, but feels unnecessarily long.
dtc_final2022 <- dtc_final1 %>%
  filter(str_detect(detection_timestamp_utc, "2022")) %>%
  mutate(Year = "2022")

dtc_final2021 <- dtc_final1 %>%
  filter(str_detect(detection_timestamp_utc, "2021")) %>%
  mutate(Year = "2021")

dtc_final2 <- full_join(dtc_final2021, dtc_final2022)
dtc_final1 is a dataset with timestamps from many years. I am only interested in adding a "year" to timestamps that contain 2021 and 2022. In the future, I will add 2023 and 2024.
This is what I would like to do, but in doing so, I replace the previous year with NA. Is there a way to run an ifelse function without the 'else'? Also, please remember that I can't use the other year as the 'else' since in the future, I will have 4 years to deal with, and not just 2.
dtc_final2 <- dtc_final1 %>%
  mutate(Year = ifelse(str_detect(detection_timestamp_utc, "2021"), "2021", NA),
         Year = ifelse(str_detect(detection_timestamp_utc, "2022"), "2022", NA))
I try to do everything in dplyr, but if a for loop does the trick, then I guess I'll buck up.
Thanks in advance!
Answer 1
Score: 1
For multiple sequential or nested ifelse() calls, we can use case_when().
dtc_final2 <- dtc_final1 %>%
  mutate(Year = case_when(str_detect(detection_timestamp_utc, "2021") ~ 2021,
                          str_detect(detection_timestamp_utc, "2022") ~ 2022,
                          # NA_real_ keeps the fallback the same type as the numeric years
                          TRUE ~ NA_real_))
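To connect this back to the "if without the else" part of the question: case_when() returns NA for any row that matches none of the conditions, so the catch-all branch can simply be left out. Below is a minimal sketch on made-up data; dtc_toy and its three timestamps are hypothetical stand-ins for dtc_final1, and the years are kept as strings as in the question.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for dtc_final1
dtc_toy <- tibble(
  detection_timestamp_utc = c("2020-05-01 10:00:00",
                              "2021-06-15 12:30:00",
                              "2022-07-20 08:45:00")
)

dtc_toy2 <- dtc_toy %>%
  mutate(Year = case_when(
    str_detect(detection_timestamp_utc, "2021") ~ "2021",
    str_detect(detection_timestamp_utc, "2022") ~ "2022"
    # no catch-all branch: rows matching neither condition get NA
  ))
```

Adding 2023 and 2024 later would just mean adding one more line per year. Note that str_detect() matches the pattern anywhere in the string, so anchoring it (for example "^2021") may be safer if other parts of the timestamp could contain the same digits.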
There are likely better options for this specific case, but it is hard to tell without seeing the data.
If detection_timestamp_utc is a proper date object, we likely have a better way with the lubridate package.
dtc_final2 <- dtc_final1 %>%
mutate(Year = lubridate::year(as.Date(detection_timestamp_utc)))
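The lubridate approach fills in a year for every row, whereas the question only wants 2021 and 2022 labelled. One possible way to combine the two, sketched on made-up data, is to extract the year first and then blank out the years that are not of interest. The names dtc_toy and years_of_interest are hypothetical, and the sketch assumes the timestamps start with a "YYYY-MM-DD" date that as.Date() can parse.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for dtc_final1
dtc_toy <- tibble(
  detection_timestamp_utc = c("2020-05-01 10:00:00",
                              "2021-06-15 12:30:00",
                              "2022-07-20 08:45:00")
)

# Extend this vector when 2023 and 2024 need to be included
years_of_interest <- c(2021, 2022)

dtc_toy2 <- dtc_toy %>%
  mutate(
    # extract the year from the date part of each timestamp
    Year = year(as.Date(detection_timestamp_utc)),
    # keep the year only when it is one of interest, otherwise NA
    Year = if_else(Year %in% years_of_interest, Year, NA_real_)
  )
```

With this layout, supporting more years is a one-line change to years_of_interest rather than another condition.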
Answer 2
Score: 1
You may use str_extract() rather than str_detect() here, and use a regular expression that captures both of the years of interest:
```R
mutate(dtc_final1, Year=str_extract(detection_timestamp_utc, "^202[12]"))
```
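As a rough illustration of how this behaves, here is a small sketch on made-up data. dtc_toy is a hypothetical stand-in for dtc_final1, and the character class is widened to [1-4] on the assumption that 2023 and 2024 will be added later, as the question mentions. Rows whose timestamp does not start with one of those years get NA, which is the "no else" behaviour being asked for.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for dtc_final1
dtc_toy <- tibble(
  detection_timestamp_utc = c("2020-05-01 10:00:00",
                              "2021-06-15 12:30:00",
                              "2022-07-20 08:45:00",
                              "2024-01-02 03:04:05")
)

# "^202[1-4]" matches 2021 through 2024 at the start of the string only;
# str_extract() returns NA where there is no match
dtc_toy2 <- mutate(dtc_toy,
                   Year = str_extract(detection_timestamp_utc, "^202[1-4]"))
```

The "^" anchor assumes the timestamp begins with the four-digit year; if the year sits elsewhere in the string, the pattern would need adjusting.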