R: Dplyr: How to Check if the Value of One Variable is Contained in Another
Question
I have hundreds of records with "state_name" (Alaska, Alabama, etc.) and need to determine whether the value of state_name is contained anywhere in another variable, "jurisdiction_name". I know how to search a string for a SINGLE value, e.g. "Alabama", using something like:
mutate(type_state=ifelse(grepl("Alabama",jurisd_name),1,0)) %>%
How can I search each row to determine whether the state name (differing on each row) is contained in the jurisdiction name? In other words, I am searching for the changing VALUE of state_name, not a single state.
Is there a way to do something like:
df2 <- df %>%
mutate(state_val=get(state_name))%>%
mutate(type_state=ifelse(grepl(state_val,jurisd_name),1,0))
Obviously, this code doesn't work because grepl expects a single string pattern, e.g. grepl("Alabama", jurisdiction_name).
However, I don't know how to search for a VALUE that changes on each row of data.
Answer 1
Score: 1
You can use the built-in constant `state.name` and turn the elements of that vector into an alternation pattern (`str_c()` comes from stringr):
library(dplyr)
library(stringr)
df %>% mutate(type_state = ifelse(grepl(str_c(state.name, collapse = "|"), jurisd_name), 1, 0))
or, to use stringr consistently:
df %>% mutate(type_state = ifelse(str_detect(jurisd_name, str_c(state.name, collapse = "|")), 1, 0))
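The following is a minimal base-R sketch of what the alternation pattern does. It uses only the built-in `state.name` vector, so stringr is not needed, and the jurisdiction strings are hypothetical. The resulting pattern flags a jurisdiction that contains *any* of the 50 state names. It does not check specifically for the state in that row's `state_name`:

```r
# Build one regex of the form "Alabama|Alaska|...|Wyoming" from the built-in vector
state_pattern <- paste(state.name, collapse = "|")

# Hypothetical jurisdiction names, for illustration only
jurisdictions <- c("Jefferson County, Alabama", "City of Toronto", "Anchorage, Alaska")

# TRUE wherever at least one state name occurs anywhere in the string
any_state <- grepl(state_pattern, jurisdictions)
any_state
#> [1]  TRUE FALSE  TRUE
```

So a row with `state_name` equal to "Texas" and `jurisd_name` equal to "Anchorage, Alaska" would still get `type_state = 1`. If each row must be matched against its own state, the row-wise approach in the next answer fits better.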
Answer 2
Score: 0
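If I understand your issue correctly, here is a solution that should be easy to adapt to your case. `stringr::str_detect()` is vectorised over both `string` and `pattern`, so each row is matched against its own pattern: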
df <- tibble::tibble(a = month.name, b = c(letters[1:6], letters[1:6]))
df |>
dplyr::mutate(check = stringr::str_detect(string = a, pattern = b))
#> # A tibble: 12 × 3
#> a b check
#> <chr> <chr> <lgl>
#> 1 January a TRUE
#> 2 February b TRUE
#> 3 March c TRUE
#> 4 April d FALSE
#> 5 May e FALSE
#> 6 June f FALSE
#> 7 July a FALSE
#> 8 August b FALSE
#> 9 September c FALSE
#> 10 October d FALSE
#> 11 November e TRUE
#> 12 December f FALSE
Created on 2023-05-14 with reprex v2.0.2
To apply this to your data, replace `a` (the string being searched) with `jurisd_name` and `b` (the pattern) with `state_name`, e.g. `mutate(type_state = stringr::str_detect(jurisd_name, state_name))`.
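The same row-wise check can also be written in base R with `mapply()`, which calls `grepl()` once per row with that row's own pattern. This is a sketch on a small hypothetical data frame. Only the column names `state_name` and `jurisd_name` come from the question. `fixed = TRUE` treats each state name as a literal string rather than a regular expression:

```r
# Hypothetical example data using the question's column names
df <- data.frame(
  state_name  = c("Alabama", "Alaska", "Texas"),
  jurisd_name = c("Jefferson County, Alabama", "Anchorage, Alaska", "Anchorage, Alaska"),
  stringsAsFactors = FALSE
)

# For each row i, evaluate grepl(state_name[i], jurisd_name[i], fixed = TRUE)
df$type_state <- as.integer(mapply(
  grepl,
  pattern   = df$state_name,
  x         = df$jurisd_name,
  MoreArgs  = list(fixed = TRUE),
  USE.NAMES = FALSE
))

df$type_state
#> [1] 1 1 0
```

Inside a pipeline, the same expression fits in `mutate()`, e.g. `mutate(type_state = as.integer(mapply(grepl, state_name, jurisd_name, MoreArgs = list(fixed = TRUE))))`.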
If you want to use `grepl()` instead, you can do so by grouping and swapping the order of the arguments, since `grepl()` takes the pattern first:
df |>
dplyr::group_by(a, b) |>
dplyr::mutate(check = grepl(b, a)) |>
dplyr::ungroup()
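Grouping matters because `grepl()` is not vectorised over `pattern`. When given several patterns, it uses only the first one and emits a warning. The base-R sketch below contrasts the two approaches. It rebuilds the `a` and `b` columns from the example above as plain vectors:

```r
a <- month.name
b <- c(letters[1:6], letters[1:6])

# Ungrouped: grepl() receives all 12 patterns, keeps only the first ("a"),
# and warns, so this is NOT a row-wise check
wrong <- suppressWarnings(grepl(b, a))

# Row-wise equivalent of the grouped version: one grepl() call per row
right <- vapply(seq_along(a), function(i) grepl(b[i], a[i]), logical(1))
```

`right` reproduces the `check` column from the reprex above, while `wrong` just tests every month name for a lowercase "a". With dplyr, `rowwise()` followed by `mutate(check = grepl(b, a))` is another way to get one call per row.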