R: dplyr: How to Check if the Value of One Variable is Contained in Another

Question

I have hundreds of records with "state_name" (Alaska, Alabama etc.) and need to determine whether the value of state_name is contained anywhere in another variable "jurisdiction_name". I know how to search a string for a SINGLE value e.g. "Alabama" using something like:

mutate(type_state = ifelse(grepl("Alabama", jurisd_name), 1, 0)) %>%

How can I search each row to determine whether the state name (differing on each row) is contained in the jurisdiction name? In other words, I am searching for the changing VALUE of state_name, not a single state.

Is there a way to do something like:

df2 <- df %>%
  mutate(state_val = get(state_name)) %>%
  mutate(type_state = ifelse(grepl(state_val, jurisd_name), 1, 0))

Obviously, this code doesn't work because grepl is expecting a string pattern e.g. grepl("Alabama",jurisdiction_name)

However, I don't know how to search for a VALUE that changes on each row of data.

Answer 1

Score: 1

You can use the built-in constant state.name and turn the elements of that vector into an alternation pattern:

mutate(type_state = ifelse(grepl(str_c(state.name, collapse = "|"), jurisd_name), 1, 0))

or, to use stringr consistently:

mutate(type_state = ifelse(str_detect(jurisd_name, str_c(state.name, collapse = "|")), 1, 0))
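To make the behaviour of the alternation pattern concrete, here is a minimal base R sketch. The sample rows are made up for illustration; only the column names and state.name come from the thread. Note that this pattern flags a row when any state name appears in jurisd_name, so it does not compare against that row's own state_name.

```r
# Hypothetical sample data; the column names follow the question.
df <- data.frame(
  state_name  = c("Alabama", "Alaska", "Texas"),
  jurisd_name = c("Alabama Department of Revenue",
                  "City of Houston, Texas",
                  "Harris County"),
  stringsAsFactors = FALSE
)

# state.name is a built-in character vector of the 50 US state names.
# Joining it with "|" gives a regex that matches any one of them.
any_state <- paste(state.name, collapse = "|")

# 1 if any state name occurs in jurisd_name, 0 otherwise.
df$type_state <- as.integer(grepl(any_state, df$jurisd_name))

print(df)
```

In this sample the second row is flagged because "Texas" occurs in the jurisdiction name, even though that row's state_name is "Alaska".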

Answer 2

Score: 0

If I understand your issue correctly, here is a solution that should be easy to adapt to your case:

df <- tibble::tibble(a = month.name, b = c(letters[1:6], letters[1:6]))

df |> 
  dplyr::mutate(check = stringr::str_detect(string = a, pattern = b))
#> # A tibble: 12 × 3
#>    a         b     check
#>    <chr>     <chr> <lgl>
#>  1 January   a     TRUE 
#>  2 February  b     TRUE 
#>  3 March     c     TRUE 
#>  4 April     d     FALSE
#>  5 May       e     FALSE
#>  6 June      f     FALSE
#>  7 July      a     FALSE
#>  8 August    b     FALSE
#>  9 September c     FALSE
#> 10 October   d     FALSE
#> 11 November  e     TRUE 
#> 12 December  f     FALSE

Created on 2023-05-14 with reprex v2.0.2

Basically, if I understood correctly what you are trying to achieve, you'd probably just need to replace a with jurisd_name (the string being searched) and b with state_name (the pattern to look for).
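Applied to the columns named in the question, the same idea might look like the sketch below. The sample rows are invented for illustration, and wrapping the pattern in fixed() is an assumption on my part: it treats each state name as literal text rather than as a regular expression.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; the column names follow the question.
df <- tibble(
  state_name  = c("Alabama", "Alaska", "Texas"),
  jurisd_name = c("Alabama Department of Revenue",
                  "City of Houston, Texas",
                  "Harris County, Texas")
)

# str_detect() is vectorised over both string and pattern, so each
# jurisd_name is checked against the state_name on the same row.
df2 <- df %>%
  mutate(type_state = as.integer(str_detect(jurisd_name, fixed(state_name))))

print(df2)
```

Here the second row gets 0 because "Alaska" does not occur in its jurisdiction name, while the first and third rows get 1.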

If you want to use grepl, you can do so by grouping, and inverting the order of the parameters:

df |> 
  dplyr::group_by(a, b) |> 
  dplyr::mutate(check = grepl(b, a)) |> 
  dplyr::ungroup()
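grepl() accepts only a single pattern per call (with a longer vector it uses the first element and warns), which is why it has to be applied once per row. As an alternative to grouping, a base R sketch with mapply() could look like this. The sample rows are hypothetical, and fixed = TRUE is my assumption that the state names should be matched literally.

```r
# Hypothetical sample data; the column names follow the question.
df <- data.frame(
  state_name  = c("Alabama", "Alaska", "Texas"),
  jurisd_name = c("Alabama Department of Revenue",
                  "City of Houston, Texas",
                  "Harris County, Texas"),
  stringsAsFactors = FALSE
)

# mapply() calls grepl(pattern, x, fixed = TRUE) once per row, pairing
# each state_name (pattern) with the jurisd_name (x) on the same row.
df$type_state <- as.integer(
  mapply(grepl, df$state_name, df$jurisd_name,
         MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
)

print(df)
```

This avoids the grouping step and also works when the same (state_name, jurisd_name) pair appears on several rows.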

huangapple
  • Posted on 2023-05-15 03:35:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76249347.html