Count how many times each string from one column appears (not as an exact match) in another column in R


Question


My data looks like this:

    df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                     pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"))

I am trying this:

    id <- df$id
    for (i in length(id)) {
      df$id_in_pedi <- sum(grepl(i, df$pedi))
    }

But it does not work. The result I am looking for is this:

    df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                     pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"),
                     id_in_pedi = c(3, 3, 1, 0, 1, 0))

Thanks.
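Editor's note on why the loop in the question likely fails: `for (i in length(id))` runs a single iteration with `i` equal to 6 rather than visiting each id, `grepl()` then searches for the pattern "6" instead of an id string, and the assignment overwrites the whole column with one number. A minimal corrected sketch of the same loop, assuming a plain substring match is what is wanted, might look like this:

```r
df <- data.frame(
  id = c("p3", "p5", "p8", "p9", "p10", "p11"),
  pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10")
)

# Pre-allocate the result column so each iteration fills one element.
df$id_in_pedi <- integer(nrow(df))

# seq_along() yields 1, 2, ..., 6; length() alone would yield only 6.
for (i in seq_along(df$id)) {
  # Use the i-th id as the pattern and count the pedi rows that contain it.
  df$id_in_pedi[i] <- sum(grepl(df$id[i], df$pedi))
}

df$id_in_pedi
```

The vectorized answers below compute the same counts without an explicit loop.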

Answer 1

Score: 3


In tidyverse:

    library(tidyverse)

    # Collapse pedi into a single string, then count the occurrences of each id in it
    df %>%
      mutate(id_in_pedi = str_count(toString(pedi), id))

       id       pedi id_in_pedi
    1  p3      p1/p2          3
    2  p5      p3/p4          3
    3  p8      p3/p5          1
    4  p9 (p3/p4)/p5          0
    5 p10      p5/p8          1
    6 p11     p4/p10          0

In base R, using sapply:

    # sapply() builds a logical matrix with one column per id;
    # colSums() counts the pedi rows that match each id
    transform(df, id_in_pedi = colSums(sapply(id, grepl, pedi, USE.NAMES = FALSE)))

       id       pedi id_in_pedi
    1  p3      p1/p2          3
    2  p5      p3/p4          3
    3  p8      p3/p5          1
    4  p9 (p3/p4)/p5          0
    5 p10      p5/p8          1
    6 p11     p4/p10          0

Using Vectorize:

    # Vectorize(grepl) applies grepl() once per id against the full pedi vector
    colSums(Vectorize(grepl)(df$id, list(df$pedi)))

     p3  p5  p8  p9 p10 p11
      3   3   1   0   1   0
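Editor's note: the calls above hand each id to the regular-expression engine. That works for the sample ids, but an id containing a regex metacharacter such as "(" would be interpreted as a pattern. The sketch below is a hedged variant that treats ids as literal substrings through `fixed = TRUE` and uses `vapply()` so the result is always an integer vector. The helper name `count_rows_containing` is made up for illustration and is not part of the original answer.

```r
df <- data.frame(
  id = c("p3", "p5", "p8", "p9", "p10", "p11"),
  pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10")
)

# Hypothetical helper: for each pattern, count the strings that contain it
# as a literal substring (fixed = TRUE disables regex interpretation).
count_rows_containing <- function(patterns, strings) {
  vapply(
    patterns,
    function(p) sum(grepl(p, strings, fixed = TRUE)),
    integer(1),
    USE.NAMES = FALSE
  )
}

df$id_in_pedi <- count_rows_containing(df$id, df$pedi)

# A literal "(" is safe here; as a regex, "(p3" would be an invalid pattern.
paren_count <- count_rows_containing("(p3", df$pedi)
```

Like the `grepl()` versions above, this counts rows that contain the id, whereas `str_count()` on the collapsed string counts every occurrence, so the two can differ if an id appears twice in one pedi value.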

Answer 2

Score: 0


Using base R:

    # Split each pedi on "/", "(" and ")", then tabulate the pieces against the ids
    table(factor(unlist(strsplit(df$pedi, "[/()]")), levels = df$id))

Output:

     p3  p5  p8  p9 p10 p11
      3   3   1   0   1   0
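Editor's note: the `strsplit()` approach counts whole tokens, while the `grepl()` and `str_count()` approaches count substrings. Both give the same result on the sample data, but they can diverge when one id is a prefix of another. The sketch below illustrates this with "p1", a hypothetical id that is not in the question's id column and that is also a substring of "p10".

```r
pedi <- c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10")
ids <- c("p1", "p3")  # "p1" is a hypothetical id added for illustration

# Substring matching: "p1" also matches inside "p4/p10".
substring_counts <- vapply(
  ids,
  function(p) sum(grepl(p, pedi, fixed = TRUE)),
  integer(1)
)

# Whole-token matching: split on "/", "(" and ")" and compare tokens exactly.
tokens <- unlist(strsplit(pedi, "[/()]"))
token_counts <- vapply(ids, function(p) sum(tokens == p), integer(1))

substring_counts
token_counts
```

Here the substring count for "p1" is 2 while the token count is 1, so the right choice depends on whether partial matches such as "p1" inside "p10" should count. The token version also counts every occurrence rather than every row, which matters only if an id can repeat within a single pedi value.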

  • Posted by huangapple on February 16, 2023, 07:46:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75466473.html