Count how many times each string from a column appears (not as an exact match) in another column in R

Question

My data looks like this:

df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"))

I am trying this:

id <- df$id
for (i in length(id)) {
  df$id_in_pedi <- sum(grepl(i, df$pedi))
}

But it does not work. The result I am looking for is this:

df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"),
                 id_in_pedi = c(3, 3, 1, 0, 1, 0))

Thanks.
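Why the loop above fails: for (i in length(id)) runs only once, with i = 6, because length(id) is a single number; grepl(i, ...) then searches for the text "6" instead of an id string; and df$id_in_pedi <- ... assigns one value to the whole column, so every row ends up as 0. A minimal corrected sketch of the same loop, keeping the substring matching the question asks for (pre-allocating an integer column is a choice of this sketch, not part of the question):

```r
df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"))

# Pre-allocate the result column (integer, since sum() of logicals is integer)
df$id_in_pedi <- 0L

# seq_along() visits every index 1..n; length(id) alone is just the number 6
for (i in seq_along(df$id)) {
  # Use the i-th id string as the pattern and store the count in row i only
  df$id_in_pedi[i] <- sum(grepl(df$id[i], df$pedi))
}

df
```

The vectorized answers avoid the explicit loop, but the loop form makes the three fixes (iterate over indices, use the id as the pattern, assign per row) easy to see.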

Answer 1

Score: 3

Using the tidyverse:

library(tidyverse)
df %>%
  mutate(id_in_pedi = str_count(toString(pedi), id))

   id       pedi id_in_pedi
1  p3      p1/p2          3
2  p5      p3/p4          3
3  p8      p3/p5          1
4  p9 (p3/p4)/p5          0
5 p10      p5/p8          1
6 p11     p4/p10          0
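Note that toString(pedi) collapses the whole pedi column into one comma-separated string, so str_count() counts occurrences rather than rows: an id that appears twice inside a single pedi value is counted twice, while the grepl() approaches below count that row once. For the question's data both give the same numbers. A base-R sketch of the same collapse-and-count idea next to a per-row count, using a hypothetical extra value "p3/p3" (not in the original data) to show where they differ:

```r
pedi <- c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10",
          "p3/p3")  # hypothetical extra value, not in the original data
ids <- c("p3", "p5", "p8", "p9", "p10", "p11")

# Collapse all values into one string, as toString() does in the answer
collapsed <- toString(pedi)

# Occurrences: number of regex matches of each id in the collapsed string
occurrences <- vapply(ids, function(x) {
  length(regmatches(collapsed, gregexpr(x, collapsed))[[1]])
}, integer(1))

# Rows: number of pedi values that contain each id at least once
rows <- vapply(ids, function(x) sum(grepl(x, pedi)), integer(1))

occurrences
rows
```

With the extra "p3/p3" value, p3 gets 5 occurrences but only 4 rows; all other ids are unchanged.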

In base R, using sapply:

transform(df, id_in_pedi = colSums(sapply(id, grepl, pedi, USE.NAMES = FALSE)))

   id       pedi id_in_pedi
1  p3      p1/p2          3
2  p5      p3/p4          3
3  p8      p3/p5          1
4  p9 (p3/p4)/p5          0
5 p10      p5/p8          1
6 p11     p4/p10          0
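How the sapply version works: each call grepl(id[k], pedi) returns one logical vector over all pedi values, so sapply() assembles a logical matrix with one row per pedi value and one column per id, and colSums() counts the TRUE values in each column. A sketch that exposes the intermediate matrix (stringsAsFactors = FALSE is added so the column names are the ids even on R versions before 4.0):

```r
df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"),
                 stringsAsFactors = FALSE)  # explicit for R < 4.0

# One grepl() call per id; sapply() binds the logical vectors as columns,
# so rows correspond to pedi values and columns (named by sapply) to ids
m <- sapply(df$id, grepl, df$pedi)
dim(m)

# Column sums count the TRUE values, i.e. the pedi values containing each id
colSums(m)
```

The Vectorize version builds the same matrix through mapply(), which is why colSums() is applied in both.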

Using Vectorize:

colSums(Vectorize(grepl)(df$id, list(df$pedi)))
 p3  p5  p8  p9 p10 p11 
  3   3   1   0   1   0 
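The grepl() and str_count() answers all use substring matching, which is what "no exact match" asks for here, but it also means a shorter id can match inside a longer one: an id "p1" would be counted in "p4/p10". If that matters, one option is to wrap each id in \b word boundaries. A sketch assuming ids contain only letters, digits, and underscores (so they need no regex escaping); the id "p1" is hypothetical and not part of the question's data:

```r
pedi <- c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10")
ids  <- c("p1", "p3", "p10")  # "p1" is hypothetical, not in the original ids

# Plain substring match: "p1" also matches inside "p4/p10"
substring_hits <- vapply(ids, function(x) sum(grepl(x, pedi)), integer(1))

# Word-boundary match: "/", "(" and the string ends act as boundaries,
# but the "0" after "p1" in "p10" does not
bounded_hits <- vapply(ids, function(x) {
  sum(grepl(paste0("\\b", x, "\\b"), pedi, perl = TRUE))
}, integer(1))

substring_hits
bounded_hits
```

Here "p1" drops from 2 to 1 with word boundaries, while "p3" and "p10" are unchanged.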

Answer 2

Score: 0

Using base R:

table(factor(unlist(strsplit(df$pedi, "[/()]")), levels = df$id))

Output:

 p3  p5  p8  p9 p10 p11 
  3   3   1   0   1   0 
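This answer splits every pedi value on "/", "(" and ")" and counts whole tokens, so unlike the grepl() answers it matches ids exactly rather than as substrings. Splitting a value that starts with "(" produces empty strings; since "" is not one of the factor levels it becomes NA, and table() drops NA by default while keeping zero counts for unused levels. A short sketch of the intermediate steps (stringsAsFactors = FALSE is added because strsplit() rejects factor input on R versions before 4.0):

```r
df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"),
                 stringsAsFactors = FALSE)  # strsplit() needs character input

# Splitting one value: the leading "(" and the ")/" pair leave empty strings
tokens <- strsplit("(p3/p4)/p5", "[/()]")[[1]]
tokens

# All tokens; anything that is not a level ("", "p1", "p2", "p4") becomes NA
f <- factor(unlist(strsplit(df$pedi, "[/()]")), levels = df$id)

# table() ignores NA and reports 0 for levels that never occur (p9, p11)
counts <- table(f)
counts
```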

huangapple
  • Posted on 2023-02-16 07:46:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75466473.html