Vectorising stringr::str_count

Question

I'm trying to vectorize the 'pattern' argument of stringr::str_count in R as follows:

    library(stringr)
    # Define the patterns you want to count
    patterns <- c("apple", "banana", "orange")
    # Create a vectorized version of str_count
    vectorized_str_count <- Vectorize(str_count, vectorize.args = "pattern")
    # Input string
    string <- "I like apples, bananas, and oranges. Apples are my favorite."
    # Count the occurrences of patterns in the string
    counts <- vectorized_str_count(string, pattern = patterns)
    # Print the counts
    print(counts)

I am expecting to get 2, 1, 1 as outputs since there are two occurrences of 'apple' in the original string. However, it returns 1, 1, 1.
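Listing the matched substrings, rather than only counting them, shows which occurrences the pattern 'apple' actually picks up. A small diagnostic sketch using str_extract_all() on the same string (the variable names are illustrative):

```r
library(stringr)

string <- "I like apples, bananas, and oranges. Apples are my favorite."

# Default matching is case-sensitive: only the "apple" inside "apples" is found
case_sensitive <- str_extract_all(string, "apple")[[1]]
print(case_sensitive)
#> [1] "apple"

# Ignoring case also picks up the capitalised "Apple" in "Apples"
case_insensitive <- str_extract_all(string, regex("apple", ignore_case = TRUE))[[1]]
print(case_insensitive)
#> [1] "apple" "Apple"
```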

How can I amend the code to get what I'm after? I know I could do this by constructing a regex search term, but for various reasons related to other problems I'd rather not. Many thanks.

Answer 1

Score: 1

str_count() is already vectorised over the pattern argument, so the Vectorize() wrapper is not needed.
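A quick check of this, as a minimal sketch reusing the question's patterns and string: the Vectorize() wrapper and a plain str_count() call return the same (still case-sensitive) counts. The second part assumes you may also have several input strings: str_count() pairs strings and patterns element-wise rather than forming every combination, so a strings-by-patterns table needs a loop such as vapply(). The strings vector there is made up for illustration.

```r
library(stringr)

patterns <- c("apple", "banana", "orange")
string <- "I like apples, bananas, and oranges. Apples are my favorite."

# The wrapper from the question: one str_count() call per pattern,
# with the patterns used as names of the result
vectorized_str_count <- Vectorize(str_count, vectorize.args = "pattern")
wrapped <- vectorized_str_count(string, pattern = patterns)

# A direct call: the single string is recycled against each pattern
direct <- str_count(string, patterns)

# Same counts either way (both are case-sensitive here, so 1 1 1)
identical(unname(wrapped), direct)
#> [1] TRUE

# Hypothetical multi-string input: count every pattern in every string.
# Each column holds the counts for one pattern, each row one string.
strings <- c("apple pie", "banana split with apple")
tab <- vapply(patterns, function(p) str_count(strings, p),
              numeric(length(strings)))
print(tab)
```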

You get only one match for the first pattern because the pattern is case-sensitive: 'apple' does not match the 'Apple' in "Apples are my favorite". Add (?i) to the pattern, or wrap the patterns in regex() with ignore_case = TRUE, to make matching case-insensitive:

    library(stringr)
    x <- "I like apples, bananas, and oranges. Apples are my favorite."
    str_count(x, c("(?i)apple", "banana", "orange"))
    #> [1] 2 1 1
    str_count(x, regex(c("apple", "banana", "orange"), ignore_case = TRUE))
    #> [1] 2 1 1
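Since the question wants to avoid building regular expressions, a possible alternative is stringr's fixed() modifier, which matches each pattern as literal text and also accepts ignore_case = TRUE. A minimal sketch, assuming the same string and patterns as above:

```r
library(stringr)

x <- "I like apples, bananas, and oranges. Apples are my favorite."
patterns <- c("apple", "banana", "orange")

# fixed() treats each pattern as literal text, so regex metacharacters such as
# "." or "(" need no escaping; ignore_case = TRUE lets "apple" match "Apple"
counts <- str_count(x, fixed(patterns, ignore_case = TRUE))
print(counts)
#> [1] 2 1 1
```

With fixed(), a dot in a pattern matches only a literal dot, which avoids surprises when the patterns come from user data.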

huangapple
  • Posted on 25 May 2023 at 00:41:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76325750.html