Vectorising stringr::str_count

Question

I'm trying to vectorize the 'pattern' argument of stringr::str_count in R as follows:

library(stringr)

# Define the patterns you want to count
patterns <- c("apple", "banana", "orange")

# Create a vectorized version of str_count
vectorized_str_count <- Vectorize(str_count, vectorize.args = "pattern")

# Input string
string <- "I like apples, bananas, and oranges. Apples are my favorite."

# Count the occurrences of patterns in the string
counts <- vectorized_str_count(string, pattern = patterns)

# Print the counts
print(counts)

I am expecting to get 2, 1, 1 as outputs since there are two occurrences of 'apple' in the original string. However, it returns 1, 1, 1.

How can I amend the code to get what I'm after? I know I could do this by constructing a single regex search term, but I'd rather avoid that because it causes other problems for me. Many thanks.

Answer 1

Score: 1

str_count() is already vectorised over the pattern argument, so there is no need to wrap it in Vectorize().

You get only one match for the first pattern because patterns are case-sensitive by default: apple does not match Apple. Add (?i) to the pattern or use regex() with ignore_case = TRUE to make it case-insensitive:

library(stringr)

x <- "I like apples, bananas, and oranges. Apples are my favorite."
str_count(x, c("(?i)apple", "banana", "orange"))
#> [1] 2 1 1
str_count(x, regex(c("apple", "banana", "orange"), ignore_case = TRUE))
#> [1] 2 1 1
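
Since the question mentions wanting to avoid building regex search terms, a possible alternative is stringr's fixed() modifier, which treats each pattern as literal text and also accepts ignore_case. This is only a sketch using the strings from the question; the variable names `patterns` and `counts` are carried over from the question for illustration, and whether literal matching suits the real data is an assumption.

```r
library(stringr)

x <- "I like apples, bananas, and oranges. Apples are my favorite."
patterns <- c("apple", "banana", "orange")

# fixed() matches the patterns as literal text rather than as regular
# expressions; ignore_case = TRUE lets "apple" also match "Apple".
counts <- str_count(x, fixed(patterns, ignore_case = TRUE))

# Label each count with its pattern so the result is easier to read.
names(counts) <- patterns
counts
# apple: 2, banana: 1, orange: 1
```

Because fixed() compares literal text, characters such as . or ( in a pattern need no escaping. If locale-aware comparison matters, coll() takes the same ignore_case argument.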

huangapple
  • Posted on 25 May 2023 at 00:41:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76325750.html