How to detect which strings in a list contain words from a list of keywords in R

Question

I'm very new to R and hoping for help.

I have a list of 1000 product names and a list of 80 keywords or phrases. I need to determine how many of the 1000 product names contain one or more of those keywords or phrases.

Example: if one of the 1000+ product names is "honey bunches of oats" and one of the 80+ keywords is "honey", I need TRUE to show up in a new column next to "honey bunches of oats".

I uploaded both lists as CSV files, made a vector for each list, and tried the following:

str_detect(products, regex(".keywords.", ignore_case = TRUE))

This came back with all FALSE results. I also tried grepl(keywords, products), which returned no matches either.

I am confident there should be instances where the keywords are contained within these strings. Is it looking for exact matches? I need it to show partial matches.

Answer 1

Score: 0

Try:

products <- c('apple hello', 'banana', 'peach', 'a')
.keywords. <- c('apple', 'each')

library(stringr)
# Join the keywords into one alternation pattern ("apple|each"),
# so a product is TRUE if it contains any of the keywords
str_detect(products, paste0(.keywords., collapse = '|'))

# [1]  TRUE FALSE  TRUE FALSE
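
The question also asks for the result as a new TRUE/FALSE column and for a count of matching products. A minimal base-R sketch of that step follows. It swaps the answer's single alternation regex for a loop over the keywords with `grepl(..., fixed = TRUE)`, because `grepl()` takes one pattern at a time and fixed matching treats characters such as `(` or `.` in a keyword literally. The sample data, the helper name `contains_any`, and the column names `product` and `has_keyword` are assumptions made for illustration, not part of the original thread.

```r
# Hypothetical sample data standing in for the two CSV columns.
products <- c("Honey Bunches of Oats", "Corn Flakes", "Peach Rings", "Oat Milk (1 L)")
keywords <- c("honey", "each", "(1 l)")

# Returns one logical per product: TRUE if it contains at least one keyword.
# Both sides are lower-cased so the comparison ignores case, and
# fixed = TRUE matches each keyword as a literal substring, not a regex.
contains_any <- function(products, keywords) {
  products_lc <- tolower(products)
  hits <- lapply(tolower(keywords), function(k) grepl(k, products_lc, fixed = TRUE))
  # Combine the per-keyword results with OR, starting from all FALSE.
  Reduce(`|`, hits, logical(length(products)))
}

# Put the flag in a new column next to each product name.
df <- data.frame(product = products, stringsAsFactors = FALSE)
df$has_keyword <- contains_any(df$product, keywords)

# Count how many products contain one or more keywords.
n_matching <- sum(df$has_keyword)  # 3 of the 4 sample products match

print(df)
print(n_matching)
```

Like the answer above, this matches partial words, so "each" flags "Peach Rings". If only whole-word matches are wanted, a regex with word boundaries would be needed instead of fixed matching.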

Posted by huangapple on 2023-02-08 13:51:48
Source: https://go.coder-hub.com/75381820.html