February 8, 2023 13:51:48

How to detect which strings in a list contain words from a list of keywords in R

Question

Very new to R and hoping for help.

I have a list of 1000 product names and a list of 80 keywords or phrases. I need to determine how many of the 1000 product names contain one or more of those keywords or phrases.

For example, if one of the 1000+ product names is "honey bunches of oats" and one of the 80+ keywords is "honey", I need TRUE to appear in a new column next to "honey bunches of oats".
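A minimal base-R sketch of that desired output, using made-up sample data in place of the two CSV-derived vectors; the column name `has_keyword` is illustrative, not from the question. Rather than building one regex, it tests each keyword as a plain substring with `grepl(..., fixed = TRUE)`, so punctuation inside a phrase needs no escaping, and combines the per-keyword results with `|`.

```r
# Made-up sample data standing in for the 1000 products and 80 keywords
products <- c("honey bunches of oats", "corn flakes", "raisin bran")
keywords <- c("honey", "bran")

# One logical vector per keyword (TRUE where the product contains it as a
# plain substring), OR-ed together so a product is TRUE if any keyword occurs
has_any <- Reduce(`|`, lapply(keywords, function(k) grepl(k, products, fixed = TRUE)))

# New logical column next to the product names (illustrative column name)
result <- data.frame(product = products, has_keyword = has_any)
print(result)

# How many products contain at least one keyword
sum(result$has_keyword)
# [1] 2
```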

I uploaded both lists as CSV files, made a vector from each list, and tried the following:

str_detect(products, regex(".keywords.", ignore_case = TRUE))

This returned FALSE for every product. I also tried grepl(keywords, products), which returned no matches either.
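A minimal base-R reproduction of both failures, using made-up data (`grepl()` is used so no packages are needed). A quoted pattern such as ".keywords." is regular-expression text: each "." matches any single character, so the search is for the letters "keywords" inside the product names and the `keywords` vector is never consulted. The same applies to the string passed to `regex()` in the `str_detect()` call. Passing the whole vector as `grepl()`'s pattern does not work either: `grepl()` uses only the first element (and warns), so 79 of the 80 keywords are ignored.

```r
# Made-up data for illustration
products <- c("honey bunches of oats", "corn flakes")
keywords <- c("corn", "honey")

# A quoted pattern is literal regex text: "." means "any character", so this
# looks for the letters "keywords" and never reads the `keywords` vector
grepl(".keywords.", products)
# [1] FALSE FALSE

# With a pattern vector longer than 1, grepl() uses only its first element
# ("corn" here) and emits a warning; "honey" is silently ignored
grepl(keywords, products)
# [1] FALSE  TRUE
```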

I am confident there should be instances where the keywords are contained within these strings. Is it looking for exact matches? I need it to show partial matches.
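Exact matching is not the cause: `grepl()` and `str_detect()` both return TRUE when the pattern occurs anywhere in the string. A quick check with a hypothetical capitalized product name shows that partial matching is already the default, and that letter case is what can still cause misses; `ignore.case = TRUE` in `grepl()` (like `ignore_case = TRUE` in stringr's `regex()`) handles that.

```r
# Hypothetical product name with capital letters
product <- "Honey Bunches of Oats"

# A substring anywhere in the string is enough for a match
grepl("Bunch", product)
# [1] TRUE

# Matching is case-sensitive unless told otherwise
grepl("honey", product)
# [1] FALSE
grepl("honey", product, ignore.case = TRUE)
# [1] TRUE
```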

Answer 1

Score: 0

Try:

library(stringr)

products <- c('apple hello', 'banana', 'peach', 'a')
.keywords. <- c('apple', 'each')

# Collapse the keywords into one regex alternation, "apple|each", and test every product against it
str_detect(products, paste0(.keywords., collapse = '|'))
# [1]  TRUE FALSE  TRUE FALSE
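A base-R sketch extending this answer to the full goal in the question, with made-up data: the same collapsed alternation pattern, matched case-insensitively, stored as a new column, plus a count of matching products. The output above marks "peach" TRUE because "each" occurs inside it; if only whole words should count, wrapping the alternation in `\b` word boundaries (here with `perl = TRUE`) stops that. The column names `has_keyword` and `has_whole_word` are illustrative. Keywords are treated as regular expressions, so a phrase containing characters such as "(", "+" or "." would need escaping, or could instead be matched one keyword at a time with `grepl(..., fixed = TRUE)`.

```r
# Made-up data for illustration
products <- c("Honey Bunches of Oats", "banana", "peach", "apple pie")
keywords <- c("honey", "each", "apple")

# The answer's idea: one alternation pattern built from all keywords
pattern <- paste(keywords, collapse = "|")   # "honey|each|apple"

# Case-insensitive substring match, stored as a new column next to each product
df <- data.frame(product = products)
df$has_keyword <- grepl(pattern, df$product, ignore.case = TRUE)

# Number of products containing at least one keyword
sum(df$has_keyword)
# [1] 3

# Optional: require whole-word matches, so "each" no longer matches "peach"
word_pattern <- paste0("\\b(", pattern, ")\\b")
df$has_whole_word <- grepl(word_pattern, df$product, ignore.case = TRUE, perl = TRUE)
print(df)
```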
