February 7, 2023, 01:39:42


R - Number of occurrences of a string in a column of excel

Question

I'm using the stringr library to count the number of occurrences of an array of strings in a column in Excel.

string.arr = c(
    "I can't handle this.",
    "I shouldn't be this stressed out.",
    ... more possible strings ...
)

Sample data:

1 col_name
2 “I’m never going to succeed.”,“The professor will be disappointed in me.”,“Other students won’t want to work with me.”,“I shouldn't be this stressed out.",“Other people can handle this situation - what's wrong with me?"
3 “Everyone will think I am dumb.”,“People will make jokes about me if I get the wrong answer.”,“I shouldn't be this stressed out.",“Other people can handle this situation - what's wrong with me?"
4 ... more such rows ...

As you can see from the sample data, two kinds of apostrophes are used: ' and ’. However, in R, I'm only able to use ' when creating string.arr. Consequently, the code below does not count the strings that contain ’.

library(stringr)

for (string in string.arr) {
  # print() is needed because a for loop does not auto-print values
  print(sum(str_count(deidentified_data_text_df$col_name, string), na.rm = TRUE))
}

It's not feasible to modify the data. Can I solve this in the code so that both ' and ’ in the data are matched by ' in the code?

I'm open to using any other package in R.

Answer 1

Score: 1


EDIT:

If string.arr is essentially a list of key words (or sentences) that you want to match in larger text, and the problem is that the larger text may contain two kinds of apostrophes, then you might simply replace every apostrophe in string.arr with a regex alternation group:

string.arr <- gsub("’|'", "(’|')", string.arr)

Result:

string.arr
[1] "I can(’|')t handle this."
[2] "They won(’|')t handle this"
[3] "I shouldn(’|')t be this stressed out."
[4] "no apostrophe"
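
The answer stops at the rewritten patterns, so here is a minimal sketch of how they might then be fed back into the counting step from the question. The `col_name` vector is a made-up stand-in for `deidentified_data_text_df$col_name`, and `patterns` and `counts` are illustrative names, not part of the original post.

```r
library(stringr)

# Hypothetical stand-in for deidentified_data_text_df$col_name
col_name <- c(
  "“I can’t handle this.”,“I shouldn't be this stressed out.”",
  "“I can't handle this.”",
  NA
)

# Search phrases typed with straight apostrophes only
string.arr <- c("I can't handle this.", "I shouldn't be this stressed out.")

# Let every apostrophe in a phrase match either variant
patterns <- gsub("’|'", "(’|')", string.arr)

# Total occurrences of each phrase across the whole column
counts <- sapply(
  patterns,
  function(p) sum(str_count(col_name, p), na.rm = TRUE)
)
names(counts) <- string.arr
print(counts)
```

Keeping the original phrases as the names of `counts` makes the output easier to read than the rewritten regex patterns would be.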

Data:

string.arr = c(
  "I can’t handle this.",                          # bent apostrophe
  "They won't handle this",                        # straight apostrophe
  "I shouldn't be this stressed out.",             # straight apostrophe
  "no apostrophe"                                  # no apostrophe
)
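
One caveat with the regex approach: stringr treats patterns as regular expressions by default, so characters such as . and ? inside the phrases act as regex operators rather than literal text. A different technique, not the one from the answer above, is to normalize the apostrophes on an in-memory copy of the column (the spreadsheet itself stays untouched) and match the phrases literally with `fixed()`. This is only a sketch; `col_name`, `col_norm`, and `counts_fixed` are hypothetical names and the data is invented.

```r
library(stringr)

# Hypothetical stand-in for the spreadsheet column
col_name <- c(
  "“Other people can handle this - what’s wrong with me?”",
  "“I shouldn't be this stressed out.”,“I shouldn’t be this stressed out.”",
  NA
)

string.arr <- c("what's wrong with me?", "I shouldn't be this stressed out.")

# Normalize curly apostrophes on an in-memory copy; the source file is untouched
col_norm <- str_replace_all(col_name, "’", "'")

# fixed() matches each phrase literally, so "." and "?" are not regex operators
counts_fixed <- sapply(
  string.arr,
  function(s) sum(str_count(col_norm, fixed(s)), na.rm = TRUE)
)
print(counts_fixed)
```

Whether this counts as "modifying the data" depends on the constraint in the question; only the copy held in the R session changes.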
