R - Number of occurrences of a string in a column of excel
Question
I'm using the stringr library to count how many times each string in an array of strings occurs in a column of an Excel sheet.
string.arr = c(
  "I can't handle this.",
  "I shouldn't be this stressed out.",
  ... more possible strings ...
)
Sample data:
1 col_name
2 “I’m never going to succeed.”,“The professor will be disappointed in me.”,“Other students won’t want to work with me.”,“I shouldn't be this stressed out.",“Other people can handle this situation - what's wrong with me?"
3 “Everyone will think I am dumb.”,“People will make jokes about me if I get the wrong answer.”,“I shouldn't be this stressed out.",“Other people can handle this situation - what's wrong with me?"
4 ... more such rows ...
As you can see from the sample data, two kinds of apostrophes are used: ' and ’. However, in R, I'm only able to use ' when creating string.arr. Consequently, the code (below) does not count the strings that contain ’.
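library(stringr)

for (string in string.arr) {
  # Total matches of this string across the column; print() is needed inside a loop
  print(sum(str_count(deidentified_data_text_df$col_name, string), na.rm = TRUE))
}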
It's not feasible to modify the data. Can I solve this in the code, so that a ' in the code matches both ' and ’ in the data?
I'm open to using any other package in R.
Answer 1
Score: 1
EDIT:
If string.arr contains what is essentially a list of keywords (or sentences) that you want to match in a larger text, and the problem is that the larger text may contain two kinds of apostrophes, then you can simply replace every apostrophe in string.arr with a regex alternation group:
string.arr <- gsub("’|'", "(’|')", string.arr)
Result:
string.arr
[1] "I can(’|')t handle this."
[2] "They won(’|')t handle this"
[3] "I shouldn(’|')t be this stressed out."
[4] "no apostrophe"
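As a quick check of what the alternation group does, the sketch below applies one of the rewritten patterns to three sentences that differ only around the apostrophe. It uses base R's grepl, and the three test sentences are made up for illustration rather than taken from the question's data.

```r
# One pattern from the result above; (’|') accepts either apostrophe.
pattern <- "I shouldn(’|')t be this stressed out."

# Hypothetical inputs: straight apostrophe, curly apostrophe, no contraction.
straight <- "I shouldn't be this stressed out."
curly    <- "I shouldn’t be this stressed out."
neither  <- "I should not be this stressed out."

# TRUE where the pattern is found in the corresponding input.
matches <- grepl(pattern, c(straight, curly, neither))
print(matches)
```

The first two inputs match and the third does not, so a single pattern covers both apostrophe styles without touching the data.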
Data:
string.arr = c(
  "I can’t handle this.",              # curly apostrophe
  "They won't handle this",            # straight apostrophe
  "I shouldn't be this stressed out.", # straight apostrophe
  "no apostrophe"                      # no apostrophe
)
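One caveat: the strings are used as regular expressions (stringr's str_count also treats its pattern as a regex by default), so characters such as . and ? inside them act as metacharacters rather than literals. The sketch below is one possible way to put the pieces together in base R. It escapes the metacharacters first, then substitutes the alternation group for each apostrophe, and finally counts matches per string. The helper names (escape_regex, to_pattern, count_occurrences) and the small sample column are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: backslash-escape regex metacharacters so that
# "." and "?" in a search string are matched literally.
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x)
}

# Escape first, then let every apostrophe match either ’ or '.
to_pattern <- function(x) {
  gsub("’|'", "(’|')", escape_regex(x))
}

# Count all matches of each pattern across a character vector.
count_occurrences <- function(patterns, text) {
  vapply(patterns, function(p) {
    hits <- gregexpr(p, text)  # one vector of match positions per element
    # Positions are -1 when there is no match and NA for NA input.
    sum(vapply(hits, function(h) sum(h > 0, na.rm = TRUE), integer(1)))
  }, integer(1), USE.NAMES = FALSE)
}

# Assumed stand-in for deidentified_data_text_df$col_name.
col <- c(
  "“I’m never going to succeed.”,“I shouldn't be this stressed out.\"",
  "“I shouldn’t be this stressed out.”,“I can’t handle this.”",
  NA
)

string.arr <- c(
  "I can't handle this.",
  "I shouldn't be this stressed out.",
  "what's wrong with me?"
)

counts <- count_occurrences(to_pattern(string.arr), col)
names(counts) <- string.arr
print(counts)
```

Escaping has to happen before the apostrophe substitution, since otherwise the parentheses and | of the alternation group would themselves be escaped.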