Count number of words with 3 or more letters from a string in R


Question


I have a sentence stored as a string, and I need to count the words in it that have three or more letters.

For example:

    sentence <- 'I have a string sentence but i do not know how to get three lettered words from it'
    # Total words = 18
    # 3 or more lettered words = 12

Here is how I do it in Python in a basic way:

    words = sentence.split(' ')  # this creates a list of words
    count = 0
    for each in words:
        if len(each) >= 3:
            count = count + 1
    print(count)

An alternative way in Python (a little crude, but it works):

    print(len(list(filter(lambda word: len(word) >= 3, words))))

I tried doing the same thing in R:

    words <- strsplit(sentence, split = ' ')
    count <- 0
    for (word in words) {
      l <- nchar(word)
      if (l >= 3) {
        count <- count + 1
      }
    }
    print(count)

This results in an error for me:

    # Error in if (l >= 3) { : the condition has length > 1
    # ERROR!
    # Execution halted

When I looked this error up on the web, I read that it occurs when a vector is passed to the if condition. But as far as I can tell, I passed it a simple numeric variable, so I do not understand what is causing the error.

Can someone please explain and help me out?

P.S.: I do not want to use any external package for this. I am learning R, so I want to do it with the basics.

Answer 1

Score: 4


You can use lengths to count all the words:

    sentence <- 'I have a string sentence but i do not know how to get three lettered words from it'
    lengths(strsplit(sentence, '\\s+'))
    # [1] 18

To count the words with at least three characters, take the first element of the resulting list, test whether nchar is >= 3, and sum the results.

    sum(nchar(el(strsplit(sentence, "\\s+"))) >= 3)
    # [1] 12

or using pipes:

    strsplit(sentence, '\\s+') |> el() |> nchar() |> base::`>=`(3) |> sum()
    # [1] 12
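
As for the error in the question, it appears to come from the same list structure: strsplit returns a list with one element per input string, so `for (word in words)` runs a single time with the whole character vector, and `nchar(word)` then has length 18 rather than 1. Below is a minimal sketch of the original loop with the vector extracted first via `[[1]]`, using base R only. The variable names follow the question, and `n_filtered` is a name introduced here for illustration.

```r
sentence <- 'I have a string sentence but i do not know how to get three lettered words from it'

# strsplit() returns a list with one element per input string,
# so take the first element to get the character vector of words
words <- strsplit(sentence, split = ' ')[[1]]

count <- 0
for (word in words) {
  # word is now a single string, so nchar(word) has length 1
  if (nchar(word) >= 3) {
    count <- count + 1
  }
}
print(count)
# [1] 12

# Rough counterpart of the Python filter() one-liner, using base R's Filter()
n_filtered <- length(Filter(function(w) nchar(w) >= 3, words))
print(n_filtered)
# [1] 12
```

The vectorised `sum(nchar(...) >= 3)` form above is usually preferred in R, but the loop may be easier to follow when coming from Python.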

The regex '\\s+' (one or more whitespace characters), used instead of ' ', takes care of (accidental) runs of multiple whitespace characters.
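
To illustrate the difference, here is a small sketch on a made-up string with accidental double and triple spaces. The string `messy` and the names `split_space` and `split_regex` are assumptions for this example, not part of the question.

```r
# Hypothetical input with accidental runs of spaces
messy <- 'I have  a string   sentence'

# Splitting on a single space leaves empty strings between consecutive spaces
split_space <- strsplit(messy, ' ')[[1]]
length(split_space)
# [1] 8

# Splitting on one or more whitespace characters yields only the words
split_regex <- strsplit(messy, '\\s+')[[1]]
length(split_regex)
# [1] 5
```

Note that leading whitespace would still produce an empty first element with either pattern; calling trimws() on the input beforehand is one way to avoid that.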

Note:

To clarify lengths vs. length:

    length(list(1:3))
    # [1] 1
    lengths(list(1:3))
    # [1] 3
    sapply(list(1:3), length)  ## equiv.
    # [1] 3
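
The list return value becomes useful once there is more than one string. As a rough sketch, assuming a hypothetical vector `sentences` of two sentences, the same idea gives one count per sentence:

```r
# Hypothetical vector of two sentences
sentences <- c('I have a string sentence', 'but i do not know how')

# One list element (a character vector of words) per sentence
pieces <- strsplit(sentences, '\\s+')

# Total words per sentence
lengths(pieces)
# [1] 5 6

# Words with three or more characters per sentence
counts <- vapply(pieces, function(w) sum(nchar(w) >= 3), integer(1))
counts
# [1] 3 4
```

vapply is used here instead of sapply only because it checks that each result is a single integer; sapply would return the same values in this case.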

huangapple
  • Posted on June 29, 2023 at 12:31:19
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/76578071.html