Count number of words with 3 or more letters from a string in R

Question

I have a sentence stored as a string, and I need to count the words in it that have 3 or more letters.

For example:

sentence <- 'I have a string sentence but i do not know how to get three lettered words from it'
# Total words = 18
# 3 or more lettered words = 12

How I do it in Python in a basic way:

words = sentence.split(' ')  # this creates a list of words
count = 0
for each in words:
    if len(each) >= 3:
        count = count + 1

print(count)

An alternative way in Python (a little crude, but it works):

print(len(list(filter(lambda word: len(word) >= 3, words))))

I tried doing the same thing in R:

words <- strsplit(sentence, split = ' ')
count <- 0
for (word in words) {
    l <- nchar(word)
    if (l >= 3) {
        count <- count + 1
    }
}

print(count)

This results in an error for me:

# Error in if (l >= 3) { : the condition has length > 1
# ERROR!
# Execution halted

When I looked this error up on the web, I found that it occurs when a vector is passed to the if condition. But as far as I can tell I am passing a single numeric value, so I do not understand what is causing it.

Can someone please explain and help me out?

P.S.: I do not want to use any external packages for this. I am learning R, so I want to do it with base R only.

Answer 1

Score: 4

You can use lengths to get the total word count:

sentence <- 'I have a string sentence but i do not know how to get three lettered words from it'

lengths(strsplit(sentence, '\\s+'))
# [1] 18

To count the words with at least three characters, we take the first element of the resulting list, test whether nchar is >= 3, and sum the result.

sum(nchar(el(strsplit(sentence, "\\s+"))) >= 3)
# [1] 12
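
The same "first element" step also accounts for the error in the question. strsplit returns a list with one element per input string, so for (word in words) runs once with word bound to the whole character vector of 18 words, and if then receives a length-18 logical vector. A minimal sketch of the question's loop, changed to iterate over words[[1]] instead (variable names follow the question):

```r
sentence <- 'I have a string sentence but i do not know how to get three lettered words from it'

# strsplit() returns a list: one character vector per input string
words <- strsplit(sentence, split = ' ')
class(words)          # "list"
length(words)         # 1
length(words[[1]])    # 18

# Loop over the character vector inside the list, not over the list itself
count <- 0
for (word in words[[1]]) {
    if (nchar(word) >= 3) {
        count <- count + 1
    }
}

print(count)
# [1] 12
```

Here words[[1]] plays the same role as el(words) in the one-liner above.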

or, using the native pipe (R >= 4.1):

strsplit(sentence, '\\s+') |> el() |> nchar() |> base::`>=`(3) |> sum()
# [1] 12

The regex '\\s+' (one or more whitespace characters), used instead of ' ', takes care of (accidental) runs of multiple spaces.
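
A small illustration of that difference, using a made-up string with a doubled space (the string x is an assumption for demonstration and does not come from the question):

```r
x <- 'one  two three'   # note the two spaces after "one"

# Splitting on a single space leaves an empty string between the two spaces
strsplit(x, ' ')[[1]]
# [1] "one"   ""      "two"   "three"

# Splitting on one or more whitespace characters does not
strsplit(x, '\\s+')[[1]]
# [1] "one"   "two"   "three"
```

The empty string would not change the count of words with three or more characters, but it would inflate the total word count. Leading whitespace still produces an empty first element even with '\\s+', so calling trimws() on the input first may be worth considering.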

Note:

To clarify lengths vs. length:

length(list(1:3))
# [1] 1

lengths(list(1:3))
# [1] 3

sapply(list(1:3), length)  ## equiv.
# [1] 3
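
Because strsplit is vectorised over its input, the list it returns becomes more useful when there is more than one sentence. A sketch, assuming a hypothetical vector sentences (not from the question), that counts the words with three or more characters per sentence using vapply:

```r
sentences <- c('I have a string sentence', 'but i do not know how')

# One character vector of words per sentence
parts <- strsplit(sentences, '\\s+')

# Total words per sentence
lengths(parts)
# [1] 5 6

# For each sentence, count the words with at least three characters
long_counts <- vapply(parts, function(w) sum(nchar(w) >= 3), integer(1))
long_counts
# [1] 3 4
```

vapply is used here rather than sapply only because it checks that every result is a single integer; either would work for this input.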

huangapple
  • Published on June 29, 2023, 12:31:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76578071.html