Check if a string contains an English word in NodeJS

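Question

A naive way to write a NodeJS function that returns true when a string contains an English word longer than 3 letters is:

function containsLongEnglishWord(inputString) {
  const words = inputString.split(/\W+/); // Split on runs of non-word characters
  for (const word of words) {
    // Accept any token longer than 3 characters that contains an ASCII letter
    if (word.length > 3 && /[a-zA-Z]/.test(word)) {
      return true;
    }
  }
  return false;
}

This function splits the input into tokens and returns true if any token is longer than 3 characters and contains an ASCII letter; otherwise it returns false. It is fast and needs no dictionary file, but it never checks that a token is an actual English word, so it also returns true for strings such as y89nsdadhasa98qwoi, which should return false.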

What is the best way in NodeJS to create a function that returns true if and only if a string contains an English word longer than 3 letters?

The code will be placed in a lambda, so I'm looking for the most efficient solution. The best solution I've got so far is to use dictionary-en and iterate over every word, calling .includes(word) on the source string. Can you think of a better approach to implement this?

Some examples of strings which should return true:

  • y89nsdadhomea98qwoi
  • :_5678aSD.boTTleads.
  • yfugdnuagybdasglassesmidwqihhniwqnhi

Some examples of strings which should return false:

  • y89nsdadhasa98qwoi
  • :_5678aSD.b0TTle4ds.
  • yfugdnuagybdasmidwqihhniwqnhi
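
For reference, the baseline described in the question could be sketched roughly as follows. The word list is a tiny hypothetical stand-in for the real dictionary (how dictionary-en is loaded is not shown in the question), and the function name is made up for illustration. The input is lowercased because the expected results treat boTTle as a word.

```javascript
// Hypothetical stand-in for a real English word list.
const WORDS = ['home', 'bottle', 'glasses', 'glass'];

// Keep only words longer than 3 letters, lowercased once up front.
const LONG_WORDS = WORDS.map((w) => w.toLowerCase()).filter((w) => w.length > 3);

// Baseline: test every dictionary word against the source string.
// The cost is one substring search per dictionary word.
function containsEnglishWordNaive(source) {
  const haystack = source.toLowerCase();
  return LONG_WORDS.some((word) => haystack.includes(word));
}

console.log(containsEnglishWordNaive('y89nsdadhomea98qwoi'));
console.log(containsEnglishWordNaive(':_5678aSD.b0TTle4ds.'));
```

With a real dictionary the work per call grows with the number of words, which is the cost the question wants to avoid.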

Answer 1

Score: 1


Looping over a dictionary (which may contain hundreds of thousands of words) is not a good idea. As your string is only 10-200 characters long, iterating over every character in the source string gives better time complexity. And if you don't care about space complexity, there's a better approach:

  1. Build a special dictionary hashmap ahead of time. This costs O(m), where m is the number of words in your dictionary. Each key is the first three letters of a word, and each value holds the rest of the word or words that start with those letters. The hashmap will look something like:
// Objects are hashmap-like in JavaScript
let dictionaryMap = {
   'hom': 'e',
   'cat': '',
   'bot': 'tle',
   'gla': ['ss', 'cier'], // contains 'glass' and 'glacier'
};
  2. Iterate over every character in the source string and look up the three-character window that starts there. That way you have O(n) time for the iteration and O(1) for each lookup:
// n is the source string
function scan(n) {
   for (let i = 0; i + 3 <= n.length; i++) {
      const lookupStr = n.slice(i, i + 3); // <-- I know it's dumb, just a sample :)))
      if (dictionaryMap.hasOwnProperty(lookupStr)) {
         console.log(lookupStr + dictionaryMap[lookupStr]);
         return 'hell yeah';
      }
   }
}
  3. At this point you know the source string has a good chance of containing an English word longer than 3 letters. To look for an exact word, you can apply dynamic programming, build a tree, or change dictionaryMap and repeat step 2 recursively:
dictionaryMap = {
   'gla': 'ss|cier'
};
// Apply dynamic programming or memoization to find the longest common substring...

OR

// Change the map to be a tree-like structure
dictionaryMap = {
   'gla': {
      's': {'s': ''},
      'c': {'i': {'e': {'r': ''}}}
   }
};
// Continue doing step 2...
// Or build a tree yourself and search for the exact word
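
The three steps above can be put together in a short runnable sketch. This is only one possible reading of the answer, not its definitive implementation: the word list is a hypothetical stand-in, the function names are invented, and suffixes are always stored in arrays (a simplification of the mixed string/array values shown earlier). Words of 3 letters or fewer are skipped because the question only counts longer words.

```javascript
// Hypothetical stand-in for a real English word list.
const WORDS = ['home', 'bottle', 'glass', 'glacier', 'cat'];
const PREFIX_LEN = 3;

// Step 1: map each 3-letter prefix to the suffixes that complete a word.
// Words of 3 letters or fewer are skipped, since only longer words count.
function buildPrefixMap(words) {
  const map = new Map();
  for (const raw of words) {
    const word = raw.toLowerCase();
    if (word.length <= PREFIX_LEN) continue;
    const prefix = word.slice(0, PREFIX_LEN);
    if (!map.has(prefix)) map.set(prefix, []);
    map.get(prefix).push(word.slice(PREFIX_LEN));
  }
  return map;
}

// Steps 2 and 3: slide a 3-character window over the string, and for each
// window that is a known prefix, check whether any stored suffix follows it.
function containsEnglishWord(source, prefixMap) {
  const s = source.toLowerCase();
  for (let i = 0; i + PREFIX_LEN < s.length; i++) {
    const suffixes = prefixMap.get(s.slice(i, i + PREFIX_LEN));
    if (suffixes && suffixes.some((suf) => s.startsWith(suf, i + PREFIX_LEN))) {
      return true;
    }
  }
  return false;
}

const prefixMap = buildPrefixMap(WORDS);
console.log(containsEnglishWord('y89nsdadhomea98qwoi', prefixMap));
console.log(containsEnglishWord('y89nsdadhasa98qwoi', prefixMap));
```

In a lambda, the map would presumably be built once outside the handler so that each invocation only pays for the scan.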

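=> Total: O(m) to build the map once, plus O(n) per scanned string with O(1) per lookup. Since m is much larger than n, that is O(m) time overall. Space complexity is O(m) for the prefix map, or O(m*longestWordCharacters) for the tree.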
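
The tree variant, whose space cost is the O(m*longestWordCharacters) figure above, might look like the following sketch. It uses nested Map objects rather than the plain-object literal from the answer, and the word list, the END marker and the function names are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for a real English word list.
const WORDS = ['home', 'bottle', 'glass', 'glacier'];
const END = Symbol('end'); // marks a node where a complete word ends

// Build a trie of nested Maps; only words longer than 3 letters are stored.
function buildTrie(words) {
  const root = new Map();
  for (const raw of words) {
    const word = raw.toLowerCase();
    if (word.length <= 3) continue;
    let node = root;
    for (const ch of word) {
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set(END, true);
  }
  return root;
}

// From every start position, walk the trie until a word ends or the path dies.
function containsWordInTrie(source, root) {
  const s = source.toLowerCase();
  for (let start = 0; start < s.length; start++) {
    let node = root;
    for (let i = start; i < s.length; i++) {
      node = node.get(s[i]);
      if (!node) break;
      if (node.has(END)) return true;
    }
  }
  return false;
}

const trie = buildTrie(WORDS);
console.log(containsWordInTrie('yfugdnuagybdasglassesmidwqihhniwqnhi', trie));
console.log(containsWordInTrie('yfugdnuagybdasmidwqihhniwqnhi', trie));
```

Each walk from a start position is bounded by the longest word in the dictionary, so the scan costs at most n times that length regardless of how many words are stored.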

Answer 2

Score: -3


Huh? This is not a job for node.js but JavaScript!

What is node.js...

https://en.wikipedia.org/wiki/Node.js

Onto the problem at hand...

An English word greater than 3 letters... but there are tens of THOUSANDS of them!

Do you have a text file with all these words included so that we can load them into an array to operate?

No? OK, in the meantime here's the best we've got...

const Valid = ['home', 'boTTle', 'glasses', 'GodKnows'];

var S = 'y89nsdadhomea98qwoi'.toLowerCase(), Ok = false;

for (var i = 0; i < Valid.length; i++) {
  if (S.includes(Valid[i].toLowerCase())) { Ok = true; break; }
}

if (Ok) {
  // yeah the string is ok, what now?
}
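
If a text file with the words were available, loading it into an array could look roughly like this. The one-word-per-line format, the file path passed in, and both function names are assumptions; nothing in the thread specifies them.

```javascript
const fs = require('node:fs');

// Read a newline-separated word list and keep words longer than 3 letters.
// The one-word-per-line format is an assumption about the file.
function loadWords(path) {
  return fs
    .readFileSync(path, 'utf8')
    .split(/\r?\n/)
    .map((line) => line.trim().toLowerCase())
    .filter((word) => word.length > 3);
}

// Same check as the loop above, wrapped in a reusable function.
function containsAnyWord(source, words) {
  const s = source.toLowerCase();
  return words.some((word) => s.includes(word));
}
```

A call would then be something like containsAnyWord('y89nsdadhomea98qwoi', loadWords('./words.txt')), where words.txt is a hypothetical file name.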

Sorry, I've had a few scotches and LMAO after reading your post.

huangapple
  • Published on June 16, 2023, 05:22:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76485605.html