Check if a string contains an English word in NodeJS

Question

What is the best way in NodeJS to create a function that returns true if and only if a string contains an English word longer than 3 letters?

The code will run in a Lambda, so I'm looking for the most efficient solution. The best solution I have so far is to use dictionary-en and iterate over every word, calling .includes(word) on the source string. Can you think of a better approach to implement this?

Some examples of strings which should return true:

  • y89nsdadhomea98qwoi
  • :_5678aSD.boTTleads.
  • yfugdnuagybdasglassesmidwqihhniwqnhi

Some examples of strings which should return false:

  • y89nsdadhasa98qwoi
  • :_5678aSD.b0TTle4ds.
  • yfugdnuagybdasmidwqihhniwqnhi
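
For reference, the baseline described above (try every dictionary word against the source string) can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: `sampleWords` is a hypothetical stand-in for the real word list (loading dictionary-en is not shown), and `containsWordNaive` is a name made up for illustration.

```javascript
// Hypothetical stand-in for the real dictionary (assumption, not from the question).
const sampleWords = ['home', 'bottle', 'glass', 'glasses', 'glacier', 'cat'];

// Baseline: test every dictionary word longer than 3 letters against the input.
// With m words and an input of length n this costs roughly O(m * n) per call.
function containsWordNaive(input, words) {
  const haystack = input.toLowerCase();
  return words.some(
    (word) => word.length > 3 && haystack.includes(word.toLowerCase())
  );
}

console.log(containsWordNaive('y89nsdadhomea98qwoi', sampleWords)); // true
console.log(containsWordNaive('y89nsdadhasa98qwoi', sampleWords)); // false
```

With a real dictionary the per-call cost grows with the number of words, which is the part the question wants to avoid.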

Answer 1

Score: 1

Looping over a dictionary, which may contain hundreds of thousands of words, is not a good idea. Since your string is 10 to 200 characters long, iterating over the characters of the source string gives better time complexity. And if you don't care about space complexity, there's a better approach:

  1. Build a special dictionary hashmap ahead of time. This costs O(m), where m is the number of words in your dictionary. The hashmap will look something like this:

      // An object is hashmap-like in JavaScript
      const dictionaryMap = {
        'hom': 'e',
        'cat': '',
        'bot': 'tle',
        'gla': ['ss', 'cier'], // contains 'glass' and 'glacier'
      };

  2. Iterate over the characters of the source string and look up each 3-character window. The iteration takes O(n) time and each lookup takes O(1):

      // n is the source string; the function wrapper is only here so that `return` is valid
      function mightContainWord(n) {
        for (let i = 0; i + 3 <= n.length; i++) {
          const lookupStr = n.slice(i, i + 3); // <-- I know it's dumb, just a sample :)))
          if (Object.hasOwn(dictionaryMap, lookupStr)) {
            console.log(lookupStr + dictionaryMap[lookupStr]);
            return true;
          }
        }
        return false;
      }

  3. Now you know that the source string has a good chance of containing an English word longer than 3 letters. If you want to find the exact word, you can apply dynamic programming, build a tree, or change the dictionaryMap and repeat step 2 recursively:

      const dictionaryMap = {
        'gla': 'ss|cier'
      };
      // Apply dynamic programming or memoization to find the longest common substring...

OR

      // Change the map to be a tree-like structure
      const dictionaryMap = {
        'gla': {
          's': {'s': ''},
          'c': {'i': {'e': {'r': ''}}}
        }
      };
      // Continue with step 2...
      // Or build a tree yourself and search for the exact word
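
The tree-like structure above is essentially a trie. A minimal sketch of building one and scanning the source string with it might look like this. Everything named here is an assumption for illustration: `trieWords` is a tiny hypothetical word list, the helpers `makeNode`, `buildTrie`, and `findWordInText` do not come from the answer, and the nodes use a `Map` plus an `isWord` flag rather than the nested object literals shown above.

```javascript
// Hypothetical word list (assumption); a real one would come from a dictionary file.
const trieWords = ['home', 'bottle', 'glass', 'glasses', 'glacier'];

function makeNode() {
  return { children: new Map(), isWord: false };
}

// Build the trie once, ahead of time: one node per character of every word.
function buildTrie(wordList) {
  const root = makeNode();
  for (const raw of wordList) {
    let node = root;
    for (const ch of raw.toLowerCase()) {
      if (!node.children.has(ch)) node.children.set(ch, makeNode());
      node = node.children.get(ch);
    }
    node.isWord = true; // marks the end of a complete word
  }
  return root;
}

// From every start index, walk the trie for as long as the text keeps matching.
// Returns the first word longer than minLength letters, or null if none is found.
function findWordInText(input, root, minLength = 3) {
  const text = input.toLowerCase();
  for (let start = 0; start < text.length; start++) {
    let node = root;
    for (let end = start; end < text.length; end++) {
      node = node.children.get(text[end]);
      if (!node) break; // no dictionary word continues with this character
      if (node.isWord && end - start + 1 > minLength) {
        return text.slice(start, end + 1);
      }
    }
  }
  return null;
}

const trie = buildTrie(trieWords);
console.log(findWordInText('yfugdnuagybdasglassesmidwqihhniwqnhi', trie)); // glass
```

Each start index walks at most as deep as the longest dictionary word, so a scan costs about O(n * longestWordCharacters) regardless of how many words the trie holds.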

=> Total: O(m) + O(n) + O(1) = O(m + n) time complexity, which is effectively O(m) because the dictionary is much larger than the string, and O(m) or O(m * longestWordCharacters) space complexity.
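
Steps 1 and 2 above, plus a simple verification pass, can be sketched end to end as follows. This is a sketch under stated assumptions rather than the answer's exact method: `dictionaryWords` is a hypothetical word list, `buildPrefixMap` and `containsDictionaryWord` are illustrative names, every prefix maps to an array of suffixes, and step 3 is replaced by a direct suffix comparison instead of dynamic programming.

```javascript
// Hypothetical word list (assumption); a real one would be loaded once at cold start.
const dictionaryWords = ['home', 'bottle', 'glass', 'glasses', 'glacier', 'cat'];
const PREFIX_LENGTH = 3;

// Step 1: build the map once. Key = first 3 letters, value = remaining suffixes.
function buildPrefixMap(wordList) {
  const map = new Map();
  for (const raw of wordList) {
    const word = raw.toLowerCase();
    if (word.length <= PREFIX_LENGTH) continue; // only words longer than 3 letters
    const prefix = word.slice(0, PREFIX_LENGTH);
    if (!map.has(prefix)) map.set(prefix, []);
    map.get(prefix).push(word.slice(PREFIX_LENGTH));
  }
  return map;
}

// Step 2: slide a 3-character window over the input and look it up.
// On a prefix hit, confirm that one of the stored suffixes follows immediately.
function containsDictionaryWord(input, prefixMap) {
  const text = input.toLowerCase();
  for (let i = 0; i + PREFIX_LENGTH <= text.length; i++) {
    const suffixes = prefixMap.get(text.slice(i, i + PREFIX_LENGTH));
    if (!suffixes) continue;
    const suffixStart = i + PREFIX_LENGTH;
    if (suffixes.some((suffix) => text.startsWith(suffix, suffixStart))) {
      return true;
    }
  }
  return false;
}

const prefixMap = buildPrefixMap(dictionaryWords);
console.log(containsDictionaryWord(':_5678aSD.boTTleads.', prefixMap)); // true
console.log(containsDictionaryWord(':_5678aSD.b0TTle4ds.', prefixMap)); // false
```

In a Lambda, building the map outside the handler would let warm invocations reuse it, so only the scan is paid per request.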

Answer 2

Score: -3

Huh? This is not a job for node.js but JavaScript!

What is node.js...

https://en.wikipedia.org/wiki/Node.js

Onto the problem at hand...

An English word longer than 3 letters... but there are tens of THOUSANDS of them!

Do you have a text file with all these words included so that we can load them into an array to operate?

No? OK, in the meantime here's the best we've got...

      const Valid = ['home', 'boTTle', 'glasses', 'GodKnows'];
      var S = 'y89nsdadhomea98qwoi'.toLowerCase(), Ok = false;
      for (var i = 0; i < Valid.length; i++) {
        if (S.includes(Valid[i].toLowerCase())) { Ok = true; break; }
      }
      if (Ok) {
        // yeah the string is ok, what now?
      }

Sorry, I've had a few scotches and LMAO after reading your post.

huangapple
  • Posted on June 16, 2023, 05:22:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76485605.html