Compare two sentences and check if they have a similar word

Question

I'm trying to take two sentences and see if they have words in common. Example:
A- "Hello world this is a test"
B- "Test to create things"

The common word here is "test" (ignoring case).

I tried using .contains(), but it doesn't work because I can only search for one word:

text1.toLowerCase().contains(sentence1.toLowerCase())
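The following small demo shows why the attempt above fails. The class and method names are illustrative only. String.contains treats its whole argument as one contiguous substring, so passing an entire sentence succeeds only if that exact sentence appears inside the other text.

```java
public class ContainsDemo {
    // String.contains checks whether the WHOLE argument occurs as one contiguous substring
    static boolean containsWholeSentence(String text1, String sentence1) {
        return text1.toLowerCase().contains(sentence1.toLowerCase());
    }

    public static void main(String[] args) {
        String text1 = "Hello world this is a test";
        String sentence1 = "Test to create things";
        // false: "test to create things" does not appear inside text1
        System.out.println(containsWholeSentence(text1, sentence1));
        // true: a single word is found, but only one word can be checked per call
        System.out.println(containsWholeSentence(text1, "Test"));
    }
}
```

This is why the answers below split each sentence into words first and then compare the words individually.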

Answer 1

Score: 2

You can create a HashSet from the words of each sentence after splitting on whitespace. You can then use Set#retainAll to find the intersection (the common words).

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final String a = "Hello world this is a test", b = "Test to create things";
final Set<String> words = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
final Set<String> words2 = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
words.retainAll(words2); // keeps only the words that also occur in words2
System.out.println(words); //[test]
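The following sketch wraps the same retainAll idea in a reusable method that leaves the caller's strings untouched. The class and method names are hypothetical. The added trim() call is an assumption meant to avoid the empty "" token that split("\\s+") produces when a sentence starts with whitespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CommonWords {
    // Returns the lower-cased words that appear in both sentences.
    static Set<String> commonWords(String a, String b) {
        // trim() first so leading whitespace does not produce an empty "" token
        Set<String> result = new HashSet<>(Arrays.asList(a.trim().toLowerCase().split("\\s+")));
        Set<String> other = new HashSet<>(Arrays.asList(b.trim().toLowerCase().split("\\s+")));
        // retainAll removes (in place) every element of result that is not in other
        result.retainAll(other);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(commonWords("Hello world this is a test", "Test to create things")); // [test]
        System.out.println(commonWords("Hello world", "Test to create things").isEmpty()); // true
    }
}
```

Because retainAll mutates the set it is called on, the method builds a fresh set for each call rather than modifying a set the caller might reuse.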

Answer 2

Score: 0

You can split each sentence on spaces and collect its words, then look up each word of one sentence among the words of the other and collect the common ones.

Here is an example using the Java Stream API. The words of the first sentence are collected into a Set first, so that looking up each word of the second sentence is fast (O(1) on average).

import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

String a = "Hello world this is a test";
String b = "Test to create things";
// words of the first sentence, in a Set for fast lookups
Set<String> aWords = Arrays.stream(a.toLowerCase().split(" "))
                            .collect(Collectors.toSet());
// keep the words of the second sentence that also occur in the first
List<String> commonWords = Arrays.stream(b.toLowerCase().split(" "))
                                 .filter(bw -> aWords.contains(bw))
                                 .collect(Collectors.toList());
System.out.println(commonWords);

Output: [test]
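The stream above collects into a List, so a word that appears twice in the second sentence is reported twice. The following is a minimal variant that adds distinct() to report each common word once, in the order it appears in the second sentence. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StreamCommonWords {
    static List<String> commonWords(String a, String b) {
        // words of the first sentence, kept in a Set for O(1) average lookups
        Set<String> aWords = Arrays.stream(a.toLowerCase().split(" "))
                .collect(Collectors.toSet());
        // keep each word of b that is also in a; distinct() drops repeats within b
        return Arrays.stream(b.toLowerCase().split(" "))
                .filter(aWords::contains)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(commonWords("Hello world this is a test", "Test to create things")); // [test]
        System.out.println(commonWords("this is a test", "test the test")); // [test]
    }
}
```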

Answer 3

Score: 0

Split the two sentences on spaces and add each word of the first string to a Set. Then, in a loop, try adding each word of the second string to the same set. If add() returns false, the word was already present, so it is a common word.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Sample {

	public static void main(String[] args) {
		String str1 = "Hello world this is a test";
		String str2 = "Test to create things";
		// lower-case both sentences so the comparison ignores case
		str1 = str1.toLowerCase();
		str2 = str2.toLowerCase();
		String[] str1words = str1.split(" ");
		String[] str2words = str2.split(" ");
		// seed the set with the words of the first sentence
		Set<String> set = new HashSet<String>(Arrays.asList(str1words));
		for (int i = 0; i < str2words.length; i++) {
			// add() returns false when the word is already in the set
			boolean flag = set.add(str2words[i]);
			if (!flag)
				System.out.println(str2words[i] + " is a common word");
		}
	}

}
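One caveat with the add()-based loop is that the words of the second string go into the same set. A word repeated within the second string therefore also makes add() return false. For example, "test test" checked against "Hello world" reports "test" even though the first sentence does not contain it. The following minimal sketch checks membership against the first sentence's words only. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class SampleFixed {
    // Collects words of str2 that also occur in str1, each reported once.
    static Set<String> commonWords(String str1, String str2) {
        Set<String> str1Words = new HashSet<>(Arrays.asList(str1.toLowerCase().split(" ")));
        // LinkedHashSet keeps the order in which common words are found in str2
        Set<String> common = new LinkedHashSet<>();
        for (String word : str2.toLowerCase().split(" ")) {
            // contains() only consults str1's words, so repeats inside str2 are not mistaken for matches
            if (str1Words.contains(word)) {
                common.add(word);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        for (String word : commonWords("Hello world this is a test", "Test to create things")) {
            System.out.println(word + " is a common word");
        }
        // empty: "test" is repeated in the second sentence but absent from the first
        System.out.println(commonWords("Hello world", "test test"));
    }
}
```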

Answer 4

Score: 0

Here's one approach:

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // extract the words from the sentences by splitting on white space
    String[] sentence1Words = sentence1.toLowerCase().split("\\s+");
    String[] sentence2Words = sentence2.toLowerCase().split("\\s+");

    // make sets from the two word arrays
    Set<String> sentence1WordSet = new HashSet<String>(Arrays.asList(sentence1Words));
    Set<String> sentence2WordSet = new HashSet<String>(Arrays.asList(sentence2Words));

    // get the intersection of the two word sets
    Set<String> commonWords = new HashSet<String>(sentence1WordSet);
    commonWords.retainAll(sentence2WordSet);

This will yield a Set containing lower-case versions of the words the two sentences have in common. If it is empty, the sentences share no words. If you don't care about some words, such as prepositions, you can filter them out of the final similarity set or, better yet, preprocess your sentences to remove those words first.
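The preprocessing suggestion can be sketched as follows. Stop words are removed from each sentence's word set before the intersection is taken. The stop-word list below is a hypothetical, deliberately tiny example, and Set.of requires Java 9 or later. The class and method names are also illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FilteredCommonWords {
    // Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "to", "of", "in", "on", "is", "this");

    // Splits, lower-cases and drops stop words before any comparison is made.
    static Set<String> contentWords(String sentence) {
        Set<String> words = new HashSet<>(Arrays.asList(sentence.toLowerCase().split("\\s+")));
        words.removeAll(STOP_WORDS);
        return words;
    }

    static Set<String> commonWords(String sentence1, String sentence2) {
        Set<String> common = contentWords(sentence1);
        common.retainAll(contentWords(sentence2));
        return common;
    }

    public static void main(String[] args) {
        // "to" and "a" are shared but filtered out; only "test" remains
        System.out.println(commonWords("Go to a test", "Test to create a thing"));
    }
}
```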

Note that a real-world (i.e. useful) implementation of similarity checking is usually far more complex, because you usually want to match words that are similar but differ slightly. Some useful starting points for this type of string similarity checking are Levenshtein distance and metaphones.
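Levenshtein distance can be computed with the standard dynamic-programming recurrence. The following is a minimal sketch using two rolling rows. Treating two words as "similar" when they are at most a given number of edits apart is an assumption made here for illustration, and the similar helper and its threshold parameter are hypothetical.

```java
public class Levenshtein {
    // Classic dynamic-programming edit distance: minimum insertions, deletions
    // and substitutions needed to turn s into t. Uses two rolling rows.
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of t
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // deleting the first i characters of s
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Hypothetical helper: words are "similar" if at most maxEdits apart (case-insensitive).
    static boolean similar(String a, String b, int maxEdits) {
        return distance(a.toLowerCase(), b.toLowerCase()) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(similar("test", "tests", 1));   // true
    }
}
```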

Note that the code above makes a redundant copy of the Set when it creates the commonWords set. Because the intersection is performed in place, you could improve performance by performing it directly on sentence1WordSet, but I have favoured code clarity over performance.

Answer 5

Score: 0

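Try this. The words of text1 are put into a Set so that only whole words are compared. Calling String.contains on the whole of text1 would also match substrings, such as "is" inside "this".

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

static boolean contains(String text1, String text2) {
    // whole words of text1, lower-cased, in a Set for fast lookups
    Set<String> text1Words = new HashSet<>(Arrays.asList(text1.toLowerCase().split("\\s+")));
    return Arrays.stream(text2.toLowerCase().split("\\s+"))
        .anyMatch(text1Words::contains);
}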

and

String text1 = "Hello world this is a test";
String text2 = "Test to create things";
System.out.println(contains(text1, text2));

Output:

true
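If you only need a yes/no answer, java.util.Collections.disjoint offers another way to write a whole-word check. It returns true when two collections share no element, so its negation means "at least one common word". The following minimal sketch uses hypothetical class and method names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HasCommonWord {
    // lower-cased whole words of a sentence
    static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // Collections.disjoint is true when the sets share nothing, so negate it
    static boolean hasCommonWord(String text1, String text2) {
        return !Collections.disjoint(words(text1), words(text2));
    }

    public static void main(String[] args) {
        System.out.println(hasCommonWord("Hello world this is a test", "Test to create things")); // true
        System.out.println(hasCommonWord("Hello world", "Test to create things"));               // false
    }
}
```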

huangapple
  • Published on 7 August 2020 at 03:45:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63290733.html