How to check whether any word from a list is present in a string?


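Question

I have a list of words, and I need to check whether any word in the list is present in a string. The word can appear in the string in several formats. For example, the list is {:carloan:, :creditcard:}, but in the string the word can appear as car-loan, carloan, or :carloan.

I am using a lambda function in Java to find a near match, but it does not work as intended: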

List<String> list = new ArrayList<>();

list.add(":carloan:");
list.add(":creditcard:");
String inputString = "i want carloan";
// false here: inputString does not contain the literal ":carloan:" (colons included)
boolean match = list.stream().anyMatch(s -> inputString.contains(s));

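However, the method above returns true only when a substring of the input matches a word in the list exactly.

Is there a way to get true even for a partial match? For example, the user enters car-loan, but the list contains :carloan:. I don't want to write an explicit loop over the list to do the matching. Please suggest a way to do this with a lambda function in Java.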


Answer 1

Score: 2

You could use a regex approach here:

import java.util.ArrayList;
import java.util.List;

List<String> list = new ArrayList<>();
list.add("carloan");
list.add("creditcard");

// Join the terms into a single alternation: .*(?:carloan|creditcard).*
String regex = ".*(?:" + String.join("|", list) + ").*";
String input = "I am looking for a carloan or creditcard";
if (input.matches(regex)) {
    System.out.println("MATCH");
}
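As a variation on the snippet above, the alternation can be compiled once and searched with `find()`. This is a minimal sketch rather than part of the original answer: the class name `AlternationMatcher` and its methods are hypothetical. It also quotes each term with `Pattern.quote`, on the assumption that terms may contain regex metacharacters and should be matched literally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class AlternationMatcher {

    // Builds one pattern of the form (?:term1|term2|...).
    // Pattern.quote makes characters such as '.' or '|' inside a term literal.
    static Pattern compile(List<String> terms) {
        String alternation = terms.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + alternation + ")");
    }

    // find() searches anywhere in the input, so no ".*" padding is needed.
    static boolean containsAny(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern pattern = compile(Arrays.asList("carloan", "creditcard"));
        // prints true
        System.out.println(containsAny(pattern, "I am looking for a carloan or creditcard"));
        // prints false
        System.out.println(containsAny(pattern, "I am looking for a mortgage"));
    }
}
```

Because `.` does not match line terminators by default, `find()` also behaves more predictably than `matches` with `.*` padding when the input spans several lines. Note that an empty term list would produce a pattern that matches every input, so a caller would probably want to guard against that case.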

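One change you might want to make to the above is to add word boundaries around the alternation. That is, you might want to use this regex pattern:

.*\b(?:carloan|creditcard)\b.*

This would avoid matching e.g. carloans when you really want to match only the singular carloan.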
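The effect of the word boundaries can be checked with a small comparison. The sketch below is illustrative only: the class `WordBoundaryDemo` and its method names are hypothetical, and it uses `find()` instead of `.*` padding. In a Java string literal, `\b` has to be written as `\\b`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class WordBoundaryDemo {

    // Without boundaries: matches the term anywhere, even inside a longer word.
    static final Pattern LOOSE = Pattern.compile("(?:carloan|creditcard)");

    // With boundaries: the term must not be directly adjacent to another word character.
    static final Pattern BOUNDED = Pattern.compile("\\b(?:carloan|creditcard)\\b");

    static boolean loose(String input) {
        return LOOSE.matcher(input).find();
    }

    static boolean bounded(String input) {
        return BOUNDED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(loose("two carloans"));        // prints true
        System.out.println(bounded("two carloans"));      // prints false
        System.out.println(bounded("a carloan, please")); // prints true
    }
}
```

One caveat: `\b` is defined in terms of word characters, so a term that starts or ends with punctuation, such as `:carloan:`, only matches with boundaries when a word character sits right next to the colon. For such terms the boundaries are likely to do more harm than good.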

Edit:

Here is a version using regex that is closer to your original starting point:

boolean result = list.stream().anyMatch(s -> input.matches(".*\\b" + s + "\\b.*"));
if (result) {
    System.out.println("MATCH");
}

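We can stream your list of terms and then use a regex to check whether the input string matches any of them. Note, however, that this approach calls String#matches N times for a list of N terms, while the alternation approach above makes a single call to that API. I would bet on the alternation approach being more efficient here.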
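Neither version above handles the variants mentioned in the question, such as car-loan or :carloan against a list entry of :carloan:. One possible way to cover them, shown here as a sketch and not as part of the original answer, is to normalize both the terms and the input before comparing. The class `NormalizedMatcher` and its `normalize` helper are hypothetical, and the rule "lower-case, then drop everything except ASCII letters and digits" is an assumption about what should count as the same word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class NormalizedMatcher {

    // Assumed normalization rule: lower-case, then keep only ASCII letters and digits.
    // ":carloan:", "car-loan" and "Car Loan" all become "carloan".
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // True if the normalized input contains any normalized term.
    static boolean containsAny(List<String> terms, String input) {
        String normalizedInput = normalize(input);
        return terms.stream()
                .map(NormalizedMatcher::normalize)
                .filter(term -> !term.isEmpty()) // an empty term would match everything
                .anyMatch(normalizedInput::contains);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(":carloan:", ":creditcard:");
        System.out.println(containsAny(terms, "i want car-loan"));   // prints true
        System.out.println(containsAny(terms, "i want :carloan"));   // prints true
        System.out.println(containsAny(terms, "i want a mortgage")); // prints false
    }
}
```

The trade-off is that removing spaces and punctuation also removes word boundaries, so an input like "scar loan" would normalize to "scarloan" and count as a match. If that matters, normalizing each whitespace-separated token separately is a stricter alternative, at the cost of missing terms written as two words.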


huangapple
  • Posted on September 14, 2020, 14:30:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63879137.html