Is there an efficient way to detect whether a string contains a substring that is in a large set of characteristic strings?
Question
For example, given the string aaaaaaaaaXyz, I want to find out whether it contains a substring that is in a set of characteristic strings {'xy','xyz','zzz','cccc','dddd',....}, which may have one million members. Is there an efficient way to do this?
Answer 1
Score: 2
Given that your search set might be very large, I would recommend simply iterating over that set and checking each member for a substring match:
// requires: import java.util.Set;
// Returns true as soon as any string in subs occurs inside input.
public boolean containsSubstring(String input, Set<String> subs) {
    for (String sub : subs) {
        if (input.contains(sub)) {
            return true;
        }
    }
    return false;
}
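One detail worth checking with the question's own example: String.contains is case-sensitive, so the capital X in aaaaaaaaaXyz does not match the lowercase 'xy' or 'xyz'. The sketch below wraps the loop above in a hypothetical class named LinearScanDemo and assumes the set holds only lowercase strings, so lowercasing the input is enough to make the match case-insensitive.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class; the name is not from the original answer.
final class LinearScanDemo {
    // Same linear scan as in the answer: stop at the first member found in input.
    static boolean containsSubstring(String input, Set<String> subs) {
        for (String sub : subs) {
            if (input.contains(sub)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> subs = Set.of("xy", "xyz", "zzz", "cccc", "dddd");
        String input = "aaaaaaaaaXyz";

        // Case-sensitive comparison: "Xy" is not "xy".
        System.out.println(containsSubstring(input, subs)); // false

        // Assumption: all members of subs are lowercase, so normalizing the input suffices.
        System.out.println(containsSubstring(input.toLowerCase(Locale.ROOT), subs)); // true
    }
}
```

Each call still visits up to every member of the set, so the cost grows with the size of the set multiplied by the cost of one contains check.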
Answer 2
Score: 2
First, prepare a dictionary that groups the strings by their first character, like this:
// requires: import java.util.*;
Set<String> stringSet = Set.of("xy", "xyz", "zzz", "zzy", "cccc", "dddd");
Map<Character, List<String>> dictionary = new HashMap<>();
for (String word : stringSet)
    // group each word under its first character
    dictionary.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
System.out.println(dictionary);
Output (the order inside each list may differ, because Set.of does not guarantee an iteration order):
{c=[cccc], d=[dddd], x=[xyz, xy], z=[zzy, zzz]}
Then you can use this method to do the lookup:
static boolean contains(String input, Map<Character, List<String>> dictionary) {
    for (int i = 0, max = input.length(); i < max; ++i) {
        char first = input.charAt(i);
        // only words that start with the character at position i can match here
        if (dictionary.containsKey(first))
            for (String word : dictionary.get(first))
                if (input.startsWith(word, i))
                    return true;
    }
    return false;
}
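Put together, the two snippets above could be wired up roughly as follows. This is only a sketch: the class name FirstCharIndexDemo and the helper buildDictionary are hypothetical, and skipping empty strings is an added assumption (word.charAt(0) would throw on an empty string).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper around the answer's two snippets.
final class FirstCharIndexDemo {
    // Group every word under its first character.
    static Map<Character, List<String>> buildDictionary(Set<String> words) {
        Map<Character, List<String>> dictionary = new HashMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // assumption: empty strings are ignored
            }
            dictionary.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
        }
        return dictionary;
    }

    // At each position, try only the words that start with the character found there.
    static boolean contains(String input, Map<Character, List<String>> dictionary) {
        for (int i = 0; i < input.length(); i++) {
            List<String> candidates = dictionary.get(input.charAt(i));
            if (candidates == null) {
                continue;
            }
            for (String word : candidates) {
                if (input.startsWith(word, i)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> dictionary =
                buildDictionary(Set.of("xy", "xyz", "zzz", "zzy", "cccc", "dddd"));
        System.out.println(contains("aaaaaaaaaxyz", dictionary)); // true
        System.out.println(contains("aaaaaaaaab", dictionary));   // false
    }
}
```

The dictionary is built once and reused for every input. If many of the strings share the same first character, the inner loop can still grow long for inputs that contain that character.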
Answer 3
Score: 0
Following a hint from Clashsoft, I found a Java implementation of the Aho-Corasick algorithm, which is exactly what I was looking for. Thanks, Clashsoft.
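The answer does not say which implementation was used, so the following is only a minimal from-scratch sketch of the Aho-Corasick idea, not that library's API. It builds a trie of all patterns, adds failure links with a breadth-first pass, and then scans the input once. The class name AhoCorasickSketch and the method containsAny are hypothetical, and ignoring empty patterns is an added assumption.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Minimal Aho-Corasick sketch: answers "does the text contain ANY pattern?"
final class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;        // node for the longest proper suffix that is also a trie prefix
        boolean terminal; // a pattern ends here, or at a node reachable via fail links
    }

    private final Node root = new Node();

    AhoCorasickSketch(Iterable<String> patterns) {
        // Step 1: insert every pattern into the trie.
        for (String p : patterns) {
            if (p.isEmpty()) {
                continue; // assumption: empty patterns are ignored
            }
            Node n = root;
            for (int i = 0; i < p.length(); i++) {
                n = n.next.computeIfAbsent(p.charAt(i), c -> new Node());
            }
            n.terminal = true;
        }
        // Step 2: breadth-first pass so shallower nodes get their failure links first.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node cur = queue.remove();
            for (Map.Entry<Character, Node> e : cur.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = cur.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit matches that end at a suffix of this node's string.
                child.terminal |= child.fail.terminal;
                queue.add(child);
            }
        }
    }

    // Single left-to-right scan; follows failure links instead of restarting.
    boolean containsAny(String text) {
        Node n = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (n != root && !n.next.containsKey(c)) {
                n = n.fail;
            }
            Node nxt = n.next.get(c);
            n = (nxt != null) ? nxt : root;
            if (n.terminal) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac =
                new AhoCorasickSketch(List.of("xy", "xyz", "zzz", "cccc", "dddd"));
        System.out.println(ac.containsAny("aaaaaaaaaxyz")); // true
        System.out.println(ac.containsAny("aaaaaaaaab"));   // false
    }
}
```

The automaton is built once, in time proportional to the total length of all patterns; each lookup then takes time proportional to the length of the input, regardless of how many patterns there are. A HashMap per node keeps the sketch short, but for a million patterns a more compact node layout would likely be needed.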