Find all starting indexes of the occurrences of one string in another string in O(N) time
Question
Given two strings, str1 and str2, find all the starting indexes of str1 in str2. Example:

    Input:  str1 = abc, str2 = abckdabcgfacabc
    Output: [0, 5, 12]

    public static List<Integer> firstMatchingIndexes(String str1, String str2) {
        List<Integer> indexes = new ArrayList<>();
        int end = 0, n = str2.length();
        for (; end < n; end++) {
            // indexOf returns -1 when there is no match, so test it before adding the offset
            int found = str2.substring(end, n).indexOf(str1);
            if (found == -1)
                break;
            int index = found + end;
            indexes.add(index);
            // Jump to the last character of this match; the loop's end++ moves past it
            end = index + str1.length() - 1;
        }
        return indexes;
    }

But this approach uses indexOf(), which internally has O(N) time complexity on every call. Can the KMP algorithm work here?
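For reference, the same loop can be sketched with the two-argument String.indexOf(String, int), which searches from a given offset and so avoids the substring copies. This is only a baseline, not a linear-time solution, because each call may rescan part of the text. The class name IndexOfScan and the method name allIndexes are made up for this illustration, and the sketch assumes that an empty pattern yields no matches and that matches do not overlap, as in the loop above.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexOfScan {
    // Hypothetical helper: collects the start of every non-overlapping
    // occurrence of pattern in text.
    static List<Integer> allIndexes(String pattern, String text) {
        List<Integer> indexes = new ArrayList<>();
        if (pattern.isEmpty()) {
            return indexes; // assumption: an empty pattern means "no matches"
        }
        int from = 0;
        while (true) {
            int index = text.indexOf(pattern, from); // -1 when nothing is left
            if (index == -1) {
                break;
            }
            indexes.add(index);
            from = index + pattern.length(); // continue after this match
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(allIndexes("abc", "abckdabcgfacabc")); // [0, 5, 12]
    }
}
```

Because the -1 check happens before any offset arithmetic, a text such as "abcx" yields [0] and nothing else.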
Answer 1
Score: 1
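I was able to get the following algorithm working.
Correct me if I'm wrong; I believe this is an example of O(n) complexity.
As for a KMP algorithm, I'm unsure.

    List<Integer> firstMatchingIndexes(String stringA, String stringB) {
        List<Integer> indices = new ArrayList<>();
        boolean checking = false;    // true while a partial match is in progress
        int indexA = 0, indexB = 0;  // positions in stringA (pattern) and stringB (text)
        for (char character : stringB.toCharArray()) {
            if (checking) {
                if (character == stringA.charAt(indexA)) {
                    if (indexA != stringA.length() - 1) {
                        indexA++;
                    } else {
                        // Last pattern character matched: record where the match started
                        indices.add(indexB - (stringA.length() - 1));
                        checking = false;
                        indexA = 0;
                    }
                } else {
                    // Mismatch: restart at the next character without backtracking,
                    // so a match that begins inside the failed attempt is missed
                    // (for example "aab" in "aaab")
                    checking = false;
                    indexA = 0;
                }
            } else if (character == stringA.charAt(indexA)) {
                checking = true;
                indexA++;
            }
            indexB++;
        }
        return indices;
    }

Output for str1 = abc, str2 = abckdabcgfacabc:

    [0, 5, 12]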
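One way to check a single-pass matcher like this is to compare its results with a brute-force reference. The sketch below uses String.startsWith(String, int) at every offset; the class name BruteForceCheck and the method name allIndexes are made up for this illustration, and an empty pattern is assumed to yield no matches. It is O(n·m) in the worst case, so it is meant for verifying results on small inputs, not as the linear-time solution.

```java
import java.util.ArrayList;
import java.util.List;

final class BruteForceCheck {
    // Hypothetical reference implementation: tries every start offset in text.
    static List<Integer> allIndexes(String pattern, String text) {
        List<Integer> indexes = new ArrayList<>();
        if (pattern.isEmpty()) {
            return indexes; // assumption: an empty pattern means "no matches"
        }
        for (int i = 0; i + pattern.length() <= text.length(); i++) {
            if (text.startsWith(pattern, i)) { // compares pattern against text at offset i
                indexes.add(i);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(allIndexes("abc", "abckdabcgfacabc")); // [0, 5, 12]
        System.out.println(allIndexes("aab", "aaab"));            // [1]
    }
}
```

Patterns with a repeated prefix, such as "aab" searched in "aaab", make good test inputs, because a scan that never revisits characters after a mismatch can step over a real match.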
Answer 2
Score: 1
Yes, the Knuth–Morris–Pratt algorithm can be of use here.
The Wikipedia article on the Knuth–Morris–Pratt algorithm provides pseudocode for the algorithm. Here is that pseudocode ported to Java:

    import java.util.ArrayList;
    import java.util.List;

    // Builds the partial-match table that tells the search where to resume after a mismatch
    static int[] kmpTable(String pattern) {
        int n = pattern.length();
        int[] partialMatchTable = new int[n + 1];
        int j = 0;
        partialMatchTable[0] = -1;
        for (int i = 1; i < n; i++, j++) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                partialMatchTable[i] = partialMatchTable[j];
            } else {
                partialMatchTable[i] = j;
                while (j >= 0 && pattern.charAt(i) != pattern.charAt(j)) {
                    j = partialMatchTable[j];
                }
            }
        }
        partialMatchTable[n] = j; // where to resume after a full match
        return partialMatchTable;
    }

    static List<Integer> kmpSearch(String needle, String haystack) {
        List<Integer> matches = new ArrayList<>();
        int m = haystack.length();
        int n = needle.length();
        if (n > m) {        // Added: a needle longer than the haystack cannot match,
            return matches; // so return an empty list without building its table
        }
        int[] partialMatchTable = kmpTable(needle);
        int j = 0, k = 0; // j: position in haystack, k: position in needle
        while (j < m) {
            if (needle.charAt(k) == haystack.charAt(j)) {
                j++;
                k++;
                if (k == n) {
                    matches.add(j - k); // full match ending just before j
                    k = partialMatchTable[k];
                }
            } else {
                k = partialMatchTable[k];
                if (k < 0) {
                    j++;
                    k++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("matches: " + kmpSearch("abc", "abckdabcgfacabc"));
    }

This outputs:
matches: [0, 5, 12]
Wikipedia says the following about the time complexity:
> the Knuth–Morris–Pratt algorithm has complexity O(𝑛), where 𝑛 is the length of 𝑆.
And when taking into account the time needed to build the partial-match table for a pattern of length 𝑘:
> Since the two portions of the algorithm have, respectively, complexities of O(𝑘) and O(𝑛), the complexity of the overall algorithm is O(𝑛 + 𝑘).
We can, however, say it is O(𝑛) when 𝑘 ≤ 𝑛. When the pattern is longer than the string being searched, the algorithm can skip building the partial-match table and return an empty list right away (see the commented check at the top of kmpSearch, which is not present in Wikipedia's pseudocode). That way it is always O(𝑛). Note that the Java code names these lengths differently: there m is the length of the haystack (𝑛 in this analysis) and n is the length of the needle (𝑘 in this analysis).
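As a cross-check on the port above, here is a minimal sketch of the other common formulation of KMP, built on the prefix function (for each position, the length of the longest proper prefix of the pattern that is also a suffix ending there). It is not Wikipedia's pseudocode; the names PrefixKmp, prefixFunction, and search are made up for this illustration, and an empty pattern is assumed to yield no matches.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixKmp {
    // pi[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] prefixFunction(String pattern) {
        int[] pi = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = pi[k - 1]; // fall back to a shorter border
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            pi[i] = k;
        }
        return pi;
    }

    static List<Integer> search(String pattern, String text) {
        List<Integer> matches = new ArrayList<>();
        int m = pattern.length();
        if (m == 0 || m > text.length()) {
            return matches; // assumption: empty or oversized pattern means "no matches"
        }
        int[] pi = prefixFunction(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = pi[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == m) {
                matches.add(i - m + 1);
                k = pi[k - 1]; // keep going so overlapping matches are found
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(search("abc", "abckdabcgfacabc")); // [0, 5, 12]
        System.out.println(search("aa", "aaaa"));             // [0, 1, 2]
    }
}
```

The text index i only moves forward, and k can fall back at most as many times as it has advanced, which is where the O(𝑛 + 𝑘) bound comes from.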