Find the list of starting indexes of the occurrences of a string in another string in O(N) time complexity

Question

Given two strings, str1 and str2, find all the starting indexes at which str1 occurs in str2. Example:
Input: str1 = abc, str2 = abckdabcgfacabc
Output: [0, 5, 12]

public static List<Integer> firstMatchingIndexes(String str1, String str2) {
    List<Integer> indexes = new ArrayList<>();
    int end = 0, n = str2.length();

    for (; end < n; end++) {
        // Search from position `end`; indexOf returns -1 when there is no further match.
        int index = str2.indexOf(str1, end);
        if (index != -1)
            indexes.add(index);
        else
            break;
        // Resume after this match (the loop's end++ steps past its last character).
        end = index + str1.length() - 1;
    }

    return indexes;
}

But this approach uses indexOf(), which internally has O(N) time complexity. Can the KMP algorithm work here?

Answer 1

Score: 1

I was able to get the following algorithm working.

Correct me if I'm wrong, but I believe this runs in O(n) time.
As for the KMP algorithm, I'm unsure.

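List<Integer> firstMatchingIndexes(String stringA, String stringB) {
    List<Integer> indices = new ArrayList<>();
    boolean checking = false;   // true while a candidate match is in progress
    int indexA = 0, indexB = 0; // positions in the pattern (stringA) and the text (stringB)
    // Caveat: after a mismatch the current character is not re-tested against the
    // start of stringA, so a match behind a repeated prefix (e.g. "abc" in "aabc")
    // is missed. Patterns shorter than two characters are not handled either.
    for (char character : stringB.toCharArray()) {
        if (checking) {
            if (character == stringA.charAt(indexA)) {
                if (indexA != stringA.length() - 1) {
                    indexA++;
                } else {
                    // Last pattern character matched: record where the match started.
                    indices.add(indexB - (stringA.length() - 1));
                    checking = false;
                    indexA = 0;
                }
            } else {
                // Mismatch: abandon the candidate and start over.
                checking = false;
                indexA = 0;
            }
        } else if (character == stringA.charAt(indexA)) {
            // First pattern character seen: start tracking a candidate match.
            checking = true;
            indexA++;
        }
        indexB++;
    }
    return indices;
}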

Output

[0, 5, 12]
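
One way to check a single-pass scanner like this is to compare its results with a brute-force reference on inputs that have a repeated prefix, such as "abc" in "aabc", or overlapping matches, such as "aa" in "aaaa". The sketch below is such a reference and is not part of the answer: the class and method names are made up for illustration, it runs in O(n·k) time rather than O(n), and it assumes that an empty pattern should produce no matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for cross-checking; not from the original answer.
class BruteForceReference {
    // Returns every index at which needle occurs in haystack, overlapping matches included.
    static List<Integer> allIndexes(String needle, String haystack) {
        List<Integer> indexes = new ArrayList<>();
        if (needle.isEmpty()) {
            return indexes; // assumption: an empty pattern has no matches
        }
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            // startsWith(prefix, offset) compares needle with haystack starting at index i.
            if (haystack.startsWith(needle, i)) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(allIndexes("abc", "abckdabcgfacabc")); // [0, 5, 12]
        System.out.println(allIndexes("abc", "aabc"));            // [1]
        System.out.println(allIndexes("aa", "aaaa"));             // [0, 1, 2]
    }
}
```

Any input on which a faster implementation disagrees with this reference points to a missed or spurious match.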

Answer 2

Score: 1

Yes, the Knuth–Morris–Pratt algorithm can be of use here.

The Wikipedia article on the Knuth–Morris–Pratt algorithm provides pseudocode for the algorithm. Here is that pseudocode ported to Java:

    import java.util.ArrayList;
    import java.util.List;

    public class Main {
        static int[] kmpTable(String pattern) {
            int n = pattern.length();
            int[] partialMatchTable = new int[n + 1];
            int j = 0; // index of the prefix character currently being compared

            partialMatchTable[0] = -1;

            for (int i = 1; i < n; i++, j++) {
                if (pattern.charAt(i) == pattern.charAt(j)) {
                    partialMatchTable[i] = partialMatchTable[j];
                } else {
                    partialMatchTable[i] = j;
                    // Fall back to shorter prefixes until one can be extended or none is left
                    while (j >= 0 && pattern.charAt(i) != pattern.charAt(j)) {
                        j = partialMatchTable[j];
                    }
                }
            }
            partialMatchTable[n] = j; // Fallback position used after a full match
            return partialMatchTable;
        }

        static List<Integer> kmpSearch(String needle, String haystack) {
            List<Integer> matches = new ArrayList<>();
            int m = haystack.length();
            int n = needle.length();
            // Added this to avoid the O(n) table construction when the needle cannot fit.
            // The n == 0 check treats an empty needle as having no matches (an assumption).
            if (n == 0 || n > m) {
                return matches; // Return empty list when needle is empty or too large
            }
            int[] partialMatchTable = kmpTable(needle);
            int j = 0, k = 0; // j: position in haystack, k: position in needle

            while (j < m) {
                if (needle.charAt(k) == haystack.charAt(j)) {
                    j++;
                    k++;
                    if (k == n) {
                        matches.add(j - k); // Full match ending just before j
                        k = partialMatchTable[k];
                    }
                } else {
                    k = partialMatchTable[k];
                    if (k < 0) { // No prefix can be reused: move on in the haystack
                        j++;
                        k++;
                    }
                }
            }
            return matches;
        }

        public static void main(String[] args) {
            System.out.println("matches: " + kmpSearch("abc", "abckdabcgfacabc"));
        }
    }

This outputs:

matches: [0, 5, 12]
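
To get a feel for what the partial-match table contains, it can help to print it for a small pattern. The sketch below repeats the kmpTable construction in a stand-alone class (the class name is hypothetical) and prints the table for "ABCDABD", the example pattern used in the Wikipedia article, and for "abc". Entry i is the needle position to retry after a mismatch at position i, and -1 means "advance in the haystack and restart the needle".

```java
import java.util.Arrays;

// Illustrative sketch; the class name is hypothetical.
class KmpTableDemo {
    // Same construction as the kmpTable port of the Wikipedia pseudocode.
    static int[] kmpTable(String pattern) {
        int n = pattern.length();
        int[] table = new int[n + 1];
        int j = 0;
        table[0] = -1;
        for (int i = 1; i < n; i++, j++) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                table[i] = table[j];
            } else {
                table[i] = j;
                while (j >= 0 && pattern.charAt(i) != pattern.charAt(j)) {
                    j = table[j];
                }
            }
        }
        table[n] = j;
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(kmpTable("ABCDABD"))); // [-1, 0, 0, 0, -1, 0, 2, 0]
        System.out.println(Arrays.toString(kmpTable("abc")));     // [-1, 0, 0, 0]
    }
}
```

The 2 at index 6 of the first table says that after failing on the final "D", the search can keep the already matched "AB" and retry from needle position 2 instead of starting over.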

Wikipedia says the following about the time complexity:

> the Knuth–Morris–Pratt algorithm has complexity O(𝑛), where 𝑛 is the length of 𝑆.

And when taking into account the time needed for building the partial-match table for a pattern of length 𝑘:

> Since the two portions of the algorithm have, respectively, complexities of O(𝑘) and O(𝑛), the complexity of the overall algorithm is O(𝑛 + 𝑘).

We can, however, say it is O(𝑛) when 𝑘 ≤ 𝑛. When the pattern is longer than the string being searched, the algorithm can skip building the partial-match table altogether and return an empty list (see the commented length check in kmpSearch, which is not present in Wikipedia's pseudocode). That way it is always O(𝑛). Note that the Java code above names the lengths differently: there, m is the length of the haystack (𝑛 in the quotes) and n is the length of the needle (𝑘 in the quotes).
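
The linear bound on the search phase can also be checked empirically. The following is a minimal sketch, not part of the answer: it copies the search loop into a hypothetical class, counts character comparisons instead of collecting matches, and assumes that an empty needle does no work. Each loop iteration either advances in the haystack or shifts the needle forward, so the count should never exceed twice the haystack length.

```java
// Illustrative sketch; class and method names are hypothetical.
class KmpComparisonCount {
    static int[] kmpTable(String pattern) {
        int n = pattern.length();
        int[] table = new int[n + 1];
        int j = 0;
        table[0] = -1;
        for (int i = 1; i < n; i++, j++) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                table[i] = table[j];
            } else {
                table[i] = j;
                while (j >= 0 && pattern.charAt(i) != pattern.charAt(j)) {
                    j = table[j];
                }
            }
        }
        table[n] = j;
        return table;
    }

    // Runs the KMP search loop and returns the number of character comparisons made.
    static long countComparisons(String needle, String haystack) {
        int m = haystack.length();
        int n = needle.length();
        if (n == 0 || n > m) {
            return 0; // assumption: nothing to compare in these cases
        }
        int[] table = kmpTable(needle);
        long comparisons = 0;
        int j = 0, k = 0;
        while (j < m) {
            comparisons++; // one needle/haystack comparison per iteration
            if (needle.charAt(k) == haystack.charAt(j)) {
                j++;
                k++;
                if (k == n) {
                    k = table[k];
                }
            } else {
                k = table[k];
                if (k < 0) {
                    j++;
                    k++;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // A haystack of 1000 'a' characters forces many partial matches of "aaab".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append('a');
        }
        String haystack = sb.toString();
        long count = countComparisons("aaab", haystack);
        System.out.println(count + " comparisons for haystack length " + haystack.length());
    }
}
```

A naive scan on the same input would re-compare up to four characters at almost every haystack position, which is the repeated work the partial-match table avoids.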

huangapple
  • Posted on May 22, 2023, 08:36:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76302458.html