基于元素的最长公共前缀

huangapple go评论78阅读模式
英文:

Longest common prefix based on elements

问题

我有一个字符串元素的数组:

["000", "1110", "01", "001", "110", "11"]

对于数组中的一个元素,

  • 我想找到具有最长公共前缀的前一个元素索引。

  • 如果有多个匹配的元素,则选择最近的元素索引。

  • 如果没有找到,则选择前一个索引。

示例:

["000", "1110", "01", "001", "110", "11"]

输出:
[0, 1, 1, 1, 2, 5]

a) "000" - 输出为 0,因为我们没有任何先前的元素。
b) "1110" - 输出为 1,没有具有最长前缀的前一个元素,因此选择前一个索引。
c) "01" - 输出为 1,"000" 具有最长前缀,因此它的索引为 1。
d) "001" - 输出为 1,"000" 具有最长前缀,因此它的索引为 1。
e) "110" - 输出为 2,"1110" 具有最长前缀,因此它的索引为 2。
f) "11" - 输出为 5,"110" 是具有最长前缀的最近元素,因此它的索引为 5。

我无法理解需要采取哪种方法来完成此任务。你能帮助我吗?

英文:

I have an array of string elements in an array:

["000", "1110", "01", "001", "110", "11"]

For an element in the array,

  • I want to find the previous element index with longest common prefix.

  • If I have multiple matching elements then select the nearest element
    index.

  • If none found then just select previous index.

Example:

["000", "1110", "01", "001", "110", "11"]

Output:
[0,1,1,1,2,5]

a) "000" - output is 0, because we do not have any previous elements.
b) "1110" - output is 1, no previous element with longest prefix so select previous index.
c) "01" - output is 1,"000" has longest prefix, so its index is 1.
d) "001" - output is 1, "000" has longest prefix, so its index is 1.
e) "110" - output is 2,  "1110" has longest prefix, so its index 2.
f) "11" - output 5, "110" is most nearest element with longest prefix so its index 5.

I am not able to understand what approach I need to take for this task. Can you please help me.

答案1

得分: 1

Naive solution based on the former question

commonPrefix is supposed to (according to the comment) the longest prefix in the array up to index n. What does that mean? You need to calculate all prefixes and pick the longest.

static String commonPrefix(String arr[], int n) {
	String longestPrefix = "";
	for (int i = 0; i < n; i++) {
		final String currentPrefix = commonPrefixUtil(arr[i], arr[n]);
		if (currentPrefix.length() > longestPrefix.length()) {
			longestPrefix = currentPrefix;
		}
	}
	return longestPrefix;
}

So that would yield "00" for arr = ["000", "1110", "01", "001", "110", "11"]; n = 3.

Now that we have the longest prefix, what? We need to find the closest index to n that starts with that prefix...

static int closestIndex(String[] arr, String longestPrefix, int n) {
	for (int i = n - 1; i >= 0; i--) {
		if (arr[i].startsWith(longestPrefix)) {
			return i + 1; // + 1 because the solution wants starting index with 1
		}
	}
	return 0;
}

How to put it together? Just call the two methods for each input

public static void main(String[] args) {
	String[] words = {"000", "1110", "01", "001", "110", "11"};
	int[] output = new int[words.length];

	for (int i = 0; i < words.length; i++) {
		final String longestPrefix = commonPrefix(words, i);
		output[i] = closestIndex(words, longestPrefix, i);
	}

	System.out.println(Arrays.toString(output));
}

You have deleted your commonPrefixUtil implementation that worked from the question, so I add my own:

static String commonPrefixUtil(String str1, String str2) {
	int shorterStringLength = Math.min(str1.length(), str2.length());
	int length = 0;
	for (; length < shorterStringLength; length++) {
		if (str1.charAt(length) != str2.charAt(length)) {
			break;
		}
	}
	return str1.substring(0, length);
}

Optimalized solution

I created a new solution using dynamic programming with tabulation (if I understand correctly), that is I use a hashmap of already all prefixes pointing to the indexes of the words where the prefix is from. The value of the Map is a sorted tree, so it can be really easily determined which word with the common prefix is closest to the current index. The HashMap ensures constant-time operations and the TreeSet guarantees log(n) time cost.

Explained more simply, I process the first letter of all words, and then the next etc. During this process I memorise where all prefix substrings are, while they are automatically sorted. I stop the loop after processing the last letter of the longest word.

public static void main(String[] args) {
	String[] words = {"000", "1110", "01", "001", "110", "11"};

	int[] result = new int[words.length];
	final int firstWordLength = words.length > 0 ? words[0].length() : 8;
	// prefix -> [indexes of prefix occurrence]
	HashMap<String, TreeSet<Integer>> prefixes = new HashMap<>(words.length * (firstWordLength + 1) * 2);
	int wordLength = 1;
	boolean isUpdatedResult;
	do { // O(k)
		isUpdatedResult = false;
		for (int wordIdx = 0; wordIdx < words.length; wordIdx++) { // O(n)
			if (words[wordIdx].length() < wordLength) {
				continue;
			}
			final String currentPrefix = words[wordIdx].substring(0, wordLength); // Java >= 7 update 6 ? O(k) : O(1)
			final TreeSet<Integer> prefixIndexes = prefixes.get(currentPrefix); // O(1)
			if (prefixIndexes != null) {
				// floor instead of lower, because we have put `wordIdx + 1` inside
				final Integer closestPrefixIndex = prefixIndexes.floor(wordIdx); // O(log n)
				if (closestPrefixIndex != null) {
					result[wordIdx] = closestPrefixIndex;
					isUpdatedResult = true;
				}
			}
			// take the previous index for the result if no match
			if (result[wordIdx] == 0) {
				result[wordIdx] = wordIdx;
			}
			final TreeSet<Integer> newPrefixIndexes = prefixes.computeIfAbsent(currentPrefix, p -> new TreeSet<>()); // O(1)
			// the result index must be + 1
			newPrefixIndexes.add(wordIdx + 1); // O(log n)
		}
		wordLength++;
	} while (isUpdatedResult);

	System.out.println(Arrays.toString(result));
}

I have marked all the operations with their big O time complexity. n is the number of words in the input array, i.e. words.length and k is the length of the longest word. According to Jon Skeet's post the time complexity of substring has changed in Java 7 update 6 to linear.

So we can calculate:

O(k) * O(n) * (O(log n) + O(k))

Hopefully, the code is understandable and I calculated the complexity correctly.

英文:

Naive solution based on the former question

commonPrefix is supposed to (according to the comment) the longest prefix in the array up to index n. What does that mean? You need to calculate all prefixes and pick the longest.

static String commonPrefix(String arr[], int n) {
	String longestPrefix = &quot;&quot;;
	for (int i = 0; i &lt; n; i++) {
		final String currentPrefix = commonPrefixUtil(arr[i], arr[n]);
		if (currentPrefix.length() &gt; longestPrefix.length()) {
			longestPrefix = currentPrefix;
		}
	}
	return longestPrefix;
}

So that would yield &quot;00&quot; for arr = [&quot;000&quot;, &quot;1110&quot;, &quot;01&quot;, &quot;001&quot;, &quot;110&quot;, &quot;11&quot;]; n = 3.

Now that we have the longest prefix, what? We need to find the closest index to n that starts with that prefix...

static int closestIndex(String[] arr, String longestPrefix, int n) {
	for (int i = n - 1; i &gt;= 0; i--) {
		if (arr[i].startsWith(longestPrefix)) {
			return i + 1; // + 1 because the solution wants starting index with 1
		}
	}
	return 0;
}

How to put it together? Just call the two methods for each input

public static void main(String[] args) {
	String[] words = { &quot;000&quot;, &quot;1110&quot;, &quot;01&quot;, &quot;001&quot;, &quot;110&quot;, &quot;11&quot; };
	int[] output = new int[words.length];

	for (int i = 0; i &lt; words.length; i++) {
		final String longestPrefix = commonPrefix(words, i);
		output[i] = closestIndex(words, longestPrefix, i);
	}

	System.out.println(Arrays.toString(output));
}

You have deleted your commonPrefixUtil implementation that worked from the question, so I add my own:

static String commonPrefixUtil(String str1, String str2) {
	int shorterStringLength = Math.min(str1.length(), str2.length());
	int length = 0;
	for (; length &lt; shorterStringLength; length++) {
		if (str1.charAt(length) != str2.charAt(length)) {
			break;
		}
	}
	return str1.substring(0, length);
}

Optimalized solution

I created a new solution using dynamic programming with tabulation (if I understand correctly), that is I use a hashmap of already all prefixes pointing to the indexes of the words where the prefix is from. The value of the Map is a sorted tree, so it can be really easily determined which word with the common prefix is closest to the current index. The HashMap ensures constant-time operations and the TreeSet guarantees log(n) time cost.

Explained more simply, I process the first letter of all words, and then the next etc. During this process I memorise where all prefix substrings are, while they are automatically sorted. I stop the loop after processing the last letter of the longest word.

public static void main(String[] args) {
	String[] words = { &quot;000&quot;, &quot;1110&quot;, &quot;01&quot;, &quot;001&quot;, &quot;110&quot;, &quot;11&quot; };

	int[] result = new int[words.length];
	final int firstWordLength = words.length &gt; 0 ? words[0].length() : 8;
	// prefix -&gt; [indexes of prefix occurrence]
	HashMap&lt;String, TreeSet&lt;Integer&gt;&gt; prefixes = new HashMap&lt;&gt;(words.length * (firstWordLength + 1) * 2);
	int wordLength = 1;
	boolean isUpdatedResult;
	do { // O(k)
		isUpdatedResult = false;
		for (int wordIdx = 0; wordIdx &lt; words.length; wordIdx++) { // O(n)
			if (words[wordIdx].length() &lt; wordLength) {
				continue;
			}
			final String currentPrefix = words[wordIdx].substring(0, wordLength); // Java &gt;= 7 update 6 ? O(k) : O(1)
			final TreeSet&lt;Integer&gt; prefixIndexes = prefixes.get(currentPrefix); // O(1)
			if (prefixIndexes != null) {
				// floor instead of lower, because we have put `wordIdx + 1` inside
				final Integer closestPrefixIndex = prefixIndexes.floor(wordIdx); // O(log n)
				if (closestPrefixIndex != null) {
					result[wordIdx] = closestPrefixIndex;
					isUpdatedResult = true;
				}
			}
			// take the previous index for the result if no match
			if (result[wordIdx] == 0) {
				result[wordIdx] = wordIdx;
			}
			final TreeSet&lt;Integer&gt; newPrefixIndexes = prefixes.computeIfAbsent(currentPrefix, p -&gt; new TreeSet&lt;&gt;()); // O(1)
			// the result index must be + 1
			newPrefixIndexes.add(wordIdx + 1); // O(log n)
		}
		wordLength++;
	} while (isUpdatedResult);

	System.out.println(Arrays.toString(result));
}

I have marked all the operations with their big O time complexity. n is the number of words in the input array, i.e. words.length and k is the length of the longest word. According to Jon Skeet's post the time complexity of substring has changed in Java 7 update 6 to linear.

So we can calculate:

O(k) * O(n) * (O(log n) + O(k))

Hopefully, the code is understandable and I calculated the complexity correctly.

答案2

得分: 1

使用字典树(Trie)可以在输入字符总数的线性时间内相对容易地找到最长公共前缀。以下是Python代码(抱歉):

import collections

class Trie:
    def __init__(self):
        self._children = collections.defaultdict(Trie)
        self._previous_index = 0

    # 找到单词中在字典树中出现的最长前缀,返回该节点上的_previous_index值。
    def previous_index(self, word):
        node = self
        for letter in word:
            child = node._children.get(letter)
            if child is None:
                break
            node = child
        return node._previous_index

    # 确保单词的每个前缀都存在于字典树中。
    # 对应于前缀的每个节点,将其_previous_index设置为索引。
    def insert(self, index, word):
        self._previous_index = index
        node = self
        for letter in word:
            node = node._children[letter]
            node._previous_index = index

def longest_common_prefix_indexes(words):
    trie = Trie()
    for index_minus_one, word in enumerate(words):
        print(trie.previous_index(word))
        trie.insert(index_minus_one + 1, word)

longest_common_prefix_indexes(["000", "1110", "01", "001", "110", "11"])
英文:

Using a trie makes it pretty easy to find the longest common prefix so far in time linear in the total number of characters in the input. In Python (sorry):

import collections
class Trie:
def __init__(self):
self._children = collections.defaultdict(Trie)
self._previous_index = 0
# Find the longest prefix of word that appears in the trie,
# return the value of _previous_index at that node.
def previous_index(self, word):
node = self
for letter in word:
child = node._children.get(letter)
if child is None:
break
node = child
return node._previous_index
# Ensure that each of the prefixes of word exists in the trie.
# At each node corresponding to a prefix, set its _previous_index to index.
def insert(self, index, word):
self._previous_index = index
node = self
for letter in word:
node = node._children[letter]
node._previous_index = index
def longest_common_prefix_indexes(words):
trie = Trie()
for index_minus_one, word in enumerate(words):
print(trie.previous_index(word))
trie.insert(index_minus_one + 1, word)
longest_common_prefix_indexes([&quot;000&quot;, &quot;1110&quot;, &quot;01&quot;, &quot;001&quot;, &quot;110&quot;, &quot;11&quot;])

答案3

得分: 0

以下是已翻译的内容:

我认为这对你可能有用

这是一个返回两个字符串最长公共前缀长度的函数:

int commonPrefixLength(String s, String t) {
    if (s.length() == 0 || t.length() == 0) {
        return 0;
    }

    int commonPrefixLength = 0;
    int shorter = Math.min(s.length(), t.length());

    for (int i = 0; i < shorter; i++) {
        if (s.charAt(i) != t.charAt(i)) {
            break;
        }
        commonPrefixLength++;
    }

    return commonPrefixLength;
}

对于数组中的每个字符串,您可以检查前面的字符串并选择具有最长公共前缀的字符串:

// 从1开始索引
String[] strings = new String[] {"", "000", "1110", "01", "001", "110", "11"};

for (int i = 1; i < strings.length; i++) {
    int longestCommonPrefix = 0;
    int answer = 0;

    // 对于每个先前的字符串
    for (int j = i - 1; j > 0; j--) {
        int commonPrefix = commonPrefixLength(strings[j], strings[i]);
        if (commonPrefix > longestCommonPrefix) {
            longestCommonPrefix = commonPrefix;
            answer = j;
        }
    }

    // 没有找到公共前缀
    if (answer == 0) {
        answer = i - 1;
    }

    System.out.println(answer);
}
英文:

I think this might work for you

This is funcion that returns the length of the longest common prefix of two Strings:

int commonPrefixLength(String s, String t) {
if (s.length() == 0 || t.length() == 0) {
return 0;
}
int commonPrefixLength = 0;
int shorter = Math.min(s.length(), t.length());
for (int i = 0; i &lt; shorter; i++) {
if (s.charAt(i) != t.charAt(i)) {
break;
}
commonPrefixLength++;
}
return commonPrefixLength;
}

For every string in array you can check previous strings and choose the one with to longest common prefix:

    //indexing from 1
String[] strings = new String[] {&quot;&quot;, &quot;000&quot;, &quot;1110&quot;, &quot;01&quot;, &quot;001&quot;, &quot;110&quot;, &quot;11&quot;};
for (int i = 1; i &lt; strings.length; i++) {
int longestCommonPrefix = 0;
int answer = 0;
//for every previous string
for (int j = i - 1; j &gt; 0; j--) {
int commonPrefix = commonPrefixLength(strings[j], strings[i]);
if (commonPrefix &gt; longestCommonPrefix) {
longestCommonPrefix = commonPrefix;
answer = j;
}
}
//no common prefix found
if (answer == 0) {
answer = i - 1;
}
System.out.println(answer);
}

huangapple
  • 本文由 发表于 2020年8月26日 04:58:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/63587004.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定