Find a complex element in a set of elements

Question

I have a function that allows me to find a match between an incomplete element and at least one element in a set. An example of an incomplete element is 22.2.X.13, in which there is an item (defined with X) that could assume any value.

The goal of this function is to find at least one element in a set of elements that has 22 in the first position, 2 on the second, and 13 on the fourth.

For example, if we consider the set:

{
	20.8.31.13,
	32.3.29.13, 
	24.2.12.13, 
	19.2.37.13, 
	22.2.22.13, 
	27.17.22.13, 
	26.22.32.13, 
	22.3.22.13, 
	20.19.12.13, 
	17.4.37.13, 
	31.8.34.13
} 

The function returns true, since the element 22.2.22.13 matches 22.2.X.13.

My function compares each pair of elements like strings and each item of the elements as an integer:

public boolean containsElement(String element) {
	StringTokenizer strow = null, st = null;
	boolean check = true;
	String nextrow = "", next = "";
	
	for(String row : setOfElements) {
		strow = new StringTokenizer(row, ".");
		st = new StringTokenizer(element, ".");
		
		check = true;
		while(st.hasMoreTokens()) {
			next = st.nextToken();
			if(!strow.hasMoreTokens()) {
				break;
			}
			nextrow = strow.nextToken();
			if(next.compareTo("X") != 0) {
				int x = Integer.parseInt(next);
				int y = Integer.parseInt(nextrow);
				if(x != y) {
					check = false;
					break;
				}
			}
		}
		if(check) return true;
	}
	return false;
}

However, it is an expensive operation, particularly if the size of the string increases. Can you suggest to me another strategy or data structure to quickly perform this operation?

My solution is closely related to strings. However, we can consider other types for elements (e.g. array, list, tree node, etc)
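
One way to act on the "other types" idea is to parse every element once into an int array, so that a query no longer tokenizes each row. The following is only a sketch under assumptions: the class name ParsedElements, the WILDCARD sentinel, and the rule that pattern and element must have the same number of components are not from the original function (which stops early when a row is shorter than the pattern).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: elements are parsed once, queries compare ints only.
final class ParsedElements {
    // Assumption: real components are never negative, so -1 can stand for "X".
    static final int WILDCARD = -1;

    private final List<int[]> rows = new ArrayList<>();

    ParsedElements(Collection<String> elements) {
        for (String element : elements) {
            rows.add(parse(element));
        }
    }

    // Splits "22.2.X.13" on the literal dot and converts each part to an int.
    static int[] parse(String text) {
        String[] parts = text.split("\\.");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = parts[i].equals("X") ? WILDCARD : Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Returns true if at least one stored element agrees with the pattern
    // on every non-wildcard position.
    boolean containsElement(String pattern) {
        int[] p = parse(pattern);
        for (int[] row : rows) {
            if (matches(p, row)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matches(int[] pattern, int[] row) {
        if (pattern.length != row.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != WILDCARD && pattern[i] != row[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The search is still linear in the number of elements; the saving is only in the per-row cost, because parsing happens once when the set is built rather than on every query.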

Thanks to all for your answers. I have tried almost all of the suggested functions; here are the benchmark results:

myFunction: 0ms
hasMatch: 2ms
Stream API: 5ms
isIPMatch: 2ms

I think that the main problem of the regular expression is the time to create the pattern and match the strings.

Answer 1

Score: 2

You want to use a regular expression, which is made for exactly this kind of task:

22\.2\.\d+\.13

Java 8 and higher

As of Java 8, you can use the Stream API together with the Pattern and Matcher classes to check whether at least one element matches the regex:

Set<String> set = ... // the set of Strings (can be any collection)

Pattern pattern = Pattern.compile("22\\.2\\.\\d+\\.13"); // compiled Pattern
boolean matches = set.stream()                           // Stream<String>
                     .map(pattern::matcher)              // Stream<Matcher>
                     .anyMatch(Matcher::matches);        // true if at least one matches

Java 7 and lower

The approach is the same as with the Stream API: a short-circuiting for-each loop that breaks as soon as a match is found.

boolean matches = false;
        
Pattern pattern = Pattern.compile("22\\.2\\.\\d+\\.13");
for (String str: set) {
    Matcher matcher = pattern.matcher(str);
    if (matcher.matches()) {
        matches = true;
        break;
    }
}
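
For reference, the snippets above can be pulled together into one self-contained class. This is a sketch rather than a definitive implementation: the class name RegexMatchDemo, the method anyMatch, and the shortened sample list are made up for illustration. The pattern is kept in a static field so that it is compiled only once, which matters given the benchmark remark in the question about pattern creation time.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex approach.
final class RegexMatchDemo {
    // Compiled once and reused; Pattern instances are immutable.
    private static final Pattern ELEMENT = Pattern.compile("22\\.2\\.\\d+\\.13");

    // Returns true as soon as one element matches the whole pattern.
    static boolean anyMatch(Collection<String> elements) {
        return elements.stream().anyMatch(s -> ELEMENT.matcher(s).matches());
    }

    public static void main(String[] args) {
        List<String> elements =
                Arrays.asList("20.8.31.13", "24.2.12.13", "22.2.22.13", "22.3.22.13");
        System.out.println(anyMatch(elements)); // prints "true": 22.2.22.13 matches
    }
}
```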

Answer 2

Score: 2

You can solve this by approaching the problem in a regex-based manner, as suggested by Nikolas Charalambidis (+1), or you can do it differently. To avoid being redundant with another answer, I will focus on an alternative approach here, using the split method.

public boolean isIPMatch(String[] pattern, String[] input) {
    if ((pattern == null) || (input == null) || (pattern.length != input.length)) return false; // edge cases
    for (int index = 0; index < pattern.length; index++) {
        if ((!pattern[index].equals("X")) && (!pattern[index].equals(input[index]))) return false; // difference
    }
    return true; // everything matched
}

You can call this method in your loop after converting both the pattern and each item to a String array via split. Note that split takes a regex, so the separator has to be written as "\\.".
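
A minimal sketch of that loop, assuming the elements are available as an Iterable of Strings. The class name SplitMatch and the method containsElement are illustrative names, and isIPMatch is repeated (as a static method) only so that the block compiles on its own.

```java
// Hypothetical wrapper showing how isIPMatch could be driven from a loop.
final class SplitMatch {
    static boolean isIPMatch(String[] pattern, String[] input) {
        if (pattern == null || input == null || pattern.length != input.length) {
            return false; // edge cases
        }
        for (int index = 0; index < pattern.length; index++) {
            if (!pattern[index].equals("X") && !pattern[index].equals(input[index])) {
                return false; // difference at a non-wildcard position
            }
        }
        return true; // everything matched
    }

    static boolean containsElement(Iterable<String> elements, String pattern) {
        // Split the pattern once, outside the loop; "\\." is a literal dot.
        String[] patternParts = pattern.split("\\.");
        for (String element : elements) {
            if (isIPMatch(patternParts, element.split("\\."))) {
                return true;
            }
        }
        return false;
    }
}
```

Unlike the function in the question, this compares the components as strings, so "022" and "22" would count as different values.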

Answer 3

Score: 1

For strings, regular expressions solve the task a lot better:

private boolean hasMatch(String[] haystack, String partial) {
    String patternString = partial.replace("X", "[0-9]+").replace(".", "\\.");
    // "22.2.X.13" becomes "22\\.2\\.[0-9]+\\.13" 
    Pattern p = Pattern.compile(patternString);
    for (String s : haystack) {
        if (p.matcher(s).matches()) return true;
    }
    return false;
}

For other types of objects, it depends on their structure.

  • If there is some kind of order, you could consider making your elements implement Comparable - and then you can place them into a TreeSet (or as keys in a TreeMap), which will always be kept sorted. This way, you can compare only against the elements that can match: mySortedSet.subSet(fromElement, toElement) returns only the elements between those two.
  • If there is no order, you will simply have to compare all elements against your "pattern".

Note that strings are comparable, but their default sorting order ignores the special semantics of your .-separators: as plain strings, "10.2.1.1" sorts before "9.2.1.1". So, with some care, you can implement a TreeSet-based approach to make the search better than linear.
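
One possible shape for such a TreeSet-based approach is sketched below. It rests on several assumptions that are not in the answer: elements are stored as int arrays, the comparator orders them component by component, -1 stands for X, and only the fixed components before the first X are used to narrow the range. The names SortedElementIndex, add and containsMatch are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sorted index: narrows the search to elements sharing the
// fixed prefix of the pattern, then checks the remaining positions.
final class SortedElementIndex {
    static final int WILDCARD = -1; // assumption: components are never negative

    // Numeric, component-wise order; on a shared prefix the shorter array sorts first.
    private static final Comparator<int[]> BY_COMPONENTS = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    };

    private final NavigableSet<int[]> sorted = new TreeSet<>(BY_COMPONENTS);

    void add(String element) {
        sorted.add(parse(element));
    }

    static int[] parse(String text) {
        String[] parts = text.split("\\.");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = parts[i].equals("X") ? WILDCARD : Integer.parseInt(parts[i]);
        }
        return out;
    }

    boolean containsMatch(String pattern) {
        int[] p = parse(pattern);
        int firstWildcard = 0;
        while (firstWildcard < p.length && p[firstWildcard] != WILDCARD) {
            firstWildcard++;
        }
        // Bounds: keep the fixed prefix, open up everything from the first X on.
        int[] low = p.clone();
        int[] high = p.clone();
        Arrays.fill(low, firstWildcard, p.length, Integer.MIN_VALUE);
        Arrays.fill(high, firstWildcard, p.length, Integer.MAX_VALUE);
        for (int[] candidate : sorted.subSet(low, true, high, true)) {
            if (matches(p, candidate)) return true;
        }
        return false;
    }

    private static boolean matches(int[] pattern, int[] element) {
        if (pattern.length != element.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != WILDCARD && pattern[i] != element[i]) return false;
        }
        return true;
    }
}
```

For 22.2.X.13 only the elements starting with 22.2 are visited. A pattern that starts with X gets no benefit from the ordering and falls back to scanning the whole set.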

Answer 4

Score: 1

Other answers have already discussed using a regular expression by converting e.g. 22.2.X.13 to 22\.2\.\d+\.13 (don't forget to also escape the . or they mean "anything"). But while this will definitely be simpler and probably also a good bit faster, it does not lower the overall complexity. You still have to check each element in the set.

Instead, you might try to convert your set of IPs to a nested Map in this form:

{20: {8: {31: {13: null}}, 19: {12: {13: null}}}, 22: {2: {...}, 3: {...}}, ...}

(Of course, you should create this structure just once, and not for each search query.)

You can then write a recursive function match that works roughly as follows (pseudocode):

boolean match(ip: String, map: Map<String, Map<...>>) {
    if (ip.empty) return true // done
    first, rest = ip.splitfirst
    if (first == "X") {
        return map.values().any(submap -> match(rest, submap))
    } else {
        return first in map && match(rest, map[first])
    }
}

This should reduce the complexity from O(n) to roughly O(log n) when the pattern contains few wildcards. The cost grows the more often the search has to branch out at an X, up to O(n) for a pattern such as X.X.X.123 (X.X.X.X is trivial again). For small sets, a regular expression might still be faster, as it has less overhead, but for larger sets, this should be faster.
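
The pseudocode could be turned into Java roughly as follows. This is a sketch under assumptions: the class name ElementTrie, the terminal flag (used in place of the null leaves shown above, so that a pattern only matches complete elements), and the use of String keys are illustrative choices, not part of the answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical nested-map (trie) index over dot-separated elements.
final class ElementTrie {
    private final Map<String, ElementTrie> children = new HashMap<>();
    private boolean terminal; // true if a stored element ends at this node

    // Build the structure once, adding one element at a time.
    void add(String element) {
        ElementTrie node = this;
        for (String part : element.split("\\.")) {
            node = node.children.computeIfAbsent(part, key -> new ElementTrie());
        }
        node.terminal = true;
    }

    // Returns true if some stored element matches the pattern, where X matches any component.
    boolean matches(String pattern) {
        return matches(pattern.split("\\."), 0);
    }

    private boolean matches(String[] parts, int depth) {
        if (depth == parts.length) {
            return terminal; // done: only complete elements count
        }
        if (parts[depth].equals("X")) {
            // Wildcard: branch out into every sub-map.
            for (ElementTrie child : children.values()) {
                if (child.matches(parts, depth + 1)) {
                    return true;
                }
            }
            return false;
        }
        // Fixed component: a single map lookup.
        ElementTrie child = children.get(parts[depth]);
        return child != null && child.matches(parts, depth + 1);
    }
}
```

Because the keys are strings, "022" and "22" are treated as different components; parsing them to Integer keys would restore the numeric comparison used in the question.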

huangapple
  • Posted on 2020-10-21 16:16:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64459450.html