Efficiently find an integer in a sorted collection of integer ranges

Question

Problem

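I have a large list of IP address ranges, and I want to efficiently find the ranges that contain a given IP address. Ranges may overlap. To simplify and generalize the problem for Stack Overflow, I substitute integers for IP addresses. (In principle, it could be any custom class to which ranges and an ordering of ranges apply.)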

Problem example

// Note: this class has a natural ordering that is inconsistent with equals.
class IntRange implements Comparable<IntRange> {
    private int start;
    private int end;

    public IntRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    public boolean inRange(int i) {
        return i >= start && i <= end;
    }

    @Override
    public int compareTo(IntRange other) {
        if (start < other.start) {
            return -1;
        } else if (start <= other.start && end >= other.end) {
            return 0;
        } else {
            return 1;
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        IntRange intRange = (IntRange) o;
        return start == intRange.start && end == intRange.end;
    }

    @Override
    public int hashCode() {
        return Objects.hash(start, end);
    }
}

class Program {
    private static List<IntRange> findRanges(IntRange[] ranges, int i) {
        // How to implement this?
    }

    public static void main(String[] args) {
        IntRange[] ranges = {
                new IntRange(-10, 5),
                new IntRange(8, 12),
                new IntRange(9, 13),
                new IntRange(20, 30),
                new IntRange(800, 1000)
        };

        // Should contain IntRange(8, 12) and IntRange(9, 13) as result
        List<IntRange> matchingRanges = findRanges(ranges, 10); 
    }
}

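Given the list of ranges above, I would like to find the ranges that contain a given integer, for example 10. In that case, the ranges [8, 12] and [9, 13] both match, so those would be the result.

Question

How can I solve this problem with the Java Collections API, if possible? The solution should be efficient, so a brute-force linear search through the list is not good enough.

I could also build a binary search tree by hand, but I would expect something like this to be possible with the Java Collections API, using comparators and something like a TreeSet.

Normally, when using a TreeSet, I would search for an element of the same type, for example a Person object whose first name and last name must match to be equal. But in this case I want to search for an integer in a TreeSet of IntRanges, so the equals method is not suitable.

Example with IP addresses instead of integers

Solutions can be given for integers instead of IP addresses, to keep the question general and simple. But in case you want to try it with IP addresses, this code can be used to represent IP address ranges: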

class IpRange {
    private byte[] start; // 4 bytes for IPv4, 16 bytes for IPv6
    private byte[] end;

    // Only for testing purposes
    public IpRange(int start, int end) {
        this.start = BigInteger.valueOf(start).toByteArray();
        this.end = BigInteger.valueOf(end).toByteArray();
    }

    public IpRange(byte[] start, byte[] end) {
        this.start = start;
        this.end = end;
    }

    public boolean inRange(byte[] ip) {
        return Arrays.compare(start, ip) <= 0 && Arrays.compare(end, ip) >= 0;
    }

    public static void main(String[] args) {
        // Test 1: test the inRange function
        IpRange ir = new IpRange(40, 60);
        System.out.println(ir.inRange(BigInteger.valueOf(39).toByteArray())); // false
        System.out.println(ir.inRange(BigInteger.valueOf(50).toByteArray())); // true
        System.out.println(ir.inRange(BigInteger.valueOf(61).toByteArray())); // false

        // Test 2
        // In production, this array contains thousands of entries
        IpRange[] ranges = {
                new IpRange(-10, 5),
                new IpRange(8, 12),
                new IpRange(20, 30),
                new IpRange(800, 1000)
        };

        // How to efficiently check which ranges contain ip?
        int ip = 25;
    }
}
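
One caveat with the IpRange class above: Arrays.compare(byte[], byte[]) compares bytes as signed values, so an address containing an octet of 128 or more sorts before one with a smaller octet. A minimal sketch of an unsigned variant follows, assuming Java 9 or later (for Arrays.compareUnsigned) and addresses of equal length. The class name UnsignedIpRange and the helper ip are hypothetical and only serve this illustration.

```java
import java.util.Arrays;

// Hypothetical variant of IpRange that compares address bytes as unsigned values.
class UnsignedIpRange {
    private final byte[] start; // 4 bytes for IPv4, 16 bytes for IPv6
    private final byte[] end;

    UnsignedIpRange(byte[] start, byte[] end) {
        this.start = start.clone();
        this.end = end.clone();
    }

    // True if start <= ip <= end, treating every byte as a value in 0..255.
    boolean inRange(byte[] ip) {
        return Arrays.compareUnsigned(start, ip) <= 0
                && Arrays.compareUnsigned(end, ip) >= 0;
    }

    // Illustrative helper: builds an IPv4 address from four octets (0..255 each).
    static byte[] ip(int a, int b, int c, int d) {
        return new byte[] {(byte) a, (byte) b, (byte) c, (byte) d};
    }
}
```

With this variant, a range from 10.0.0.0 to 200.255.255.255 contains 192.168.1.1, whereas a signed comparison would treat the octet 192 as negative and reject it.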

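Answer 1

Score: 3

If the ranges are disjoint (no range ever overlaps with or contains another), this is easy to solve with a TreeMap.

Create a TreeMap that maps the start of each range to its end: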

var map = new TreeMap<Integer, Integer>();
map.put(-10, 5);
map.put(8, 12);
map.put(20, 30);
map.put(800, 1000);

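Then you can use the floorEntry method to find the only range that could contain a number. For example, floorEntry(25) returns the map entry with key 20 and value 30, corresponding to the range 20-30. Then you simply check that your number is not greater than the end of the range you found (the ranges in the question are inclusive at both ends).

boolean isContainedInRange(int value) {
    Map.Entry<Integer, Integer> entry = map.floorEntry(value);
    return entry != null && value <= entry.getValue();
}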
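
As a self-contained version of the snippet above, here is a minimal sketch that wraps the map in a small class. The class name DisjointRanges and its method names are hypothetical, and the sketch assumes disjoint ranges that are inclusive at both ends.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: membership test over disjoint, inclusive ranges keyed by range start.
class DisjointRanges {
    private final TreeMap<Integer, Integer> startToEnd = new TreeMap<>();

    void add(int start, int end) {
        startToEnd.put(start, end);
    }

    // floorEntry finds the range with the greatest start <= value in O(log n);
    // the value is contained only if it does not exceed that range's end.
    boolean contains(int value) {
        Map.Entry<Integer, Integer> entry = startToEnd.floorEntry(value);
        return entry != null && value <= entry.getValue();
    }
}
```

Because the ranges are disjoint, the range with the greatest start not exceeding the value is the only candidate, so a single floorEntry call is enough.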

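For the general case, where ranges may overlap and you want all matching ranges, one solution is to use two TreeMaps: one maps each range's start to its end, and the other does the reverse. This assumes that map also contains the overlapping range, added with map.put(9, 13), and that no two ranges share a start or an end, since a TreeMap holds only one value per key.

var reverseMap = new TreeMap<Integer, Integer>();
reverseMap.put(5, -10);
reverseMap.put(12, 8);
reverseMap.put(13, 9);
reverseMap.put(30, 20);
reverseMap.put(1000, 800);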

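Now, given a value, you can use map.headMap() to find the ranges that start at or before the value, and reverseMap.tailMap() to find the ranges that end at or after it. The intersection of these two sets gives you all the ranges that contain the value. The intersection is computed with the Set.retainAll method.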

TreeMap<Integer, Integer> ranges = new TreeMap<>(map.headMap(value, true));
ranges.keySet().retainAll(reverseMap.tailMap(value).values());
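
The two lines above can be put into a runnable form. This is a minimal sketch under the same assumption that no two ranges share a start or an end. The class name TwoMapLookup and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: find all inclusive ranges containing a value using two TreeMaps.
class TwoMapLookup {
    private final TreeMap<Integer, Integer> startToEnd = new TreeMap<>();
    private final TreeMap<Integer, Integer> endToStart = new TreeMap<>();

    void add(int start, int end) {
        startToEnd.put(start, end);
        endToStart.put(end, start);
    }

    // Returns {start, end} pairs of every range containing value, ordered by start.
    List<int[]> find(int value) {
        // Copy of the ranges that start at or before value.
        TreeMap<Integer, Integer> candidates = new TreeMap<>(startToEnd.headMap(value, true));
        // Keep only the starts whose range also ends at or after value.
        candidates.keySet().retainAll(endToStart.tailMap(value, true).values());
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : candidates.entrySet()) {
            result.add(new int[] {e.getKey(), e.getValue()});
        }
        return result;
    }
}
```

Both the copy of the head map and the retainAll call can touch a large share of the entries, so a single query may cost far more than O(log n) when many ranges start before the value or end after it.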

This is not particularly efficient, though. For an efficient solution, you will need to implement a custom data structure, such as:
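
The list of suggested structures is missing here. One structure commonly used for this kind of query is an interval tree, and the following is a minimal sketch of a static, augmented variant stored implicitly in a sorted array. It is an assumption about what such a structure could look like, not necessarily what the answer had in mind. The class name StaticIntervalTree and all member names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of a static augmented interval tree laid out over a sorted array.
class StaticIntervalTree {
    private final int[][] ranges; // each entry is {start, end}, sorted by start
    private final int[] maxEnd;   // maxEnd[mid] = largest end in the subtree rooted at mid

    StaticIntervalTree(int[][] input) {
        ranges = new int[input.length][];
        for (int i = 0; i < input.length; i++) {
            ranges[i] = input[i].clone();
        }
        Arrays.sort(ranges, Comparator.comparingInt((int[] r) -> r[0]));
        maxEnd = new int[ranges.length];
        build(0, ranges.length - 1);
    }

    // The subtree covering indices lo..hi is rooted at the middle index.
    private int build(int lo, int hi) {
        if (lo > hi) return Integer.MIN_VALUE;
        int mid = (lo + hi) >>> 1;
        int best = Math.max(ranges[mid][1],
                Math.max(build(lo, mid - 1), build(mid + 1, hi)));
        maxEnd[mid] = best;
        return best;
    }

    // Returns every range containing value (inclusive ends), ordered by start.
    List<int[]> find(int value) {
        List<int[]> result = new ArrayList<>();
        collect(0, ranges.length - 1, value, result);
        return result;
    }

    private void collect(int lo, int hi, int value, List<int[]> out) {
        if (lo > hi) return;
        int mid = (lo + hi) >>> 1;
        if (maxEnd[mid] < value) return; // nothing in this subtree ends late enough
        collect(lo, mid - 1, value, out);
        if (ranges[mid][0] <= value) {   // otherwise mid and its right subtree start too late
            if (value <= ranges[mid][1]) out.add(ranges[mid]);
            collect(mid + 1, hi, value, out);
        }
    }
}
```

The maxEnd array lets a query skip whole subtrees whose ranges all end before the value, and the sort order lets it skip everything that starts after the value. Unlike the two-map approach, this sketch also handles nested ranges and ranges that share a start or an end, but it does not support insertion after construction.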

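Answer 2

Score: 2

Try Collections.binarySearch(List<? extends T> list, T key, Comparator<? super T> c).

With a suitable Comparator<IntRange>, you get either the index of a matching element or the negative insertion point, (-(insertion point) - 1). The insertion point is defined as the position at which the key would be inserted into the list, which must be sorted by range start. An additional inRange() test then checks whether the elements around that index actually contain the key.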

package examples;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Note: IntComparator imposes an ordering that is inconsistent with equals.
class IntRange {

	private static class IntComparator implements Comparator<IntRange> {

		@Override
		public int compare(IntRange o1, IntRange o2) {
			if (o1.start <= o2.start && o1.end >= o2.end) {
				return 0;
			}
			if (o1.start < o2.start) {
				return -1;
			} else if (o1.start > o2.start) {
				return 1;
			} else if (o1.end > o2.end) {
				return 1;
			}
			return -1;
		}
	}

	private static List<IntRange> findRanges(List<IntRange> ranges, int i) {
		IntRange test = new IntRange(i, i);
		int index = Collections.binarySearch(ranges, test, new IntComparator());
		if (index < 0) {
			index = -(index + 1);
		}
		ArrayList<IntRange> result = new ArrayList<IntRange>();
		for (int j = index - 1; j >= 0; j--) {
			IntRange r = ranges.get(j);
			if (r.inRange(i)) {
				result.add(0, r);
			} else {
				break;
			}
		}
		for (int j = index; j < ranges.size(); j++) {
			IntRange r = ranges.get(j);
			if (r.inRange(i)) {
				result.add(r);
			} else {
				break;
			}
		}
		return result;
	}

	public static void main(String[] args) {
		ArrayList<IntRange> ranges = new ArrayList<IntRange>();
		ranges.add(new IntRange(-10, 5));
		ranges.add(new IntRange(8, 12));
		ranges.add(new IntRange(17, 20));
		ranges.add(new IntRange(20, 30));
		ranges.add(new IntRange(800, 1000));

		// Should contain IntRange(8, 12) as result
		List<IntRange> matchingRanges = findRanges(ranges, 10);
		for (int i = 0; i < matchingRanges.size(); i++) {
			System.out.println(matchingRanges.get(i).toString());
		}

		// Should contain IntRange(17, 20) and IntRange(20, 30) as result
		matchingRanges = findRanges(ranges, 20);
		for (int i = 0; i < matchingRanges.size(); i++) {
			System.out.println(matchingRanges.get(i).toString());
		}

	}

	private int start;

	private int end;

	public IntRange(int start, int end) {
		this.start = start;
		this.end = end;
	}

	@Override
	public boolean equals(Object o) {
		if (this == o)
			return true;
		if (o == null || getClass() != o.getClass())
			return false;
		IntRange intRange = (IntRange) o;
		return start == intRange.start && end == intRange.end;
	}

	@Override
	public int hashCode() {
		return Objects.hash(start, end);
	}

	public boolean inRange(int i) {
		return i >= start && i <= end;
	}

	@Override
	public String toString() {
		return "IntRange [start=" + start + ", end=" + end + "]";
	}
}

huangapple
  • Posted on August 2, 2020, 03:39:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63209353.html