区间和一组区间之间的区别是什么?

huangapple go评论166阅读模式
英文:

Difference between an interval and a set of intervals?

问题

我有一组不重叠、不相邻的区间,例如 [{10,15}, {30,35}, {20,25}]。它们没有排序,但如果需要的话,我可以对它们进行排序。

现在我有一个新的区间,例如 {5,32},我想生成一个描述差异的新区间集合:这个新区间覆盖的范围中不在原集合中的部分。在这个例子中,答案将是:[{5,9}, {16,19}, {26,29}]。

有没有一种快速的算法来计算这个差异?请注意,这个集合通常只有1个,有时候有2个,很少有3个或更多的项,所以我希望针对这种情况进行优化。

为了提供背景,这是最初从输入流中创建集合的代码,我在进行合并时:

type Interval struct {
    start int
    end   int
}

func (i *Interval) OverlapsOrAdjacent(j Interval) bool {
    return i.end+1 >= j.start && j.end+1 >= i.start
}

func (i *Interval) Merge(j Interval) bool {
    if !i.OverlapsOrAdjacent(j) {
        return false
    }
    if j.start < i.start {
        i.start = j.start
    }
    if j.end > i.end {
        i.end = j.end
    }
    return true
}

type Intervals []Interval

func (ivs Intervals) Len() int           { return len(ivs) }
func (ivs Intervals) Swap(i, j int)      { ivs[i], ivs[j] = ivs[j], ivs[i] }
func (ivs Intervals) Less(i, j int) bool { return ivs[i].start < ivs[j].start }

func (ivs Intervals) Merge(iv Interval) Intervals {
    ivs = append(ivs, iv)
    merged := make(Intervals, 0, len(ivs))
    for _, iv := range ivs {
        for i := 0; i < len(merged); {
            if iv.Merge(merged[i]) {
                merged = append(merged[:i], merged[i+1:]...)
            } else {
                i++
            }
        }
        merged = append(merged, iv)
    }
    return merged
}

func (ivs Intervals) MergeUsingSort(iv Interval) Intervals {
    ivs = append(ivs, iv)
    sort.Sort(ivs)
    merged := make(Intervals, 0, len(ivs))
    merged = append(merged, ivs[0])
    for i := 1; i < len(ivs); i++ {
        last := len(merged) - 1
        if !merged[last].Merge(ivs[i]) {
            merged = append(merged, ivs[i])
        }
    }
    return merged
}

func (ivs Intervals) Difference(iv Interval) Intervals {
    // ???
    return ivs
}

func main() {
    var ivs Intervals
    for _, input := range inputsFromSomewhere { // 实际上,我不是一次性拥有所有这些输入,它们是逐个输入的
        iv := Interval{input.start, input.end}
        diffs := ivs.Difference(iv) // 尚未实现...
        // 对 diffs 做一些操作
        ivs = ivs.Merge(iv)
    }
}

我发现上面的 Intervals.Merge()MergeUsingSort() 快2倍,所以我想知道是否还有一种简单的非排序方法来回答我的问题。

英文:

I have a set of non-overlapping, non-adjacent intervals, eg. [{10,15}, {30,35}, {20,25}]. They are not sorted, but I can sort them if necessary.

Now I am given some new interval, eg. {5,32} and want to generate a new set of intervals describing the difference: the ranges covered by this new interval that aren't in the set. In this example the answer would be: [{5,9}, {16,19}, {26,29}].

What's a fast algorithm for calculating this? Note that the set will typically have 1, sometimes 2, rarely 3 or more items in it, so I want to optimise for this case.

For context, here's the code for initially creating the set from an input stream of start+end data, where I merge as I go:

type Interval struct {
start int
end   int
}
func (i *Interval) OverlapsOrAdjacent(j Interval) bool {
return i.end+1 &gt;= j.start &amp;&amp; j.end+1 &gt;= i.start
}
func (i *Interval) Merge(j Interval) bool {
if !i.OverlapsOrAdjacent(j) {
return false
}
if j.start &lt; i.start {
i.start = j.start
}
if j.end &gt; i.end {
i.end = j.end
}
return true
}
type Intervals []Interval
func (ivs Intervals) Len() int           { return len(ivs) }
func (ivs Intervals) Swap(i, j int)      { ivs[i], ivs[j] = ivs[j], ivs[i] }
func (ivs Intervals) Less(i, j int) bool { return ivs[i].start &lt; ivs[j].start }
func (ivs Intervals) Merge(iv Interval) Intervals {
ivs = append(ivs, iv)
merged := make(Intervals, 0, len(ivs))
for _, iv := range ivs {
for i := 0; i &lt; len(merged); {
if iv.Merge(merged[i]) {
merged = append(merged[:i], merged[i+1:]...)
} else {
i++
}
}
merged = append(merged, iv)
}
return merged
}
func (ivs Intervals) MergeUsingSort(iv Interval) Intervals {
ivs = append(ivs, iv)
sort.Sort(ivs)
merged := make(Intervals, 0, len(ivs))
merged = append(merged, ivs[0])
for i := 1; i &lt; len(ivs); i++ {
last := len(merged) - 1
if !merged[last].Merge(ivs[i]) {
merged = append(merged, ivs[i])
}
}
return merged
}
func (ivs Intervals) Difference(iv Interval) Intervals {
// ???
return ivs
}
func main() {
var ivs Intervals
for _, input := range inputsFromSomewhere { // in reality, I don&#39;t have all these inputs at once, they come in one at a time
iv := Interval{input.start, input.end}
diffs := ivs.Difference(iv) // not yet implemented...
// do something with diffs
ivs = ivs.Merge(iv)
}
}

I find that the above Intervals.Merge() is 2x faster than MergeUsingSort(), so I wonder if there's also a simple non-sorting way of answering my question.

答案1

得分: 2

问题和答案的代码是不完整的,无法编译。代码中没有基准测试。从代码的快速浏览来看,它可能效率低下。

interval.gointerval_test.go的可用代码是从https://github.com/VertebrateResequencing/wr/tree/develop/minfys获取的。

让我们首先为区间差异示例编写一个基准测试。

package minfys

import (
	"fmt"
	"testing"
)

// 示例
var (
	xA = Intervals{{10, 15}, {30, 35}, {20, 25}}
	xB = Interval{5, 32}
	xD = Intervals{{5, 9}, {16, 19}, {26, 29}}
	xR = Intervals{}
)

func BenchmarkExample(b *testing.B) {
	b.ReportAllocs()
	a := make(Intervals, len(xA))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		copy(a, xA)
		xR = a.Difference(xB)
	}
	b.StopTimer()
	if fmt.Sprint(xD) != fmt.Sprint(xR) {
		b.Fatal(xD, xR)
	}
}

接下来,编写一个Difference方法。

package minfys

func (a Intervals) Difference(b Interval) Intervals {
	// 如果A和B是集合,那么A在B中的相对补集是B中不在A中的元素的集合。
	// A在B中的相对补集表示为B \ A:
	//     B \ A = {x ∈ A | x ∉ B}
	//     B \ A = B ∩ A'
	//
	// 例如,d = a\b,
	//     a: [{10, 15}, {30, 35}, {20, 25}]
	//     b: {5,32}
	//     d: [{5,9}, {16,19}, {26,29}]
	// 集合a的元素是非重叠、非相邻和未排序的区间。

	if len(a) <= 0 {
		return Intervals{b}
	}

	d := make(Intervals, 0, 3)
	for ; len(a) > 0; a = a[1:] {
		for i := 1; i < len(a); i++ {
			if a[i].Start < a[0].Start {
				a[i], a[0] = a[0], a[i]
			}
		}

		if b.Start < a[0].Start {
			if b.End < a[0].Start {
				d = append(d, b)
				break
			}
			d = append(d, Interval{b.Start, a[0].Start - 1})
			b.Start = a[0].Start
		}
		if b.End <= a[0].End {
			break
		}
		if b.Start <= a[0].End {
			b.Start = a[0].End + 1
		}
		if len(a) == 1 {
			if b.Start <= a[0].End {
				b.Start = a[0].End + 1
			}
			d = append(d, b)
			break
		}
	}
	return d
}

现在,对Difference方法进行基准测试。

BenchmarkExample-4     20000000    62.4 ns/op    48 B/op    1 allocs/op

sbs编写了一个Difference方法。

// Interval结构用于描述具有开始和结束的内容。结束必须大于开始。
type Interval struct {
	Start int64
	End   int64
}

// Overlaps返回true,如果此间隔与提供的间隔重叠。
func (i *Interval) Overlaps(j Interval) bool {
	// https://nedbatchelder.com/blog/201310/range_overlap_in_two_compares.html
	return i.End >= j.Start && j.End >= i.Start
}

// Intervals类型是Interval的切片。
type Intervals []Interval

// Difference返回与我们的任何间隔不重叠的iv的任何部分。假设我们的所有间隔都已经合并。
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
	diffs = append(diffs, iv)
	for _, prior := range ivs {
		for i := 0; i < len(diffs); {
			if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
				if len(diffs) == 1 {
					diffs = nil
				} else {
					diffs = append(diffs[:i], diffs[i+1:]...)
				}

				if left != nil {
					diffs = append(diffs, *left)
				}
				if right != nil {
					diffs = append(diffs, *right)
				}
			} else {
				i++
			}
		}
		if len(diffs) == 0 {
			break
		}
	}

	return
}

对sbs的Difference方法进行基准测试。

BenchmarkExample-4    5000000    365 ns/op    128 B/op    4 allocs/op

peterSO的Difference方法明显更快。

old.txt(sbs)与new.txt(peterSO):

基准测试 旧的ns/op 新的ns/op 变化
BenchmarkExample-4 365 62.4 -82.90%

基准测试 旧的分配 新的分配 变化
BenchmarkExample-4 4 1 -75.00%

基准测试 旧的字节 新的字节 变化
BenchmarkExample-4 128 48 -62.50%

这只是一个开始。可能还有其他可以改进的地方。

interval_test.go中有一些错误。ShouldBeNil用于指针;ShouldBeEmpty用于集合。ShouldResemble不能处理集合相等性(包含相同元素的两个集合是同一个集合)。更改ShouldResemble的顺序以匹配实现相关的顺序。

$ go test
..........................................................................................................................x......................................................x................x
Failures:

  * interval_test.go 
  Line 247:
  Expected: nil
  Actual:   '[]'

  * interval_test.go 
  Line 375:
  Expected: 'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:31, End:32}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}}'
  Actual:   'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}, minfys.Interval{Start:31, End:32}}'
  (Should resemble)!

  * interval_test.go 
  Line 413:
  Expected: 'minfys.Intervals{minfys.Interval{Start:7, End:10}, minfys.Interval{Start:1, End:3}, minfys.Interval{Start:15, End:17}}'
  Actual:   'minfys.Intervals{minfys.Interval{Start:1, End:3}, minfys.Interval{Start:7, End:10}, minfys.Interval{Start:15, End:17}}'
  (Should resemble)!


195 total assertions

...
198 total assertions

--- FAIL: TestIntervals (0.04s)
FAIL
.

$ diff -a -u ../interval_test.go interval_test.go
--- ../interval_test.go    2017-04-29 20:23:29.365344008 -0400
+++ interval_test.go    2017-04-29 20:54:14.349344903 -0400
@@ -244,19 +244,19 @@
             So(len(ivs), ShouldEqual, 1)

             newIvs = ivs.Difference(twoSix)
-            So(newIvs, ShouldBeNil)
+            So(newIvs, ShouldBeEmpty)
             ivs = ivs.Merge(twoSix)
             So(len(ivs), ShouldEqual, 1)

             newIvs = ivs.Difference(oneThree)
-            So(newIvs, ShouldBeNil)
+            So(newIvs, ShouldBeEmpty)
             ivs = ivs.Merge(oneThree)
             So(len(ivs), ShouldEqual, 1)

             oneSeven := Interval{1, 7}

             newIvs = ivs.Difference(oneSix)
-            So(newIvs, ShouldBeNil)
+            So(newIvs, ShouldBeEmpty)
             ivs = ivs.Merge(oneSix)
             So(len(ivs), ShouldEqual, 1)

@@ -372,7 +372,7 @@

             fiveThirtyTwo := Interval{5, 32}
             newIvs = ivs.Difference(fiveThirtyTwo)
-            So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{31, 32}, Interval{11, 14}, Interval{19, 19}})
+            So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{11, 14}, Interval{19, 19}, Interval{31, 32}})
             ivs = ivs.Merge(fiveThirtyTwo)
             So(len(ivs), ShouldEqual, 3)

@@ -409,7 +409,7 @@

             ivs = ivs.Truncate(17)

-            expected := Intervals{sevenTen, oneThree, Interval{15, 17}}
+            expected := Intervals{oneThree, sevenTen, Interval{15, 17}}
             So(ivs, ShouldResemble, expected)
         })
     })
.

$ go test
.............................................................................................................................................................................................................
205 total assertions

...
208 total assertions

PASS
$ 

运行interval_test.go基准测试以测量peterSO和sbs的Difference方法。

$ go test -v
Merging many intervals is fast took 26.208614ms
Difference took 10.706858ms

$ go test -v
Merging many intervals is fast took 30.799216ms
Difference took 14.414488ms

peterSO的Difference方法比sbs的Difference方法快得多:10.706858ms对比14.414488ms,差距为25.7%。

更新peterSO修订后的Difference方法的早期示例基准测试结果为

old.txt(sbs)与new.txt(peterSO):

基准测试 旧的ns/op 新的ns/op 变化
BenchmarkExample-4 365 221 -39.45%

基准测试 旧的分配 新的分配 变化
BenchmarkExample-4 4 3 -25.00%

基准测试 旧的字节 新的字节 变化
BenchmarkExample-4 128 112 -12.50%

英文:

The question and answer code is incomplete and doesn't compile. There are no benchmarks. From a quick glance at the code, it's likely inefficient.

Usable code for interval.go and interval_test.go was obtained from https://github.com/VertebrateResequencing/wr/tree/develop/minfys.

Let's start by writing a benchmark for the interval difference example.

package minfys
import (
&quot;fmt&quot;
&quot;testing&quot;
)
// Example
var (
xA = Intervals{{10, 15}, {30, 35}, {20, 25}}
xB = Interval{5, 32}
xD = Intervals{{5, 9}, {16, 19}, {26, 29}}
xR = Intervals{}
)
func BenchmarkExample(b *testing.B) {
b.ReportAllocs()
a := make(Intervals, len(xA))
b.ResetTimer()
for i := 0; i &lt; b.N; i++ {
copy(a, xA)
xR = a.Difference(xB)
}
b.StopTimer()
if fmt.Sprint(xD) != fmt.Sprint(xR) {
b.Fatal(xD, xR)
}
}

Next, write a Difference method.

package minfys
func (a Intervals) Difference(b Interval) Intervals {
// If A and B are sets, then the relative complement of A in B
// is the set of elements in B but not in A.
// The relative complement of A in B is denoted B ∖  A:
//     B \ A = {x ∈ A | x ∉ B}
//     B \ A = B ∩ A&#39;
//
// For example. d = a\b,
//     a: [{10, 15}, {30, 35}, {20, 25}]
//     b: {5,32}
//     d: [{5,9}, {16,19}, {26,29}]
// The elements of set a are non-overlapping, non-adjacent,
// and unsorted intervals.
if len(a) &lt;= 0 {
return Intervals{b}
}
d := make(Intervals, 0, 3)
for ; len(a) &gt; 0; a = a[1:] {
for i := 1; i &lt; len(a); i++ {
if a[i].Start &lt; a[0].Start {
a[i], a[0] = a[0], a[i]
}
}
if b.Start &lt; a[0].Start {
if b.End &lt; a[0].Start {
d = append(d, b)
break
}
d = append(d, Interval{b.Start, a[0].Start - 1})
b.Start = a[0].Start
}
if b.End &lt;= a[0].End {
break
}
if b.Start &lt;= a[0].End {
b.Start = a[0].End + 1
}
if len(a) == 1 {
if b.Start &lt;= a[0].End {
b.Start = a[0].End + 1
}
d = append(d, b)
break
}
}
return d
}

Now, benchmark the Difference method.

BenchmarkExample-4     20000000	    62.4 ns/op	  48 B/op	   1 allocs/op

sbs wrote a Difference method.

// Interval struct is used to describe something with a start and end. End must
// be greater than start.
type Interval struct {
Start int64
End   int64
}
// Overlaps returns true if this interval overlaps with the supplied one.
func (i *Interval) Overlaps(j Interval) bool {
// https://nedbatchelder.com/blog/201310/range_overlap_in_two_compares.html
return i.End &gt;= j.Start &amp;&amp; j.End &gt;= i.Start
}
// Intervals type is a slice of Interval.
type Intervals []Interval
// Difference returns any portions of iv that do not overlap with any of our
// intervals. Assumes that all of our intervals have been Merge()d in.
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
diffs = append(diffs, iv)
for _, prior := range ivs {
for i := 0; i &lt; len(diffs); {
if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
if len(diffs) == 1 {
diffs = nil
} else {
diffs = append(diffs[:i], diffs[i+1:]...)
}
if left != nil {
diffs = append(diffs, *left)
}
if right != nil {
diffs = append(diffs, *right)
}
} else {
i++
}
}
if len(diffs) == 0 {
break
}
}
return
}

Benchmark sbs's Difference method.

BenchmarkExample-4   	5000000	   365 ns/op	 128 B/op	   4 allocs/op

peterSO's Difference method is significantly faster.

old.txt (sbs) versus new.txt (peterSO):
benchmark              old ns/op     new ns/op     delta
BenchmarkExample-4     365           62.4          -82.90%
benchmark              old allocs     new allocs   delta
BenchmarkExample-4     4              1            -75.00%
benchmark              old bytes     new bytes     delta
BenchmarkExample-4     128           48            -62.50%

This is just a beginning. There are likely other improvements that can be made.

There were some errors in interval_test.go. ShouldBeNil is for pointers; ShouldBeEmpty is for collections. ShouldResemble does not handle set equality (two sets which contain the same elements are the same set). Change ShouldResemble order to match implementation dependent order.

$ go test
..........................................................................................................................x......................................................x................x
Failures:
* interval_test.go 
Line 247:
Expected: nil
Actual:   &#39;[]&#39;
* interval_test.go 
Line 375:
Expected: &#39;minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:31, End:32}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}}&#39;
Actual:   &#39;minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}, minfys.Interval{Start:31, End:32}}&#39;
(Should resemble)!
* interval_test.go 
Line 413:
Expected: &#39;minfys.Intervals{minfys.Interval{Start:7, End:10}, minfys.Interval{Start:1, End:3}, minfys.Interval{Start:15, End:17}}&#39;
Actual:   &#39;minfys.Intervals{minfys.Interval{Start:1, End:3}, minfys.Interval{Start:7, End:10}, minfys.Interval{Start:15, End:17}}&#39;
(Should resemble)!
195 total assertions
...
198 total assertions
--- FAIL: TestIntervals (0.04s)
FAIL

.

$ diff -a -u ../interval_test.go interval_test.go
--- ../interval_test.go	2017-04-29 20:23:29.365344008 -0400
+++ interval_test.go	2017-04-29 20:54:14.349344903 -0400
@@ -244,19 +244,19 @@
So(len(ivs), ShouldEqual, 1)
newIvs = ivs.Difference(twoSix)
-			So(newIvs, ShouldBeNil)
+			So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(twoSix)
So(len(ivs), ShouldEqual, 1)
newIvs = ivs.Difference(oneThree)
-			So(newIvs, ShouldBeNil)
+			So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(oneThree)
So(len(ivs), ShouldEqual, 1)
oneSeven := Interval{1, 7}
newIvs = ivs.Difference(oneSix)
-			So(newIvs, ShouldBeNil)
+			So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(oneSix)
So(len(ivs), ShouldEqual, 1)
@@ -372,7 +372,7 @@
fiveThirtyTwo := Interval{5, 32}
newIvs = ivs.Difference(fiveThirtyTwo)
-			So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{31, 32}, Interval{11, 14}, Interval{19, 19}})
+			So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{11, 14}, Interval{19, 19}, Interval{31, 32}})
ivs = ivs.Merge(fiveThirtyTwo)
So(len(ivs), ShouldEqual, 3)
@@ -409,7 +409,7 @@
ivs = ivs.Truncate(17)
-			expected := Intervals{sevenTen, oneThree, Interval{15, 17}}
+			expected := Intervals{oneThree, sevenTen, Interval{15, 17}}
So(ivs, ShouldResemble, expected)
})
})

.

$ go test
.............................................................................................................................................................................................................
205 total assertions
...
208 total assertions
PASS
$ 

> I [@sbs] confirm it's faster than my solution. Though if you just
> measure the wall-time that using Difference() takes (put a before :=
> time.Now() before the last Difference() call in the interval_test.go,
> and a time.Since(before) after it and sum those durations over the
> loop), it seems to make surprisingly little difference (on my machine
> it takes ~31ms with my solution and ~29ms with yours).

As requested, interval_test.go was modified to measure wall time:

$ diff -a -u ../interval_test.go walltime_test.go
--- ../interval_test.go	2017-04-29 20:23:29.365344008 -0400
+++ walltime_test.go	2017-04-30 13:39:29.000000000 -0400
@@ -24,6 +24,7 @@
&quot;math/rand&quot;
&quot;testing&quot;
&quot;time&quot;
+	&quot;fmt&quot;
)
func TestIntervals(t *testing.T) {
@@ -459,16 +460,20 @@
var ivs Intervals
errors := 0
+		var diffTime time.Duration
t := time.Now()
for i, input := range inputs {
iv := NewInterval(int64(input), int64(readSize))
+			before := time.Now()
newIvs := ivs.Difference(iv)
+			diffTime += time.Since(before)
if (len(newIvs) == 1) != exepectedNew[i] {
errors++
}
ivs = ivs.Merge(iv)
}
-		// fmt.Printf(&quot;\ntook %s\n&quot;, time.Since(t))
+		fmt.Printf(&quot;took %s\n&quot;, time.Since(t))
+		fmt.Printf(&quot;\n  Difference took %s\n&quot;, diffTime)
So(errors, ShouldEqual, 0)
So(len(ivs), ShouldEqual, 1)
So(time.Since(t).Seconds(), ShouldBeLessThan, 1) // 42ms on my machine
$ 

The interval_test.go benchmark input sizes and frequencies were

size    frequency
0       1
1       94929
2       50072
3       4998

Output sizes and frequencies were

size    frequency
0       50000
1       100000

Tuning peterSo's Difference method for this distribution gives

package minfys
func (a Intervals) Difference(b Interval) Intervals {
// If A and B are sets, then the relative complement of A in B
// is the set of elements in B but not in A.
// The relative complement of A in B is denoted B ∖  A:
//     B \ A = {x ∈ A | x ∉ B}
//     B \ A = B ∩ A&#39;
//
// For example. d = a\b,
//     a: [{10, 15}, {30, 35}, {20, 25}]
//     b: {5,32}
//     d: [{5,9}, {16,19}, {26,29}]
// The elements of set a are non-overlapping, non-adjacent,
// and unsorted intervals.
if len(a) &lt;= 0 {
return Intervals{b}
}
var d Intervals
for ; len(a) &gt; 0; a = a[1:] {
for i := 1; i &lt; len(a); i++ {
if a[i].Start &lt; a[0].Start {
a[i], a[0] = a[0], a[i]
}
}
if b.Start &lt; a[0].Start {
if b.End &lt; a[0].Start {
d = append(d, b)
break
}
d = append(d, Interval{b.Start, a[0].Start - 1})
b.Start = a[0].Start
}
if b.End &lt;= a[0].End {
break
}
if b.Start &lt;= a[0].End {
b.Start = a[0].End + 1
}
if len(a) == 1 {
if b.Start &lt;= a[0].End {
b.Start = a[0].End + 1
}
d = append(d, b)
break
}
}
return d
}

Running the interval_test.go benchmark for peterSO's and sbs's Difference methods gives

$ go test -v
Merging many intervals is fast took 26.208614ms
Difference took 10.706858ms

and

$ go test -v
Merging many intervals is fast took 30.799216ms
Difference took 14.414488ms

peterSo's Difference method is significantly faster than sbs's: 10.706858ms versus 14.414488ms or minus 25.7 percent.

Updating the earlier example benchmark results for peterSO's revised Difference method gives

old.txt (sbs) versus new.txt (peterSO):
benchmark              old ns/op     new ns/op     delta
BenchmarkExample-4     365           221           -39.45%
benchmark              old allocs     new allocs   delta
BenchmarkExample-4     4              3            -25.00%
benchmark              old bytes     new bytes     delta
BenchmarkExample-4     128           112           -12.50%

答案2

得分: 0

为了回答我的问题,这是我对Difference()的实现,它比JimB建议的需要排序的方法更快(在我的输入数据上)。

func (i *Interval) Overlaps(j Interval) bool {
    return i.End >= j.Start && j.End >= i.Start
}

func (i *Interval) Difference(j Interval) (left *Interval, right *Interval, overlapped bool) {
    if !i.Overlaps(j) {
        return
    }

    overlapped = true
    if j.Start < i.Start {
        left = &Interval{j.Start, i.Start - 1}
    }
    if j.End > i.End {
        right = &Interval{i.End + 1, j.End}
    }
    return
}

func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
    diffs = append(diffs, iv)
    for _, prior := range ivs {
        for i := 0; i < len(diffs); {
            if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
                if len(diffs) == 1 {
                    diffs = nil
                } else {
                    diffs = append(diffs[:i], diffs[i+1:]...)
                }

                if left != nil {
                    diffs = append(diffs, *left)
                }
                if right != nil {
                    diffs = append(diffs, *right)
                }
            } else {
                i++
            }
        }
        if len(diffs) == 0 {
            break
        }
    }
    return
}

在我尝试的数据上它是有效的,尽管我有点担心可能会错过某些情况,导致得到错误的答案。

英文:

To answer my own question, here's my implementation of Difference() that is faster (on my input data) than eg. JimB's suggestion that required a sort.

func (i *Interval) Overlaps(j Interval) bool {
return i.End &gt;= j.Start &amp;&amp; j.End &gt;= i.Start
}
func (i *Interval) Difference(j Interval) (left *Interval, right *Interval, overlapped bool) {
if !i.Overlaps(j) {
return
}
overlapped = true
if j.Start &lt; i.Start {
left = &amp;Interval{j.Start, i.Start - 1}
}
if j.End &gt; i.End {
right = &amp;Interval{i.End + 1, j.End}
}
return
}
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
diffs = append(diffs, iv)
for _, prior := range ivs {
for i := 0; i &lt; len(diffs); {
if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
if len(diffs) == 1 {
diffs = nil
} else {
diffs = append(diffs[:i], diffs[i+1:]...)
}
if left != nil {
diffs = append(diffs, *left)
}
if right != nil {
diffs = append(diffs, *right)
}
} else {
i++
}
}
if len(diffs) == 0 {
break
}
}
return
}

It works on the data I've tried, though I'm a little worried I might have missed an edge case where it gets the wrong answer.

huangapple
  • 本文由 发表于 2017年4月27日 18:18:45
  • 转载请务必保留本文链接:https://go.coder-hub.com/43654953.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定