2021年8月19日 19:37:19go评论83阅读模式

英文:

optimize iteration over a list of sorted periods

问题

给定一个已排序且不包含重复元素的周期列表。

periods := periods{
    period{min: 0, max: time.Millisecond},
	period{min: time.Millisecond, max: time.Millisecond * 1},
	period{min: time.Millisecond * 1, max: time.Millisecond * 2},
	period{min: time.Millisecond * 2, max: time.Millisecond * 7},
	period{min: time.Millisecond * 7, max: 0},
}

其中periods和period类型定义如下：

type periods []period

func (ks periods) index(v time.Duration) period {
	for i := 0; i < len(ks); i++ {
		if ks[i].contains(v) {
			return ks[i]
		}
	}
	return period{}
}

type period struct {
	min time.Duration
	max time.Duration
}

func (k period) String() string {
	if k.max == 0 && k.max < k.min {
		return fmt.Sprintf("%v-", k.min)
	}
	return fmt.Sprintf("%v-%v", k.min, k.max)
}

func (k period) contains(t time.Duration) bool {
	if t <= 0 && k.min == 0 {
		return true
	}
	return t > k.min && (k.max == 0 || t <= k.max)
}

完整代码可在以下链接中找到：https://play.golang.org/p/cDmQ7Ho6hUI

你能提供改进periods.index函数中搜索实现的解决方案吗？

此外，你能提供一个因式分解的解决方案，以便可以重用该实现吗？
包含泛型的解决方案也可以，因为我可以使用代码生成进行特殊化。

代码中还包含了一个基准测试：

func BenchmarkIndex(b *testing.B) {
	periods := periods{
		period{min: 0, max: 8000},
		period{min: 8000, max: 16000},
		period{min: 16000, max: 24000},
		period{min: 24000, max: 32000},
		period{min: 32000, max: 40000},
		period{min: 40000, max: 48000},
		period{min: 48000, max: 56000},
		period{min: 56000, max: 64000},
		period{min: 64000, max: 72000},
		period{min: 72000, max: 80000},
		period{min: 80000, max: 0},
	}

	inputs := []time.Duration{
		time.Duration(0),
		time.Duration(72000 + 1),
		time.Duration(80000 + 1),
	}
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		for _, input := range inputs {
			_ = periods.index(input)
		}
	}
}

英文:

given a list of periods, already sorted, and containing no dups.

periods := periods{
    period{min: 0, max: time.Millisecond},
	period{min: time.Millisecond, max: time.Millisecond * 1},
	period{min: time.Millisecond * 1, max: time.Millisecond * 2},
	period{min: time.Millisecond * 2, max: time.Millisecond * 7},
	period{min: time.Millisecond * 7, max: 0},
}

with the types periods and period defined as

type periods []period

func (ks periods) index(v time.Duration) period {
	for i := 0; i &lt; len(ks); i++ {
		if ks[i].contains(v) {
			return ks[i]
		}
	}
	return period{}
}

type period struct {
	min time.Duration
	max time.Duration
}

func (k period) String() string {
	if k.max == 0 &amp;&amp; k.max &lt; k.min {
		return fmt.Sprintf(&quot;%v-&quot;, k.min)
	}
	return fmt.Sprintf(&quot;%v-%v&quot;, k.min, k.max)
}

func (k period) contains(t time.Duration) bool {
	if t &lt;= 0 &amp;&amp; k.min == 0 {
		return true
	}
	return t &gt; k.min &amp;&amp; (k.max == 0 || t &lt;= k.max)
}

The full code is available at https://play.golang.org/p/cDmQ7Ho6hUI

Can you suggest solution(s) to improve the search implementation in the periods.index function ?

Also, can you provide a factorized solution such as it is possible to re use the implementation ?
A generics included solution is OK for that i can still specialize using code gen.

A benchmark is included

func BenchmarkIndex(b *testing.B) {
	periods := periods{
		period{min: 0, max: 8000},
		period{min: 8000, max: 16000},
		period{min: 16000, max: 24000},
		period{min: 24000, max: 32000},
		period{min: 32000, max: 40000},
		period{min: 40000, max: 48000},
		period{min: 48000, max: 56000},
		period{min: 56000, max: 64000},
		period{min: 64000, max: 72000},
		period{min: 72000, max: 80000},
		period{min: 80000, max: 0},
	}

	inputs := []time.Duration{
		time.Duration(0),
		time.Duration(72000 + 1),
		time.Duration(80000 + 1),
	}
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i &lt; b.N; i++ {
		for _, input := range inputs {
			_ = periods.index(input)
		}
	}
}

答案1

得分: 3

你很难在像你这样小的列表（11个元素）上实现比简单的顺序搜索更好的性能。

如果你的切片要大得多（例如数百甚至数千个周期），并且你所说的periods是排序的，那么你可以使用二分搜索。

二分搜索在sort.Search()中实现。你只需要为period提供一个lessOrContains()的实现。代码如下：

func (k period) lessOrContains(t time.Duration) bool {
    return k.max == 0 || t <= k.max
}

func (ks periods) indexBinary(v time.Duration) period {
    idx := sort.Search(len(ks), func(i int) bool {
        return ks[i].lessOrContains(v)
    })
    if idx < len(ks) && ks[idx].contains(v) {
        return ks[idx]
    }
    return period{}
}

接下来是基准测试。让我们创建一个createPeriods()辅助函数，用于创建小或大的周期切片：

func createPeriods(big bool) periods {
    ps := periods{
        period{min: 0, max: 8000},
        period{min: 8000, max: 16000},
        period{min: 16000, max: 24000},
        period{min: 24000, max: 32000},
        period{min: 32000, max: 40000},
        period{min: 40000, max: 48000},
        period{min: 48000, max: 56000},
        period{min: 56000, max: 64000},
        period{min: 64000, max: 72000},
        period{min: 72000, max: 80000},
        period{min: 80000, max: 0},
    }

    if big {
        psbig := periods{}
        for i := 0; i < 1000; i++ {
            psbig = append(psbig, period{time.Duration(i), time.Duration(i + 1)})
        }
        psbig = append(psbig, ps[1:]...)
        ps = psbig
    }

    return ps
}

现在让我们为所有情况编写基准测试函数：

func BenchmarkIndexSmall(b *testing.B) {
    benchmarkIndexImpl(b, false)
}

func BenchmarkIndexBinarySmall(b *testing.B) {
    benchmarkIndexBinaryImpl(b, false)
}

func BenchmarkIndexBig(b *testing.B) {
    benchmarkIndexImpl(b, true)
}

func BenchmarkIndexBinaryBig(b *testing.B) {
    benchmarkIndexBinaryImpl(b, true)
}

func benchmarkIndexImpl(b *testing.B, big bool) {
    periods := createPeriods(big)

    inputs := []time.Duration{
        time.Duration(0),
        time.Duration(72000 + 1),
        time.Duration(80000 + 1),
    }
    b.ResetTimer()
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        for _, input := range inputs {
            _ = periods.index(input)
        }
    }
}

func benchmarkIndexBinaryImpl(b *testing.B, big bool) {
    periods := createPeriods(big)

    inputs := []time.Duration{
        time.Duration(0),
        time.Duration(72000 + 1),
        time.Duration(80000 + 1),
    }
    b.ResetTimer()
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        for _, input := range inputs {
            _ = periods.indexBinary(input)
        }
    }
}

现在让我们看看基准测试的结果：

BenchmarkIndexSmall-8        44408948      25.50 ns/op    0 B/op     0 allocs/op
BenchmarkIndexBinarySmall-8  18441049      58.35 ns/op    0 B/op     0 allocs/op
BenchmarkIndexBig-8            562202    1908 ns/op       0 B/op     0 allocs/op
BenchmarkIndexBinaryBig-8     9234846     125.1 ns/op     0 B/op     0 allocs/op

如你所见，对于具有11个元素的小列表，index()比indexBinary()更快（25 ns vs 58 ns）。

但是当列表变大时（超过一千个周期，在上面的基准测试中为1010个），indexBinary()比index()快一个数量级以上（125 ns vs 2000 ns），而且如果列表变得更大，这种差异将变得更大。

英文:

It's unlikely you can achieve better performance than a simple, sequential search for a list as small as yours (11 elements).

If your slice would be much bigger (e.g. hundreds or even thousands of periods), since your periods is sorted as you claim, then you could use binary search.

Binary search is implemented in sort.Search(). You just basically have to provide a lessOrContains() implementation for period. This is how it could look like:

func (k period) lessOrContains(t time.Duration) bool {
return k.max == 0 || t &lt;= k.max
}

Now an index() function using binary search:

func (ks periods) indexBinary(v time.Duration) period {
idx := sort.Search(len(ks), func(i int) bool {
return ks[i].lessOrContains(v)
})
if idx &lt; len(ks) &amp;&amp; ks[idx].contains(v) {
return ks[idx]
}
return period{}
}

Now on to benchmarking. Let's create a createPeriods() helper function that creates either a small or big periods slice:

func createPeriods(big bool) periods {
ps := periods{
period{min: 0, max: 8000},
period{min: 8000, max: 16000},
period{min: 16000, max: 24000},
period{min: 24000, max: 32000},
period{min: 32000, max: 40000},
period{min: 40000, max: 48000},
period{min: 48000, max: 56000},
period{min: 56000, max: 64000},
period{min: 64000, max: 72000},
period{min: 72000, max: 80000},
period{min: 80000, max: 0},
}
if big {
psbig := periods{}
for i := 0; i &lt; 1000; i++ {
psbig = append(psbig, period{time.Duration(i), time.Duration(i + 1)})
}
psbig = append(psbig, ps[1:]...)
ps = psbig
}
return ps
}

Now let's write benchmark functions for all cases:

func BenchmarkIndexSmall(b *testing.B) {
benchmarkIndexImpl(b, false)
}
func BenchmarkIndexBinarySmall(b *testing.B) {
benchmarkIndexBinaryImpl(b, false)
}
func BenchmarkIndexBig(b *testing.B) {
benchmarkIndexImpl(b, true)
}
func BenchmarkIndexBinaryBig(b *testing.B) {
benchmarkIndexBinaryImpl(b, true)
}
func benchmarkIndexImpl(b *testing.B, big bool) {
periods := createPeriods(big)
inputs := []time.Duration{
time.Duration(0),
time.Duration(72000 + 1),
time.Duration(80000 + 1),
}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i &lt; b.N; i++ {
for _, input := range inputs {
_ = periods.index(input)
}
}
}
func benchmarkIndexBinaryImpl(b *testing.B, big bool) {
periods := createPeriods(big)
inputs := []time.Duration{
time.Duration(0),
time.Duration(72000 + 1),
time.Duration(80000 + 1),
}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i &lt; b.N; i++ {
for _, input := range inputs {
_ = periods.indexBinary(input)
}
}
}

And now let's see the benchmark results:

BenchmarkIndexSmall-8        44408948      25.50 ns/op    0 B/op     0 allocs/op
BenchmarkIndexBinarySmall-8  18441049      58.35 ns/op    0 B/op     0 allocs/op
BenchmarkIndexBig-8            562202    1908 ns/op       0 B/op     0 allocs/op
BenchmarkIndexBinaryBig-8     9234846     125.1 ns/op     0 B/op     0 allocs/op

As you can see, index() is faster than indexBinary() for small lists with 11 elements (25 ns vs 58 ns).

But when the list gets big (more than a thousand periods, 1010 in the above benchmark), then indexBinary() outperforms index() by more than an order of magnitude (125 ns vs 2000 ns), and this difference will get even bigger if the lists gets bigger.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

优化对已排序周期列表的迭代。

问题

答案1

使用Go封装FUSE

golang syslog失败：Unix syslog传递错误

Go拒绝未使用的依赖的利弊

在golang中，有一种高效的方法可以计算执行时间吗？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论