2013年12月28日 18:21:08go评论175阅读模式

英文:

What's the slice element access complexity in Go?

问题

我以为它的时间复杂度是O(1)，但这是从pprof输出中得到的：

140    140  176: 	var lastSB byte = s[lenSMinusOne]
88     88   177: 	var lastSuffixB byte = suffix[lenSuffixMinusOne]

而且根据s的平均长度大于suffix的长度。这表明如果切片更大，访问元素所需的时间更长？

该函数的代码如下：

func hasSuffix(s, suffix []byte) bool {

    lenSMinusOne      := len(s)      - 1
    lenSuffixMinusOne := len(suffix) - 1

    var lastSB byte = s[lenSMinusOne]
    var lastSuffixB byte = suffix[lenSuffixMinusOne]

    if lenSMinusOne < lenSuffixMinusOne {
        return false
    } else if lastSB != lastSuffixB {
        return false
    } else {
        for i := 0; i < lenSuffixMinusOne ; i++ {
               if suffix[i] != s[lenSMinusOne-lenSuffixMinusOne+i] {
                        return false
               }
        }
    }
    return true
}

更新：
要重现这些结果，请安装fetch，它使用带有大型语料库的go-porterstemmer分支（我使用了一个440MB的文件）。

英文:

I thought it to be O(1), but this is from a pprof output:

140    140  176: 	var lastSB byte = s[lenSMinusOne]
88     88   177: 	var lastSuffixB byte = suffix[lenSuffixMinusOne]

and by average length of s is greater than length of suffix. Thus, this shows that accessing an element takes longer if the slice is bigger?

The function is:

func hasSuffix(s, suffix []byte) bool {

    lenSMinusOne      := len(s)      - 1
    lenSuffixMinusOne := len(suffix) - 1

    var lastSB byte = s[lenSMinusOne]
    var lastSuffixB byte = suffix[lenSuffixMinusOne]

    if lenSMinusOne &lt; lenSuffixMinusOne {
        return false
    } else if lastSB != lastSuffixB {
        return false
    } else {
        for i := 0; i &lt; lenSuffixMinusOne ; i++ {
               if suffix[i] != s[lenSMinusOne-lenSuffixMinusOne+i] {
                        return false
               }
        }
    }
    return true
}

UPDATE:
To reproduce the results install fetch which uses go-porterstemmer fork with a large corpus(I use a 440mb file).

答案1

得分: 8

pprof在程序执行过程中收集样本，以揭示热点。使用testing包和go test来运行基准测试。

正如你所期望的那样，以下基准测试显示，平均而言，读取切片的第二个元素与读取切片的第2691个元素之间没有区别，对于包含904,061个字节的切片元素，分别为13439773 ns/op和13460864 ns/op。这两个基准测试使用相同的底层数组。索引切片的时间复杂度为O(1)。

在你的示例中，你正在从具有不同访问模式的两个不同的底层数组中进行读取（外部循环与内部循环）。在具有先进的内存管理和优化的现代处理器上，你不应该期望得到相同的结果。

$ go version
go version devel +3ae7a530dd4e Sat Dec 28 09:37:54 2013 -0800 linux/amd64
$ go test -bench=IndexWord
904061 2 2690.8131199111563
testing: warning: no tests to run
PASS
BenchmarkIndexWordLong     100  13460864 ns/op
BenchmarkIndexWordShort    100  13439773 ns/op
ok  bench  7.814s
$

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"testing"
)

var (
	Words    [][]byte
	ShortLen = 2
)

func IndexWord(b *testing.B, words [][]byte) {
	b.ResetTimer()
	b.StartTimer()
	var char byte
	for i := 0; i < b.N; i++ {
		for _, word := range words {
			char = word[len(word)-1]
		}
	}
	_ = char
}

func BenchmarkIndexWordLong(b *testing.B) {
	words := make([][]byte, len(Words))
	for i, word := range Words {
		words[i] = word
	}
	IndexWord(b, words)
}

func BenchmarkIndexWordShort(b *testing.B) {
	words := make([][]byte, len(Words))
	for i, word := range Words {
		if len(word) > ShortLen {
			word = word[:ShortLen]
		}
		words[i] = word
	}
	IndexWord(b, words)
}

func init() {
	// The Complete Works of William Shakespeare
	// http://www.gutenberg.org/cache/epub/100/pg100.txt
	text, err := ioutil.ReadFile(`/home/peter/pg100.txt`)
	if err != nil {
		panic(err)
	}
	var n, short, long int64
	Words = bytes.Fields(text)
	for i, word := range Words {
		word = bytes.Repeat(word, 600) // Requires 4GB memory
		Words[i] = word
		n++
		long += int64(len(word))
		shortLen := ShortLen
		if len(word) < ShortLen {
			shortLen = len(word)
		}
		short += int64(shortLen)
	}
	fmt.Println(n, float64(short)/float64(len(Words)), float64(long)/float64(len(Words)))
}

你的hasSuffix函数的代码看起来像是从另一种语言直接移植过来的，它不像是为Go编写的。这是我重写的版本。

func hasSuffix(s, suffix []byte) bool {
	if len(s) < len(suffix) {
		return false
	}
	s = s[len(s)-len(suffix):]
	for i, x := range suffix {
		if x != s[i] {
			return false
		}
	}
	return true
}

此外，Go语言还提供了bytes.HasSuffix函数。

Package bytes

func HasSuffix
func HasSuffix(s, suffix []byte) bool
HasSuffix函数测试字节切片s是否以suffix结尾。

英文:

pprof collects samples during program execution to illuminate hotspots. Use the testing package and go test to run benchmarks.

As you should expect, the following benchmark shows that there is no difference between reading the 2nd element of a slice on average and reading the 2691st element of a slice on average, 13439773 ns/op versus 13460864 ns/op for 904,061 byte slice elements. Both benchmarks use the same underlying data arrays. Indexing a slice is O(1).

In your example, you are reading from two different underlying data arrays with different access patterns (outer versus inner loop). On modern processors, which have sophisticated memory management and optimization, you shouldn't expect the same results.

$ go version
go version devel +3ae7a530dd4e Sat Dec 28 09:37:54 2013 -0800 linux/amd64
$ go test -bench=IndexWord
904061 2 2690.8131199111563
testing: warning: no tests to run
PASS
BenchmarkIndexWordLong	     100	  13460864 ns/op
BenchmarkIndexWordShort	     100	  13439773 ns/op
ok  	bench	7.814s
$

package main
import (
&quot;bytes&quot;
&quot;fmt&quot;
&quot;io/ioutil&quot;
&quot;testing&quot;
)
var (
Words    [][]byte
ShortLen = 2
)
func IndexWord(b *testing.B, words [][]byte) {
b.ResetTimer()
b.StartTimer()
var char byte
for i := 0; i &lt; b.N; i++ {
for _, word := range words {
char = word[len(word)-1]
}
}
_ = char
}
func BenchmarkIndexWordLong(b *testing.B) {
words := make([][]byte, len(Words))
for i, word := range Words {
words[i] = word
}
IndexWord(b, words)
}
func BenchmarkIndexWordShort(b *testing.B) {
words := make([][]byte, len(Words))
for i, word := range Words {
if len(word) &gt; ShortLen {
word = word[:ShortLen]
}
words[i] = word
}
IndexWord(b, words)
}
func init() {
// The Complete Works of William Shakespeare
// http://www.gutenberg.org/cache/epub/100/pg100.txt
text, err := ioutil.ReadFile(`/home/peter/pg100.txt`)
if err != nil {
panic(err)
}
var n, short, long int64
Words = bytes.Fields(text)
for i, word := range Words {
word = bytes.Repeat(word, 600) // Requires 4GB memory
Words[i] = word
n++
long += int64(len(word))
shortLen := ShortLen
if len(word) &lt; ShortLen {
shortLen = len(word)
}
short += int64(shortLen)
}
fmt.Println(n, float64(short)/float64(len(Words)), float64(long)/float64(len(Words)))
}

The code for your hasSuffix function looks like a direct port from another language; it doesn't look like it is written for Go. Here's my rewrite.

func hasSuffix(s, suffix []byte) bool {
if len(s) &lt; len(suffix) {
return false
}
s = s[len(s)-len(suffix):]
for i, x := range suffix {
if x != s[i] {
return false
}
}
return true
}

Also, Go has a bytes.HasSuffix function.

> Package bytes
>
> func HasSuffix
>
> func HasSuffix(s, suffix []byte) bool
>
> HasSuffix tests whether the byte slice s ends with suffix.

答案2

得分: 1

切片访问的时间复杂度为O(1)，但在现代计算机上，内存访问的时间可能会因为值是否被缓存而有数量级的差异。在没有看到你的代码之前，最有可能的原因是一个内存访问比另一个慢的原因。

另一个可能性是你的其中一个切片是数组，并且索引是常量，这意味着不需要进行边界检查。

英文:

Slice access is O(1), but memory access on modern computers can take orders of magnitude more or less time depending on whether the value is cached. Without seeing your code, most likely this is why one memory access is slower than another.

Another possibility are that one of your slices is an array and the index is constant, meaning that bounds checking isn't necessary.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在Go语言中，切片元素的访问复杂度是多少？

问题

答案1

答案2

如何传递空接口

如何读取一个gzipped的CSV文件？

在JavaScript中将分组为嵌套对象数组。

How to properly pass arguments in Go function?

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论