2014年8月4日 03:53:27go评论131阅读模式

英文:

Vector performance in Go and C++

问题

请考虑以下两个代码片段，一个是GO语言的，另一个是C++11的。在C++中，std::vector是一个具有摊销O(1)插入操作的动态数组。如何在GO语言中实现相同的性能？问题在于这段GO代码在我的硬件上大约慢了3倍。运行多次。

编译：

go build vec.go（go版本go1.2.1 linux/amd64）
g++ -O2 -std=gnu++11 -o vec vec.cc（g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2）

GO版本（vec.go）：

package main
type X struct {
	x int32
	y float64
}
const N int = 80000000
func main() {
	x := X{123, 2.64}
	s := make([]X, 1)
	for i := 0; i < N; i++ {
		s = append(s, x)
	}
}

C++11版本（vec.cc）：

#include <vector>
const int N = 80000000;
struct X {
        int x;
        double y;
};
int main(void)
{
        X x{123, 2.64};
        std::vector<X> s(1);
        for (int i = 0; i < N; ++i) {
                s.push_back(x);
        }
}

英文:

Please consider these two snippets in GO and C++11. In C++ std::vector is a doubling-array which has amortized O(1) insert operation. How to achieve the same performance in GO? Problem is that this GO code is about 3 times slower on my hardware. Run many times.

Compiled:

go build vec.go (go version go1.2.1 linux/amd64)
g++ -O2 -std=gnu++11 -o vec vec.cc (g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2)

GO version (vec.go):

package main
type X struct {
	x int32
	y float64
}
const N int = 80000000
func main() {
	x := X{123, 2.64}
	s := make([]X, 1)
	for i := 0; i &lt; N; i++ {
		s = append(s, x)
	}
}

C++11 version (vec.cc):

#include &lt;vector&gt;
const int N = 80000000;
struct X {
        int x;
        double y;
};
int main(void)
{
        X x{123, 2.64};
        std::vector&lt;X&gt; s(1);
        for (int i = 0; i &lt; N; ++i) {
                s.push_back(x);
        }
}

答案1

得分: 11

Go的规范对append()函数没有特定的复杂度要求，但实际上它也是以摊还常数时间实现的，就像这个问题的答案中所描述的那样。

当前的实现方式如下：对于数组大小小于1024的情况，它会按需求进行倍增，而对于大于1024的情况，它会增加到原始大小的1.25倍。以1.25倍增加仍然是摊还常数时间，但它会导致摊还常数因子比总是倍增的实现方式更高。然而，1.25倍增加的方式在整体上浪费的内存较少。

如果你在只有少数几次操作时（即使在非常大的N值下），得到了不同的性能表现，那么你可能看到了不同的常数因子。我自己注意到，gc编译器生成的机器代码比gccgo生成的代码更高效。

为了验证Go是否以摊还常数时间运行，请尝试绘制运行你的算法所需时间的图表，使用几个不同的N值。

英文:

Go's specification doesn't require any particular complexity for append(), but in practice it's also implemented in ammortized constant time, as described in the answer to this question.

The current implementation works as follows: for array sizes below 1024, it doubles as needed, and above 1024 it increases to 1.25x the original size. Increasing by 1.25x is still amortized constant time, but it has the effect of imposing a higher amortized constant factor than an implementation that always doubles. However 1.25x wastes less memory overall.

If you're getting different performance behavior by only a few times (even at very large N), then you're seeing different constant factors in play. I've noted myself that the machine code produced by the gc compiler is much more efficient than that generated by gccgo.

To verify for yourself that Go is operating in ammortized constant time, try plotting the time it takes to run your algorithm for several different values of N.

答案2

得分: 1

我已经回答了你关于计算复杂度的问题：append复杂度。它的摊销时间是常数时间。

你的基准测试结果如下：

$ rm vec
$ cat vec.cc
#include <vector>
const int N = 80000000;
struct X {
        int x;
        double y;
};
int main(void)
{
        X x{123, 2.64};
        std::vector<X> s(1);
        for (int i = 0; i < N; ++i) {
                s.push_back(x);
        }
}
$ g++ -O2 -std=gnu++11 -o vec vec.cc
$ time ./vec
real    0m1.360s
user    0m0.536s
sys     0m0.816s
$ rm vec
$ cat vec.go
package main
type X struct {
    x int32
    y float64
}
const N int = 80000000
func main() {
    x := X{123, 2.64}
    s := make([]X, 1)
    for i := 0; i < N; i++ {
        s = append(s, x)
    }
}
$ go version
go version devel +6b696a34e0af Sun Aug 03 15:14:59 2014 -0700 linux/amd64
$ go build vec.go
$ time ./vec
real    0m2.590s
user    0m1.192s
sys     0m1.388s
$

英文:

I've already answered your computational complexity question: append complexity. It is amortized constant time.

My results from your benchmark.

$ rm vec
$ cat vec.cc
#include &lt;vector&gt;
const int N = 80000000;
struct X {
        int x;
        double y;
};
int main(void)
{
        X x{123, 2.64};
        std::vector&lt;X&gt; s(1);
        for (int i = 0; i &lt; N; ++i) {
                s.push_back(x);
        }
}
$ g++ -O2 -std=gnu++11 -o vec vec.cc
$ time ./vec
real	0m1.360s
user	0m0.536s
sys	0m0.816s
$ rm vec
$ cat vec.go
package main
type X struct {
	x int32
	y float64
}
const N int = 80000000
func main() {
	x := X{123, 2.64}
	s := make([]X, 1)
	for i := 0; i &lt; N; i++ {
		s = append(s, x)
	}
}
$ go version
go version devel +6b696a34e0af Sun Aug 03 15:14:59 2014 -0700 linux/amd64
$ go build vec.go
$ time ./vec
real	0m2.590s
user	0m1.192s
sys	0m1.388s
$

答案3

得分: 0

如果您事先知道元素的数量，可以使用以下方式进行预分配：

s := make([]X, 0, N)
for i := 0; i < N; i++ {
    s = append(s, x)
}

此外，建议使用Go 1.3版本，编译器进行了一些优化。

为了获得更好的向量化效果，可以尝试使用gccgo。

英文:

If you know the number of elements before hand, you can preallocate it with:

s := make([]X, 0, N)
for i := 0; i &lt; N; i++ {
    s = append(s, x)
}

Also use Go 1.3, the compiler got some optimizations.

And for better vectorization, try gccgo

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Go和C++中的向量性能

问题

答案1

答案2

答案3

在Golang中生成没有使用protoc的.pb文件。

你可以使用github.com/golang/appengine来访问Datastore。

在Golang中，delete操作是否保证从哈希表中删除元素？

如何使用不同的数据结构递归循环遍历地图？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。