2015年2月2日 14:35:13go评论126阅读模式

英文:

Why cgo's performance is so slow? is there something wrong with my testing code?

问题

我正在进行一个测试：比较运行100000000次的cgo和纯Go函数的执行时间。与Golang函数相比，cgo函数需要更长的时间，我对这个结果感到困惑。我的测试代码如下：

package main
import (
	"fmt"
	"time"
)
/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void show() {
}
*/
// #cgo LDFLAGS: -lstdc++
import "C"
//import "fmt"
func show() {
}
func main() {
	now := time.Now()
	for i := 0; i < 100000000; i = i + 1 {
		C.show()
	}
	end_time := time.Now()
	var dur_time time.Duration = end_time.Sub(now)
	var elapsed_min float64 = dur_time.Minutes()
	var elapsed_sec float64 = dur_time.Seconds()
	var elapsed_nano int64 = dur_time.Nanoseconds()
	fmt.Printf("cgo show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
		elapsed_min, elapsed_sec, elapsed_nano)
	now = time.Now()
	for i := 0; i < 100000000; i = i + 1 {
		show()
	}
	end_time = time.Now()
	dur_time = end_time.Sub(now)
	elapsed_min = dur_time.Minutes()
	elapsed_sec = dur_time.Seconds()
	elapsed_nano = dur_time.Nanoseconds()
	fmt.Printf("go show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
		elapsed_min, elapsed_sec, elapsed_nano)
	var input string
	fmt.Scanln(&input)
}

结果是：

cgo show function elasped 0.368096 minutes or 
elapsed 22.085756 seconds or 
elapsed 22085755775 nanoseconds
go show function elasped 0.000654 minutes or 
elapsed 0.039257 seconds or 
elapsed 39257120 nanoseconds

结果显示调用C函数比调用Go函数慢。我的测试代码有什么问题吗？

我的系统是：mac OS X 10.9.4 (13E28)

英文:

I'm doing a test: compare excecution times of cgo and pure Go functions run 100 million times each. The cgo function takes longer time compared to the Golang function, and I am confused with this result. My testing code is:

package main
import (
	&quot;fmt&quot;
	&quot;time&quot;
)
/*
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
void show() {
}
*/
// #cgo LDFLAGS: -lstdc++
import &quot;C&quot;
//import &quot;fmt&quot;
func show() {
}
func main() {
	now := time.Now()
	for i := 0; i &lt; 100000000; i = i + 1 {
		C.show()
	}
	end_time := time.Now()
	var dur_time time.Duration = end_time.Sub(now)
	var elapsed_min float64 = dur_time.Minutes()
	var elapsed_sec float64 = dur_time.Seconds()
	var elapsed_nano int64 = dur_time.Nanoseconds()
	fmt.Printf(&quot;cgo show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n&quot;,
		elapsed_min, elapsed_sec, elapsed_nano)
	now = time.Now()
	for i := 0; i &lt; 100000000; i = i + 1 {
		show()
	}
	end_time = time.Now()
	dur_time = end_time.Sub(now)
	elapsed_min = dur_time.Minutes()
	elapsed_sec = dur_time.Seconds()
	elapsed_nano = dur_time.Nanoseconds()
	fmt.Printf(&quot;go show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n&quot;,
		elapsed_min, elapsed_sec, elapsed_nano)
	var input string
	fmt.Scanln(&amp;input)
}

and result is:

cgo show function elasped 0.368096 minutes or 
elapsed 22.085756 seconds or 
elapsed 22085755775 nanoseconds
go show function elasped 0.000654 minutes or 
elapsed 0.039257 seconds or 
elapsed 39257120 nanoseconds

The results show that invoking the C function is slower than the Go function. Is there something wrong with my testing code?

My system is : mac OS X 10.9.4 (13E28)

答案1

得分: 43

正如你所发现的，通过CGo调用C/C++代码会有相当高的开销。因此，通常情况下，你最好尽量减少CGo调用的次数。对于上面的例子，与其在循环中重复调用CGo函数，将循环移到C代码中可能更合理。

Go运行时设置线程的几个方面可能会破坏许多C代码的预期：

Goroutines在相对较小的堆栈上运行，通过分段堆栈（旧版本）或复制（新版本）来处理堆栈增长。
Go运行时创建的线程可能无法与libpthread的线程本地存储实现正确交互。
Go运行时的UNIX信号处理程序可能会干扰传统的C或C++代码。
Go重用操作系统线程来运行多个Goroutine。如果C代码调用了阻塞系统调用或以其他方式独占了线程，可能会对其他Goroutine产生不利影响。

因此，CGo选择了安全的方法，在一个使用传统堆栈设置的单独线程中运行C代码。

如果你来自像Python这样的语言，重写热点代码以加快程序速度并不罕见，那么你可能会感到失望。但与此同时，等效的C和Go代码之间的性能差距要小得多。

总的来说，我会将CGo保留用于与现有库进行接口交互，可能还会使用小的C包装函数来减少我在Go中需要进行的调用次数。

英文:

As you've discovered, there is fairly high overhead in calling C/C++ code via CGo. So in general, you are best off trying to minimise the number of CGo calls you make. For the above example, rather than calling a CGo function repeatedly in a loop it might make sense to move the loop down to C.

There are a number of aspects of how the Go runtime sets up its threads that can break the expectations of many pieces of C code:

Goroutines run on a relatively small stack, handling stack growth through segmented stacks (old versions) or by copying (new versions).
Threads created by the Go runtime may not interact properly with libpthread's thread local storage implementation.
The Go runtime's UNIX signal handler may interfere with traditional C or C++ code.
Go reuses OS threads to run multiple Goroutines. If the C code called a blocking system call or otherwise monopolised the thread, it could be detrimental to other goroutines.

For these reasons, CGo picks the safe approach of running the C code in a separate thread set up with a traditional stack.

If you are coming from languages like Python where it isn't uncommon to rewrite code hotspots in C as a way to speed up a program you will be disappointed. But at the same time, there is a much smaller gap in performance between equivalent C and Go code.

In general I reserve CGo for interfacing with existing libraries, possibly with small C wrapper functions that can reduce the number of calls I need to make from Go.

答案2

得分: 29

更新James的回答：目前的实现中似乎没有线程切换。

在golang-nuts上查看此线程：

总会有一些开销。
它比简单的函数调用要昂贵，但比上下文切换要便宜得多
（agl记得在公开发布之前删除了线程切换）。
现在的开销基本上只是需要进行完整的寄存器集切换（没有内核参与）。
我猜它相当于十次函数调用。

还可以参考此回答，其中链接了“cgo is not Go”博文。

C对Go的调用约定或可增长的堆栈一无所知，因此调用C代码必须记录goroutine堆栈的所有细节，切换到C堆栈，并运行不知道如何调用它或负责程序的较大Go运行时的C代码。

因此，cgo的开销是因为它执行了堆栈切换，而不是线程切换。

当调用C函数时，它保存并恢复所有寄存器，而调用Go函数或汇编函数时则不需要。

此外，cgo的调用约定禁止直接将Go指针传递给C代码，常见的解决方法是使用C.malloc，从而引入了额外的分配。有关详细信息，请参见此问题。

英文:

Update for James's answer: it seems that there's no thread switch in current implementation.

See this thread on golang-nuts:

> There's always going to be some overhead.
It's more expensive than a simple function call but
significantly less expensive than a context switch
(agl is remembering an earlier implementation;
we cut out the thread switch before the public release).
Right now the expense is basically just having to
do a full register set switch (no kernel involvement).
I'd guess it's comparable to ten function calls.

See also this answer which links "cgo is not Go" blog post.

> C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.

Thus, cgo has an overhead because it performs a stack switch, not thread switch.

It saves and restores all registers when C function is called, while it's not required when Go function or assembly function is called.

Besides that, cgo's calling conventions forbid passing Go pointers directly to C code, and common workaround is to use C.malloc, and so introduce additional allocations. See this question for details.

答案3

得分: -1

我支持gavv，

在Windows上：

/*
#include "stdio.h"
#include <Windows.h>
unsigned long CTid(void){
	return GetCurrentThreadId();
}
*/
import "C"
import (
	"fmt"
	"time"
	"golang.org/x/sys/windows"
)
func main() {
	fmt.Println(uint32(C.CTid()))
	fmt.Println(windows.GetCurrentThreadId())
	time.Sleep(time.Second * 5)
}

Go和CGO获取相同的线程ID。

英文:

I support gavv,

on winodws:

/*
#include &quot;stdio.h&quot;
#include  &lt;Windows.h&gt;
unsigned long CTid(void){
	return GetCurrentThreadId();
}
*/
import &quot;C&quot;
import (
	&quot;fmt&quot;
	&quot;time&quot;
	&quot;golang.org/x/sys/windows&quot;
)
func main() {
	fmt.Println(uint32(C.CTid()))
	fmt.Println(windows.GetCurrentThreadId())
	time.Sleep(time.Second * 5)
}

go and cgo get same TID.

答案4

得分: -4

从Go调用C函数会有一些额外开销，这是无法改变的。

英文:

There is a little overhead in calling C functions from Go. This cannot be changed.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么cgo的性能如此之慢？我的测试代码有什么问题吗？

问题

答案1

答案2

答案3

答案4

将sqlite.c嵌入到Go代码中：ld在不应该的情况下抱怨libdl。

将高类型向下转换为低类型

how to get goroutine id with ebpf

如何从字符串中获取单个 Unicode 字符

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。