

In Go, do non-capturing closures harm performance?







For instance, github.com/yhat/scrape suggests using a closure like this:

func someFunc() {
	matcher := func(n *html.Node) bool {
		return n.DataAtom == atom.Body
	}
	body, ok := scrape.Find(root, matcher)
	// ... use body and ok ...
}

Since matcher doesn’t actually capture any local variables, this could equivalently be written as:

func someFunc() {
	body, ok := scrape.Find(root, matcher)
	// ... use body and ok ...
}

func matcher(n *html.Node) bool {
	return n.DataAtom == atom.Body
}

The first form looks better, because the matcher function is quite specific to that place in the code. But does it perform worse at runtime (assuming someFunc may be called often)?

I guess there must be some overhead to creating a closure, but this kind of closure could be optimized into a regular function by the compiler?

(Obviously the language spec doesn’t require this; I’m interested in what gc actually does.)
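To make the claimed equivalence of the two forms concrete without depending on golang.org/x/net/html or github.com/yhat/scrape, here is a minimal self-contained sketch. The Node type, the find function and isBody are hypothetical stand-ins for *html.Node, scrape.Find and the top-level matcher (only a tag string is modelled, not atom.Body). The point is only that the local function literal and the top-level function have the same type, func(*Node) bool, and can be passed interchangeably.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node; only a tag name and the
// children are modelled.
type Node struct {
	Tag      string
	Children []*Node
}

// find is a hypothetical stand-in for scrape.Find: a depth-first search that
// returns the first node for which matcher reports true.
func find(root *Node, matcher func(*Node) bool) (*Node, bool) {
	if matcher(root) {
		return root, true
	}
	for _, c := range root.Children {
		if n, ok := find(c, matcher); ok {
			return n, true
		}
	}
	return nil, false
}

// isBody is the top-level function form of the matcher.
func isBody(n *Node) bool { return n.Tag == "body" }

func main() {
	root := &Node{Tag: "html", Children: []*Node{{Tag: "head"}, {Tag: "body"}}}

	// Local function literal form; it captures nothing from the enclosing function.
	matcher := func(n *Node) bool { return n.Tag == "body" }

	a, okA := find(root, matcher)
	b, okB := find(root, isBody)
	fmt.Println(okA, okB, a == b) // true true true
}
```

Both calls find the same node, so the choice between the forms is about readability. The performance question is what the answers below address.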


Score: 9





It seems there is no difference. We can verify this by looking at the generated machine code.

Here is a toy program:

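package main

import "fmt"

func topLevelFunction(x int) int {
	return x + 4
}

func useFunction(fn func(int) int) {
	fmt.Println(fn(10)) // illustrative body
}

func invoke() {
	innerFunction := func(x int) int {
		return x + 8
	}
	useFunction(topLevelFunction)
	useFunction(innerFunction)
}

func main() {
	invoke()
}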

And here is its disassembly:

$ go version
go version go1.8.5 linux/amd64

$ go tool objdump -s 'main\.(invoke|topLevel)' bin/toy 
TEXT main.topLevelFunction(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
    toy.go:6	0x47b7a0	488b442408	MOVQ 0x8(SP), AX	
    toy.go:6	0x47b7a5	4883c004	ADDQ $0x4, AX		
    toy.go:6	0x47b7a9	4889442410	MOVQ AX, 0x10(SP)	
    toy.go:6	0x47b7ae	c3		RET			

TEXT main.invoke(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
    toy.go:13	0x47b870	64488b0c25f8ffffff	FS MOVQ FS:0xfffffff8, CX		
    toy.go:13	0x47b879	483b6110		CMPQ 0x10(CX), SP			
    toy.go:13	0x47b87d	7638			JBE 0x47b8b7				
    toy.go:13	0x47b87f	4883ec10		SUBQ $0x10, SP				
    toy.go:13	0x47b883	48896c2408		MOVQ BP, 0x8(SP)			
    toy.go:13	0x47b888	488d6c2408		LEAQ 0x8(SP), BP			
    toy.go:17	0x47b88d	488d052cfb0200		LEAQ 0x2fb2c(IP), AX			
    toy.go:17	0x47b894	48890424		MOVQ AX, 0(SP)				
    toy.go:17	0x47b898	e813ffffff		CALL main.useFunction(SB)		
    toy.go:14	0x47b89d	488d0514fb0200		LEAQ 0x2fb14(IP), AX			
    toy.go:18	0x47b8a4	48890424		MOVQ AX, 0(SP)				
    toy.go:18	0x47b8a8	e803ffffff		CALL main.useFunction(SB)		
    toy.go:19	0x47b8ad	488b6c2408		MOVQ 0x8(SP), BP			
    toy.go:19	0x47b8b2	4883c410		ADDQ $0x10, SP				
    toy.go:19	0x47b8b6	c3			RET					
    toy.go:13	0x47b8b7	e874f7fcff		CALL runtime.morestack_noctxt(SB)	
    toy.go:13	0x47b8bc	ebb2			JMP main.invoke(SB)			

TEXT main.invoke.func1(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
    toy.go:15	0x47b8f0	488b442408	MOVQ 0x8(SP), AX	
    toy.go:15	0x47b8f5	4883c008	ADDQ $0x8, AX		
    toy.go:15	0x47b8f9	4889442410	MOVQ AX, 0x10(SP)	
    toy.go:15	0x47b8fe	c3		RET			

As we can see, at least in this simple case, there is no structural difference in how topLevelFunction and innerFunction (invoke.func1), and their passing to useFunction, are translated to machine code.

(It is instructive to compare this to the case where innerFunction does capture a local variable; and to the case where, moreover, innerFunction is passed via a global variable rather than a function argument — but these are left as an exercise to the reader.)


Score: 1





Generally, yes, it should be slower, and probably even more so once compiler optimizations are taken into account: reasoning about a function is generally easier than reasoning about a closure, so I would expect a compiler to optimize a function more often than an equivalent closure. But it is not exactly black and white, as many factors affect the final code produced, including your platform and the version of the compiler itself. More importantly, the rest of your code will typically affect performance much more than the speed of making a call (both algorithm-wise and lines-of-code-wise), which seems to be the point JimB made.

For example, I wrote the following sample code and then benchmarked it.

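var (
	test int64
)

const (
	testThreshold = int64(1000000000)
)

// someFunc is the plain top-level function: it increments the global counter.
func someFunc() {
	test += 1
}

// funcTest calls someFunc threshold times.
func funcTest(threshold int64) int64 {
	test = 0
	for i := int64(0); i < threshold; i++ {
		someFunc()
	}
	return test
}

// closureTest calls a closure that increments the same global counter.
func closureTest(threshold int64) int64 {
	someClosure := func() {
		test += 1
	}

	test = 0
	for i := int64(0); i < threshold; i++ {
		someClosure()
	}
	return test
}

// closureTestLocal calls a closure that captures and increments a local variable.
func closureTestLocal(threshold int64) int64 {
	var localTest int64
	localClosure := func() {
		localTest += 1
	}

	localTest = 0
	for i := int64(0); i < threshold; i++ {
		localClosure()
	}
	return localTest
}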

On my laptop, funcTest takes 2.0 ns per iteration, closureTest takes 2.2 ns, and closureTestLocal takes 1.9 ns. Here, closureTest vs. funcTest appears to confirm your (and my) assumption that a closure call will be slower than a function call. But please note that these test functions were intentionally made simple and small so that the difference in call speed stands out, and it is still only a 10% difference. In fact, checking the compiler output shows that in the funcTest case the compiler actually inlined the function instead of calling it, so I would expect the difference to be even smaller without inlining.

More importantly, I'd like to point out that closureTestLocal is 5% faster than the (inlined) function, even though it is actually a capturing closure. Note that neither closure was inlined or optimized out: both closure tests faithfully make all the calls. The only difference I see in the compiled code is that the local closure case operates entirely on the stack, while both other functions access a global variable (somewhere in memory) by its address. But while I can easily reason about the difference by looking at the compiled code, my point is that it is not exactly black and white, even in the simplest cases.

So, if speed really matters that much in your case, I would suggest benchmarking it instead (and with your actual code). You could also use go tool objdump to analyze the code actually produced and get a clue where a difference comes from. But as a rule of thumb, I would suggest focusing on writing better code (whatever that means for you) and ignoring the speed of the actual calls (as in "avoid premature optimization").


Score: 0



I don't think the scope of a function declaration affects performance. It is also common to write the function literal inline in the call. I'd write it as:

body, ok := scrape.Find(root, func(n *html.Node) bool { return n.DataAtom == atom.Body })

  • Posted on August 29, 2017, 19:25:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45937924.html



