Overhead of an assembly function call in Go

Question


I am currently playing around with Go, its assembly, the performance of floating-point operations (float32), and optimizations at the nanosecond scale. I was a bit confused by the overhead of a simple function call:

func BenchmarkEmpty(b *testing.B) {
    for i := 0; i < b.N; i++ {
    }
}
func BenchmarkNop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        doNop()
    }
}

The implementation of doNop:

TEXT ·doNop(SB),0,$0-0
    RET

The result (go test -bench .):

BenchmarkEmpty  2000000000  0.30 ns/op
BenchmarkNop    2000000000  1.73 ns/op

I'm not used to assembly or the internals of Go. Is it possible for the Go compiler or linker to inline a function defined in assembly? Can I give the linker a hint somehow? For simple functions like 'add two R3 vectors', the call overhead eats up all the possible performance gain.

(Go 1.4.2, amd64)

Answer 1

Score: 1


Assembly functions are not inlined. Here are 3 things you could try:

  1. Move your loop into assembly. For example, with this function:

     func Sum(xs []int64) int64
    

    You can do this:

     #include "textflag.h"
    
     // func Sum(xs []int64) int64
     // Arguments: slice header (24 bytes) + result (8 bytes) = 32 bytes.
     TEXT ·Sum(SB),NOSPLIT,$0-32
         MOVQ  xs_base+0(FP),DI    // DI = &xs[0]
         MOVQ  xs_len+8(FP),SI     // SI = len(xs)
         MOVQ  $0,CX               // CX = running sum
         MOVQ  $0,AX               // AX = i
    
     L1: CMPQ  AX,SI           // i < len(xs)
         JGE   Z1
         LEAQ  (DI)(AX*8),BX   // BX = &xs[i]
         MOVQ  (BX),BX         // BX = *BX
         ADDQ  BX,CX           // CX += BX
         INCQ  AX              // i++
         JMP   L1
    
     Z1: MOVQ  CX,ret+24(FP)
         RET
    

    If you look in the standard library, you will see examples of this.

  2. Write some of your code in C, leverage its support for intrinsics or inline assembly, and use cgo to call it from Go.

  3. Use gccgo to do the same thing as #2, except you can do it directly:

     //extern open
     func c_open(name *byte, mode int, perm int) int
    

    https://golang.org/doc/install/gccgo#Function_names
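
The first suggestion works because one call then covers many elements, so the fixed call overhead is paid once per batch rather than once per element. Below is a minimal sketch of the same idea for the R3-vector case from the question. It is written in pure Go so that it runs without an assembly file. The names Vec3 and AddSlices are hypothetical and do not come from the original post; in a real package, the loop inside AddSlices is the part you would move into assembly.

```go
package main

import "fmt"

// Vec3 is a hypothetical R3 vector type (an assumption, not from the post).
type Vec3 [3]float32

// AddSlices stores a[i] + b[i] in dst[i] for every index that all three
// slices share. One call handles the whole batch, so the per-call overhead
// is paid once instead of once per vector.
func AddSlices(dst, a, b []Vec3) {
	// Use the shortest length so no slice is indexed out of range.
	n := len(dst)
	if len(a) < n {
		n = len(a)
	}
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		dst[i][0] = a[i][0] + b[i][0]
		dst[i][1] = a[i][1] + b[i][1]
		dst[i][2] = a[i][2] + b[i][2]
	}
}

func main() {
	a := []Vec3{{1, 2, 3}, {4, 5, 6}}
	b := []Vec3{{10, 20, 30}, {40, 50, 60}}
	dst := make([]Vec3, len(a))
	AddSlices(dst, a, b)
	fmt.Println(dst) // prints [[11 22 33] [44 55 66]]
}
```

With a batch API like this, a call overhead of roughly 1.4 ns (the difference between the two benchmark results in the question) is spread over len(dst) vector additions instead of being added to each one.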

huangapple
  • Posted on April 12, 2015, 03:57:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/29582377.html