Why does Go use cgo on Windows for a simple File.Write?
Question
While rewriting a simple program from C# to Go, I found the resulting executable to be 3 to 4 times slower. In particular, the Go version uses 3 to 4 times more CPU. That is surprising, because the code mostly does I/O and is not supposed to consume a significant amount of CPU.
I made a very simple version that only does sequential writes, and benchmarked it. I ran the same benchmark on Windows 10 and Linux (Debian Jessie). The times can't be compared directly (different systems, disks, ...), but the result is interesting.
I'm using the same Go version on both platforms: 1.6.
On Windows, os.File.Write uses cgo (see runtime.cgocall below); on Linux it does not. Why?
Here is the disk.go program:
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"time"
)

const (
	// size of the test file
	fullSize = 268435456
	// size of read/write per call
	partSize = 128
	// path of temporary test file
	filePath = "./bigfile.tmp"
)

func main() {
	buffer := make([]byte, partSize)
	seqWrite := func() error {
		return sequentialWrite(filePath, fullSize, buffer)
	}
	err := fillBuffer(buffer)
	panicIfError(err)
	duration, err := durationOf(seqWrite)
	panicIfError(err)
	fmt.Printf("Duration : %v\n", duration)
}

// It's just a test ;)
func panicIfError(err error) {
	if err != nil {
		panic(err)
	}
}

func durationOf(f func() error) (time.Duration, error) {
	startTime := time.Now()
	err := f()
	return time.Since(startTime), err
}

func fillBuffer(buffer []byte) error {
	_, err := rand.Read(buffer)
	return err
}

func sequentialWrite(filePath string, fullSize int, buffer []byte) error {
	desc, err := os.OpenFile(filePath, os.O_WRONLY|os.O_CREATE, 0666)
	if err != nil {
		return err
	}
	defer func() {
		desc.Close()
		err := os.Remove(filePath)
		panicIfError(err)
	}()
	var totalWrote int
	for totalWrote < fullSize {
		wrote, err := desc.Write(buffer)
		totalWrote += wrote
		if err != nil {
			return err
		}
	}
	return nil
}
The benchmark test (disk_test.go):
package main

import (
	"testing"
)

// go test -bench SequentialWrite -cpuprofile=cpu.out
// Windows : go tool pprof -text -nodecount=10 ./disk.test.exe cpu.out
// Linux   : go tool pprof -text -nodecount=10 ./disk.test cpu.out
func BenchmarkSequentialWrite(b *testing.B) {
	buffer := make([]byte, partSize)
	// Repeat the write b.N times so the testing package controls the iteration count.
	for i := 0; i < b.N; i++ {
		err := sequentialWrite(filePath, fullSize, buffer)
		panicIfError(err)
	}
}
The Windows result (with cgo):
11.68s of 11.95s total (97.74%)
Dropped 18 nodes (cum <= 0.06s)
Showing top 10 nodes out of 26 (cum >= 0.09s)
      flat  flat%   sum%        cum   cum%
    11.08s 92.72% 92.72%     11.20s 93.72%  runtime.cgocall
     0.11s  0.92% 93.64%      0.11s  0.92%  runtime.deferreturn
     0.09s  0.75% 94.39%     11.45s 95.82%  os.(*File).write
     0.08s  0.67% 95.06%      0.16s  1.34%  runtime.deferproc.func1
     0.07s  0.59% 95.65%      0.07s  0.59%  runtime.newdefer
     0.06s   0.5% 96.15%      0.28s  2.34%  runtime.systemstack
     0.06s   0.5% 96.65%     11.25s 94.14%  syscall.Write
     0.05s  0.42% 97.07%      0.07s  0.59%  runtime.deferproc
     0.04s  0.33% 97.41%     11.49s 96.15%  os.(*File).Write
     0.04s  0.33% 97.74%      0.09s  0.75%  syscall.(*LazyProc).Find
The Linux result (without cgo):
5.04s of 5.10s total (98.82%)
Dropped 5 nodes (cum <= 0.03s)
Showing top 10 nodes out of 19 (cum >= 0.06s)
      flat  flat%   sum%        cum   cum%
     4.62s 90.59% 90.59%      4.87s 95.49%  syscall.Syscall
     0.09s  1.76% 92.35%      0.09s  1.76%  runtime/internal/atomic.Cas
     0.08s  1.57% 93.92%      0.19s  3.73%  runtime.exitsyscall
     0.06s  1.18% 95.10%      4.98s 97.65%  os.(*File).write
     0.04s  0.78% 95.88%      5.10s   100%  _/home/sam/Provisoire/go-disk.sequentialWrite
     0.04s  0.78% 96.67%      5.05s 99.02%  os.(*File).Write
     0.04s  0.78% 97.45%      0.04s  0.78%  runtime.memclr
     0.03s  0.59% 98.04%      0.08s  1.57%  runtime.exitsyscallfast
     0.02s  0.39% 98.43%      0.03s  0.59%  os.epipecheck
     0.02s  0.39% 98.82%      0.06s  1.18%  runtime.casgstatus
Answer 1
Score: 6
Go does not perform file I/O itself; it delegates the task to the operating system. See Go's operating-system-dependent syscall packages.
Linux and Windows are different operating systems with different OS ABIs. For example, Linux uses system calls via syscall.Syscall, while Windows uses Windows DLLs. On Windows, the DLL call is a C call. It doesn't use cgo. It does go through the same dynamic C pointer check used by cgo, runtime.cgocall. There is no runtime.wincall alias.
In summary, different operating systems have different OS call mechanisms.
> Command cgo
>
> Passing pointers
>
> Go is a garbage collected language, and the garbage collector needs
> to know the location of every pointer to Go memory. Because of this,
> there are restrictions on passing pointers between Go and C.
>
> In this section the term Go pointer means a pointer to memory
> allocated by Go (such as by using the & operator or calling the
> predefined new function) and the term C pointer means a pointer to
> memory allocated by C (such as by a call to C.malloc). Whether a
> pointer is a Go pointer or a C pointer is a dynamic property
> determined by how the memory was allocated; it has nothing to do with
> the type of the pointer.
>
> Go code may pass a Go pointer to C provided the Go memory to which it
> points does not contain any Go pointers. The C code must preserve this
> property: it must not store any Go pointers in Go memory, even
> temporarily. When passing a pointer to a field in a struct, the Go
> memory in question is the memory occupied by the field, not the entire
> struct. When passing a pointer to an element in an array or slice, the
> Go memory in question is the entire array or the entire backing array
> of the slice.
>
> C code may not keep a copy of a Go pointer after the call returns.
>
> A Go function called by C code may not return a Go pointer. A Go
> function called by C code may take C pointers as arguments, and it may
> store non-pointer or C pointer data through those pointers, but it may
> not store a Go pointer in memory pointed to by a C pointer. A Go
> function called by C code may take a Go pointer as an argument, but it
> must preserve the property that the Go memory to which it points does
> not contain any Go pointers.
>
> Go code may not store a Go pointer in C memory. C code may store Go
> pointers in C memory, subject to the rule above: it must stop storing
> the Go pointer when the C function returns.
>
> These rules are checked dynamically at runtime. The checking is
> controlled by the cgocheck setting of the GODEBUG environment
> variable. The default setting is GODEBUG=cgocheck=1, which implements
> reasonably cheap dynamic checks. These checks may be disabled entirely
> using GODEBUG=cgocheck=0. Complete checking of pointer handling, at
> some cost in run time, is available via GODEBUG=cgocheck=2.
>
> It is possible to defeat this enforcement by using the unsafe package,
> and of course there is nothing stopping the C code from doing anything
> it likes. However, programs that break these rules are likely to fail
> in unexpected and unpredictable ways.
"These rules are checked dynamically at runtime."
Benchmarks:
To paraphrase, there are lies, damn lies, and benchmarks.
For valid comparisons across operating systems, you need to run on identical hardware. Otherwise you are also measuring the differences between CPUs, memory, and disk I/O (spinning rust or silicon). I dual-boot Linux and Windows on the same machine.
Run benchmarks at least three times back-to-back. Operating systems try to be smart. For example, caching I/O. Languages using virtual machines need warm-up time. And so on.
Know what you are measuring. If you are doing sequential I/O, you spend almost all your time in the operating system. Have you turned off malware protection? And so on.
And so on.
Here are some results for disk.go from the same machine using dual-boot Windows and Linux.
Windows:
>go build disk.go
>/TimeMem disk
Duration : 18.3300322s
Elapsed time : 18.38
Kernel time : 13.71 (74.6%)
User time : 4.62 (25.1%)
Linux:
$ go build disk.go
$ time ./disk
Duration : 18.54350723s
real 0m18.547s
user 0m2.336s
sys 0m16.236s
Effectively, they are the same: an 18-second disk.go duration. There is just some variation between operating systems as to what is counted as user time and what is counted as kernel or system time. Elapsed or real time is the same.
In your tests, kernel or system time was 93.72% for runtime.cgocall versus 95.49% for syscall.Syscall.