Is binary.Read slow?

Question

I'm rewriting an old small C project into Go (to learn Go).

The project basically reads some binary data from a file, does some filtering on said data, then prints it into stdout.

The main part of the code looks like this (omitting error handling):

type netFlowRow struct {
	Timestamp uint32
	Srcip     [4]byte
	Dstip     [4]byte
	Proto     uint16
	Srcport   uint16
	Dstport   uint16
	Pkt       uint32
	Size      uint64
}

func main() {
	// ...
	file, _ := os.Open(path)
	for j := 0; j < infoRow.Count; j++ {
		netRow := netFlowRow{}
		binary.Read(file, binary.BigEndian, &netRow)

		// ...
		fmt.Printf("%v", netRow)
	}
}

After a naive rewrite, the Go version ran about 10 times slower than the C version (~40s vs 2-3s). I profiled it with pprof and got this:

(pprof) top10
39.96s of 40.39s total (98.94%)
Dropped 71 nodes (cum <= 0.20s)
Showing top 10 nodes out of 11 (cum >= 39.87s)
      flat  flat%   sum%        cum   cum%
    39.87s 98.71% 98.71%     39.87s 98.71%  syscall.Syscall
     0.09s  0.22% 98.94%     40.03s 99.11%  encoding/binary.Read
         0     0% 98.94%     39.87s 98.71%  io.ReadAtLeast
         0     0% 98.94%     39.87s 98.71%  io.ReadFull
         0     0% 98.94%     40.03s 99.11%  main.main
         0     0% 98.94%     39.87s 98.71%  os.(*File).Read
         0     0% 98.94%     39.87s 98.71%  os.(*File).read
         0     0% 98.94%     40.21s 99.55%  runtime.goexit
         0     0% 98.94%     40.03s 99.11%  runtime.main
         0     0% 98.94%     39.87s 98.71%  syscall.Read

Am I reading this right? Is syscall.Syscall basically the main time consumer? Is it where the reading from file is going on?
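
In the profile above, syscall.Syscall, syscall.Read and os.(*File).Read all carry the same cumulative time, which is consistent with the time being spent reading the file. A minimal sketch of why that can happen, assuming an in-memory reader in place of the file: binary.Read asks its reader for one record at a time, so without buffering every 30-byte row costs one Read call on the underlying source. The names countingReader and countReads are hypothetical helpers for illustration only.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

type netFlowRow struct {
	Timestamp uint32
	Srcip     [4]byte
	Dstip     [4]byte
	Proto     uint16
	Srcport   uint16
	Dstport   uint16
	Pkt       uint32
	Size      uint64
}

// countingReader (hypothetical) counts how many Read calls reach the
// wrapped reader. With an *os.File, each such call would be a syscall.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// countReads decodes n rows from data with binary.Read and returns the
// number of Read calls that reached the underlying reader.
func countReads(data []byte, n int, buffered bool) (int, error) {
	src := &countingReader{r: bytes.NewReader(data)}
	var r io.Reader = src
	if buffered {
		r = bufio.NewReader(src) // default 4096-byte buffer
	}
	for i := 0; i < n; i++ {
		var row netFlowRow
		if err := binary.Read(r, binary.BigEndian, &row); err != nil {
			return src.calls, err
		}
	}
	return src.calls, nil
}

func main() {
	const rows = 1000
	data := make([]byte, rows*30) // 30 bytes per encoded row
	direct, _ := countReads(data, rows, false)
	viaBufio, _ := countReads(data, rows, true)
	fmt.Printf("unbuffered: %d reads, buffered: %d reads\n", direct, viaBufio)
}
```

With a real file the same pattern would apply to read syscalls, though the exact counts depend on the buffer size and on how much each read returns.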

Update:
I switched to bufio.Reader and got this profile:

(pprof) top10
34.16s of 36s total (94.89%)
Dropped 99 nodes (cum <= 0.18s)
Showing top 10 nodes out of 33 (cum >= 0.56s)
      flat  flat%   sum%        cum   cum%
    31.99s 88.86% 88.86%        32s 88.89%  syscall.Syscall
     0.43s  1.19% 90.06%      0.64s  1.78%  runtime.mallocgc
     0.39s  1.08% 91.14%      1.06s  2.94%  encoding/binary.(*decoder).value
     0.28s  0.78% 91.92%      0.99s  2.75%  reflect.(*structType).Field
     0.28s  0.78% 92.69%      0.28s  0.78%  runtime.duffcopy
     0.24s  0.67% 93.36%      1.64s  4.56%  encoding/binary.sizeof
     0.22s  0.61% 93.97%     34.51s 95.86%  encoding/binary.Read
     0.22s  0.61% 94.58%      0.22s  0.61%  runtime.mach_semaphore_signal
     0.07s  0.19% 94.78%      1.28s  3.56%  reflect.(*rtype).Field
     0.04s  0.11% 94.89%      0.56s  1.56%  runtime.newobject

Answer 1

Score: 5

binary.Read will be slower because it uses reflection. I would suggest benchmarking with a bufio.Reader and calling the binary.BigEndian methods by hand to decode your struct:

type netFlowRow struct {
	Timestamp uint32   // 0
	Srcip     [4]byte  // 4
	Dstip     [4]byte  // 8
	Proto     uint16   // 12
	Srcport   uint16   // 14
	Dstport   uint16   // 16
	Pkt       uint32   // 18
	Size      uint64   // 22
}

func main() {
	// ...
	file, _ := os.Open(path)
	r := bufio.NewReader(file)
	for j := 0; j < infoRow.Count; j++ {
		var buff [4 + 4 + 4 + 2 + 2 + 2 + 4 + 8]byte
		if _, err := io.ReadFull(r, buff[:]); err != nil {
			panic(err)
		}
		netRow := netFlowRow{
			Timestamp: binary.BigEndian.Uint32(buff[:4]),
			// Srcip
			// Dstip
			Proto:   binary.BigEndian.Uint16(buff[12:14]),
			Srcport: binary.BigEndian.Uint16(buff[14:16]),
			Dstport: binary.BigEndian.Uint16(buff[16:18]),
			Pkt:     binary.BigEndian.Uint32(buff[18:22]),
			Size:    binary.BigEndian.Uint64(buff[22:30]),
		}
		copy(netRow.Srcip[:], buff[4:8])
		copy(netRow.Dstip[:], buff[8:12])
		// ...
		fmt.Printf("%v", netRow)
	}
}

huangapple
  • Posted on 2016-12-31 02:05:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/41400639.html