Counting occurrence of character in slice in Go
Question
Okay, so I've hit a brick wall.
Edit:
Using bytes.IndexByte() in my count() function makes it run almost twice as fast. bytes.IndexByte() is written in assembly instead of Go. Still not C speed, but closer.
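For reference, a count() along those lines might look like this sketch (my reconstruction of the idea, not necessarily the exact code used; it assumes the bytes package is imported and reuses the NewLineByte constant from the program below):

// Sketch of an IndexByte-based count(): let the assembly-backed
// bytes.IndexByte do the scanning instead of a byte-at-a-time Go loop.
func count(s []byte) int64 {
    var n int64
    for {
        i := bytes.IndexByte(s, NewLineByte)
        if i == -1 {
            return n
        }
        n++
        s = s[i+1:]
    }
}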
I have two programs, one in C and one in Go, that both count newlines in a file. Super simple. The C program runs in ~1.5 seconds, the Go one in ~4.25 seconds, on a 2.4 GB file.
Am I hitting Go's speed limit? If so, what, exactly, is causing this? I can read C, but I can't read assembly, so comparing the C asm with the Go asm doesn't do much for me except show that the Go has ~400 more lines (ignoring the .ascii section).
While I know Go can't match C step for step, I wouldn't have expected a ~4x slowdown.
Ideas?
Here's the cpuprofile of the Go:
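For context, a CPU profile like this is typically collected with the runtime/pprof package, roughly as in this sketch (the flag name and file handling here are illustrative, not taken from the original program):

// Minimal sketch (not the original program) of the usual runtime/pprof setup
// for writing a CPU profile; run with -cpuprofile=cpu.out and inspect the
// result with "go tool pprof".
package main

import (
    "flag"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write CPU profile to this file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            panic(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }
    // ... the newline-counting work would go here ...
}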
Here's the C (compiled w/ gcc -Wall -pedantic -O9):
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

#define BUFFER_SIZE (16 * 1024)

int
main()
{
    const char *file = "big.txt";
    int fd = open (file, O_RDONLY);
    char buf[BUFFER_SIZE + 1];
    uintmax_t bytes = 0;
    ssize_t bytes_read;
    size_t lines = 0;

    posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    // plain read() here in place of the safe_read() helper (e.g. gnulib's,
    // which retries on EINTR)
    while ((bytes_read = read (fd, buf, BUFFER_SIZE)) > 0)
    {
        char *p = buf;
        // error checking omitted
        while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
        {
            ++p;
            ++lines;
        }
        bytes += bytes_read;
    }
    printf("%ju\n", bytes);
    printf("%zu\n", lines);
    return 0;
}
And the Go:
package main

import (
    "fmt"
    "io"
    "os"
    "syscall"
)

const (
    POSIX_FADV_SEQUENTIAL = 2
    NewLineByte           = '\n' // or 10
    BufferSize            = (16 * 1024) + 1
)

var Buffer = make([]byte, BufferSize)

// fadvise wraps the Linux fadvise64 syscall, mirroring posix_fadvise in the C version.
func fadvise(file *os.File, off, length int, advice uint32) error {
    _, _, errno := syscall.Syscall6(syscall.SYS_FADVISE64, file.Fd(), uintptr(off), uintptr(length), uintptr(advice), 0, 0)
    if errno != 0 {
        return errno
    }
    return nil
}

// count returns the number of newline bytes in s.
func count(s []byte) int64 {
    count := int64(0)
    for i := 0; i < len(s); i++ {
        if s[i] == NewLineByte {
            count++
        }
    }
    return count
}

func main() {
    file, err := os.Open("big.txt")
    if err != nil {
        panic(err)
    }

    var lines int64
    var bytes int64

    fadvise(file, 0, 0, POSIX_FADV_SEQUENTIAL)
    for {
        n, err := file.Read(Buffer)
        if err != nil && err != io.EOF {
            panic(err)
        }
        lines += count(Buffer[:n])
        bytes += int64(n)
        if err == io.EOF {
            break
        }
    }
    fmt.Printf("%d\n", bytes)
    fmt.Printf("%d\n", lines)
}
Answer 1
Score: 1
As far as counting '\n' in a file goes, this code runs in ~1.26 sec (and often faster) on a Zorin VM (VMware Player) with 6 GB RAM and 4 cores, with the power plugged in (power managers sometimes throttle the CPU to save battery); the host OS is Windows 8. I've been using Go in some real-world projects for less than 6 months and I'm a Linux noob too. But I think the problem is calling C from Go, which is much slower than pure Go; I've experienced this when calling C code both as a dll and when compiled with cgo.
package main

import (
    "fmt"
    "io"
    "os"
    "runtime"
    "time"
)

func main() {
    tstart := time.Now()
    file, err := os.Open(filePath)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    done := make(chan bool)
    var cnt int64 = 0
    go func() {
        var Buffer = make([]byte, BufferSize)
        for {
            n, err := file.Read(Buffer)
            if err != nil && err != io.EOF {
                panic(err)
            }
            cnt += count(Buffer[:n])
            if err == io.EOF {
                done <- true
                return
            }
        }
    }()
    <-done

    // should be 5860298 in this case (zorin iso image) & it is.
    fmt.Println(cnt)
    fmt.Printf("%s took %s\n", "counting", time.Since(tstart))
}

func count(s []byte) int64 {
    count := int64(0)
    for i := 0; i < len(s); i++ {
        if s[i] == NewLineByte {
            count++
        }
    }
    return count
}

const (
    NewLineByte = '\n' // or 10
    BufferSize  = 32 * 1024
)

var (
    filePath = "/.../zorin-os-9.1-core-64++.iso"
    maxt     int
)

func init() {
    maxt = runtime.NumCPU()
    runtime.GOMAXPROCS(maxt)
}
Answer 2
Score: 1
Here's a not too hard and not too slow way, using bytes.IndexByte (since you found Go's asm implementation of it helped) and syscall.Mmap:
package main

import (
    "bytes"
    "fmt"
    "log"
    "os"
    "syscall"
)

func main() {
    if len(os.Args) < 2 {
        log.Fatal("pass filename on command line")
    }
    f, err := os.Open(os.Args[1])
    if err != nil {
        log.Fatal("open: ", err)
    }
    stat, err := f.Stat()
    if err != nil {
        log.Fatal("stat: ", err)
    }
    data, err := syscall.Mmap(int(f.Fd()), 0, int(stat.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        log.Fatal("mmap: ", err)
    }

    newlines := 0
    for {
        i := bytes.IndexByte(data, 10)
        if i == -1 {
            break
        }
        newlines++
        data = data[i+1:]
    }
    fmt.Println(newlines)
}
Mmap looks weird, but here it's much as if you'd read the file into a slice, except less resource-intensive thanks to the OS's help.
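As a small aside (not part of the original answer), the counting loop above could also be collapsed into a single call to bytes.Count, which scans for the same byte:

// Does the same job as the IndexByte loop above in one call.
newlines := bytes.Count(data, []byte{'\n'})

The explicit loop in the answer has the advantage of making the bytes.IndexByte call visible, which is the point being illustrated.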
You can parallelize the counting without too much more work, but I'm not sure that's worth it. (It would not shock me if the gain on amd64 were zero or negative if, for example, single-core counting were limited by memory bandwidth, but that's not quick for me to test.)
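If you do want to try it, here is a rough sketch of that parallel variant (my illustration, not code from the answer): split the mapped slice into one chunk per CPU and count each chunk in its own goroutine; chunk boundaries need no special handling since only single bytes are counted.

// Rough sketch of parallel counting over the mmap'd data slice; needs
// "runtime" and "sync" in addition to the imports used above.
func parallelCount(data []byte) int {
    workers := runtime.NumCPU()
    chunk := (len(data) + workers - 1) / workers // ceiling division
    counts := make([]int, workers)

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        lo := w * chunk
        if lo >= len(data) {
            break
        }
        hi := lo + chunk
        if hi > len(data) {
            hi = len(data)
        }
        wg.Add(1)
        go func(slot int, part []byte) {
            defer wg.Done()
            // Each goroutine writes only its own slot, so no locking is needed.
            counts[slot] = bytes.Count(part, []byte{'\n'})
        }(w, data[lo:hi])
    }
    wg.Wait()

    total := 0
    for _, c := range counts {
        total += c
    }
    return total
}

Calling fmt.Println(parallelCount(data)) in place of the sequential loop above would print the same total.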