Counting occurrence of character in slice in Go
Question
Okay, so I've hit a brick wall.
Edit:
Using bytes.IndexByte() in my count() function makes it run almost twice as fast. bytes.IndexByte() is written in assembly instead of Go. Still not C speed, but closer.
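For reference, a count() along those lines might look like this sketch (my reconstruction of the idea, not necessarily the exact code used; it assumes the bytes package is imported and reuses the NewLineByte constant from the program below):

// Sketch of an IndexByte-based count(): let the assembly-backed
// bytes.IndexByte do the scanning instead of a byte-at-a-time Go loop.
func count(s []byte) int64 {
    var n int64
    for {
        i := bytes.IndexByte(s, NewLineByte)
        if i == -1 {
            return n
        }
        n++
        s = s[i+1:]
    }
}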
I have two programs, one in C and one in Go, that both count newlines in a file. Super simple. The C program runs in ~1.5 seconds, the Go one in ~4.25 seconds, on a 2.4 GB file.
Am I hitting Go's speed limit? If so, what, exactly, is causing this? I can read C, but I can't read assembly, so comparing the C asm with the Go asm doesn't do much for me except show that the Go has ~400 more lines (ignoring the .ascii section).
While I know Go can't match C step for step, I wouldn't have expected a ~4x slowdown.
Ideas?
Here's the cpuprofile of the Go:
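For context, a CPU profile like this is typically collected with the runtime/pprof package, roughly as in this sketch (the flag name and file handling here are illustrative, not taken from the original program):

// Minimal sketch (not the original program) of the usual runtime/pprof setup
// for writing a CPU profile; run with -cpuprofile=cpu.out and inspect the
// result with "go tool pprof".
package main

import (
    "flag"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write CPU profile to this file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            panic(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }
    // ... the newline-counting work would go here ...
}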
Here's the C (compiled w/ gcc -Wall -pedantic -O9):
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

#define BUFFER_SIZE (16 * 1024)

int
main()
{
    const char *file = "big.txt";
    int fd = open (file, O_RDONLY);
    char buf[BUFFER_SIZE + 1];
    uintmax_t bytes = 0;
    ssize_t bytes_read;
    size_t lines = 0;

    posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    // plain read() here in place of the safe_read() helper (e.g. gnulib's,
    // which retries on EINTR)
    while ((bytes_read = read (fd, buf, BUFFER_SIZE)) > 0)
    {
        char *p = buf;
        // error checking omitted
        while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
        {
            ++p;
            ++lines;
        }
        bytes += bytes_read;
    }
    printf("%ju\n", bytes);
    printf("%zu\n", lines);
    return 0;
}
And the Go:
package main

import (
    "fmt"
    "io"
    "os"
    "syscall"
)

const (
    POSIX_FADV_SEQUENTIAL = 2
    NewLineByte           = '\n' // or 10
    BufferSize            = (16 * 1024) + 1
)

var Buffer = make([]byte, BufferSize)

// fadvise wraps the Linux fadvise64 syscall, mirroring posix_fadvise in the C version.
func fadvise(file *os.File, off, length int, advice uint32) error {
    _, _, errno := syscall.Syscall6(syscall.SYS_FADVISE64, file.Fd(), uintptr(off), uintptr(length), uintptr(advice), 0, 0)
    if errno != 0 {
        return errno
    }
    return nil
}

// count returns the number of newline bytes in s.
func count(s []byte) int64 {
    count := int64(0)
    for i := 0; i < len(s); i++ {
        if s[i] == NewLineByte {
            count++
        }
    }
    return count
}

func main() {
    file, err := os.Open("big.txt")
    if err != nil {
        panic(err)
    }

    var lines int64
    var bytes int64

    fadvise(file, 0, 0, POSIX_FADV_SEQUENTIAL)
    for {
        n, err := file.Read(Buffer)
        if err != nil && err != io.EOF {
            panic(err)
        }
        lines += count(Buffer[:n])
        bytes += int64(n)
        if err == io.EOF {
            break
        }
    }
    fmt.Printf("%d\n", bytes)
    fmt.Printf("%d\n", lines)
}
Answer 1
Score: 1
As far as counting '\n' in a file goes, this code runs in ~1.26 sec (and often faster) on a Zorin VM (VMware Player) with 6 GB RAM and 4 cores, with the power plugged in (power managers sometimes throttle the CPU to save battery); the host OS is Windows 8. I've been using Go in some real-world projects for less than 6 months and I'm a Linux noob too. But I think the problem is calling C from Go, which is much slower than pure Go; I've experienced this when calling C code both as a dll and when compiled with cgo.
package main

import (
    "fmt"
    "io"
    "os"
    "runtime"
    "time"
)

func main() {
    tstart := time.Now()
    file, err := os.Open(filePath)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    done := make(chan bool)
    var cnt int64 = 0
    go func() {
        var Buffer = make([]byte, BufferSize)
        for {
            n, err := file.Read(Buffer)
            if err != nil && err != io.EOF {
                panic(err)
            }
            cnt += count(Buffer[:n])
            if err == io.EOF {
                done <- true
                return
            }
        }
    }()
    <-done

    // should be 5860298 in this case (zorin iso image) & it is.
    fmt.Println(cnt)
    fmt.Printf("%s took %s\n", "counting", time.Since(tstart))
}

func count(s []byte) int64 {
    count := int64(0)
    for i := 0; i < len(s); i++ {
        if s[i] == NewLineByte {
            count++
        }
    }
    return count
}

const (
    NewLineByte = '\n' // or 10
    BufferSize  = 32 * 1024
)

var (
    filePath = "/.../zorin-os-9.1-core-64++.iso"
    maxt     int
)

func init() {
    maxt = runtime.NumCPU()
    runtime.GOMAXPROCS(maxt)
}
Answer 2
Score: 1
Here's a not too hard and not too slow way, using bytes.IndexByte (since you found Go's asm implementation of it helped) and syscall.Mmap:
package main

import (
    "bytes"
    "fmt"
    "log"
    "os"
    "syscall"
)

func main() {
    if len(os.Args) < 2 {
        log.Fatal("pass filename on command line")
    }
    f, err := os.Open(os.Args[1])
    if err != nil {
        log.Fatal("open: ", err)
    }
    stat, err := f.Stat()
    if err != nil {
        log.Fatal("stat: ", err)
    }
    data, err := syscall.Mmap(int(f.Fd()), 0, int(stat.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        log.Fatal("mmap: ", err)
    }

    newlines := 0
    for {
        i := bytes.IndexByte(data, 10)
        if i == -1 {
            break
        }
        newlines++
        data = data[i+1:]
    }
    fmt.Println(newlines)
}
Mmap looks weird, but here it's much as if you'd read the file into a slice, except less resource-intensive thanks to the OS's help.
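As a small aside (not part of the original answer), the counting loop above could also be collapsed into a single call to bytes.Count, which scans for the same byte:

// Does the same job as the IndexByte loop above in one call.
newlines := bytes.Count(data, []byte{'\n'})

The explicit loop in the answer has the advantage of making the bytes.IndexByte call visible, which is the point being illustrated.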
You can parallelize the counting without too much more work, but I'm not sure that's worth it. (It would not shock me if the gain on amd64 were zero or negative if, for example, single-core counting were limited by memory bandwidth, but that's not quick for me to test.)
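If you do want to try it, here is a rough sketch of that parallel variant (my illustration, not code from the answer): split the mapped slice into one chunk per CPU and count each chunk in its own goroutine; chunk boundaries need no special handling since only single bytes are counted.

// Rough sketch of parallel counting over the mmap'd data slice; needs
// "runtime" and "sync" in addition to the imports used above.
func parallelCount(data []byte) int {
    workers := runtime.NumCPU()
    chunk := (len(data) + workers - 1) / workers // ceiling division
    counts := make([]int, workers)

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        lo := w * chunk
        if lo >= len(data) {
            break
        }
        hi := lo + chunk
        if hi > len(data) {
            hi = len(data)
        }
        wg.Add(1)
        go func(slot int, part []byte) {
            defer wg.Done()
            // Each goroutine writes only its own slot, so no locking is needed.
            counts[slot] = bytes.Count(part, []byte{'\n'})
        }(w, data[lo:hi])
    }
    wg.Wait()

    total := 0
    for _, c := range counts {
        total += c
    }
    return total
}

Calling fmt.Println(parallelCount(data)) in place of the sequential loop above would print the same total.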