Could this be more efficient in Go?

Question

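I wrote a piece of code that imitates the standard grep command in Go, but it is far slower than grep. Could someone give me some advice? Here is the code: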

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
	"sync"
)

func parse_args() (file, pat string) {
	if len(os.Args) < 3 {
		log.Fatal("usage: gorep2 <file_name> <pattern>")
	}

	file = os.Args[1]
	pat = os.Args[2]
	return
}

func readFile(file string, to chan<- string) {
	f, err := os.Open(file)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	freader := bufio.NewReader(f)
	for {
		line, er := freader.ReadBytes('\n')
		if er == nil {
			to <- string(line)
		} else {
			break
		}

	}
	close(to)
}

func grepLine(pat string, from <-chan string, result chan<- bool) {
	var wg sync.WaitGroup

	for line := range from {
		wg.Add(1)

		go func(l string) {
			defer wg.Done()
			if strings.Contains(l, pat) {
				result <- true
			}
		}(string(line))
	}

	wg.Wait()
	close(result)
}

func main() {
	file, pat := parse_args()
	text_chan := make(chan string, 10)
	result_chan := make(chan bool, 10)

	go readFile(file, text_chan)
	go grepLine(pat, text_chan, result_chan)

	var total uint = 0
	for r := range result_chan {
		if r == true {
			total += 1
		}
	}

	fmt.Printf("Total %d\n", total)
}

The time in Go:

>>> time gogrep /var/log/task.log DEBUG

Total 21089

real	0m0.156s
user	0m0.156s
sys	0m0.015s

The time in grep:

>>> time grep DEBUG /var/log/task.log | wc -l

21089

real	0m0.069s
user	0m0.046s
sys	0m0.064s

Answer 1

Score: 16

For an easily reproducible benchmark, I counted the lines that contain the text "and" in Shakespeare.

gogrep:

$ go build gogrep.go && time ./gogrep /home/peter/shakespeare.txt and
Total 21851
real 0m0.613s
user 0m0.651s
sys 0m0.068s

grep:

$ time grep and /home/peter/shakespeare.txt | wc -l
21851
real 0m0.108s
user 0m0.107s
sys 0m0.014s

petergrep:

$ go build petergrep.go && time ./petergrep /home/peter/shakespeare.txt and
Total 21851
real 0m0.098s
user 0m0.092s
sys 0m0.008s

petergrep is written in Go. It's fast.

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"log"
	"os"
)

func parse_args() (file, pat string) {
	if len(os.Args) < 3 {
		log.Fatal("usage: petergrep <file_name> <pattern>")
	}
	file = os.Args[1]
	pat = os.Args[2]
	return
}

func grepFile(file string, pat []byte) int64 {
	patCount := int64(0)
	f, err := os.Open(file)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if bytes.Contains(scanner.Bytes(), pat) {
			patCount++
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	return patCount
}

func main() {
	file, pat := parse_args()
	total := grepFile(file, []byte(pat))
	fmt.Printf("Total %d\n", total)
}

Data: Shakespeare: pg100.txt

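Answer 2

Score: 4

Go regular expressions are fully UTF-8 aware, and I think that adds some overhead. They also have a different theoretical basis, which means they always run in time proportional to the length of the input. Go regexps are noticeably slower than the PCRE regexps used by other languages. If you look at the regexp test in the Benchmarks Game shootout, you'll see what I mean.

You can always use the PCRE library directly if you want a bit more speed, though.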

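Answer 3

Score: -1

A data point on the relevance of UTF-8 in regexp parsing: I have a long-used custom perl5 script for grepping source code. I recently modified it to support UTF-8 so that it could match fancy golang symbol names. In repeated tests it ran a full order of magnitude slower. So while golang regexps do pay a price for the predictability of their running time, we also have to factor UTF-8 handling into the equation.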

huangapple
  • Posted on 2014-11-03 15:56:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/26709971.html