November 3, 2014 15:56:28

Could this be more efficient in Go?

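Question

I wrote a small Go program that mimics the standard grep command, but it runs much slower than grep. Could someone give me some advice on speeding it up? Here is the code: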

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
	"sync"
)

func parse_args() (file, pat string) {
	if len(os.Args) < 3 {
		log.Fatal("usage: gorep2 <file_name> <pattern>")
	}

	file = os.Args[1]
	pat = os.Args[2]
	return
}

func readFile(file string, to chan<- string) {
	f, err := os.Open(file)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	freader := bufio.NewReader(f)
	for {
		line, er := freader.ReadBytes('\n')
		if er == nil {
			to <- string(line)
		} else {
			break
		}

	}
	close(to)
}

func grepLine(pat string, from <-chan string, result chan<- bool) {
	var wg sync.WaitGroup

	for line := range from {
		wg.Add(1)

		go func(l string) {
			defer wg.Done()
			if strings.Contains(l, pat) {
				result <- true
			}
		}(string(line))
	}

	wg.Wait()
	close(result)
}

func main() {
	file, pat := parse_args()
	text_chan := make(chan string, 10)
	result_chan := make(chan bool, 10)

	go readFile(file, text_chan)
	go grepLine(pat, text_chan, result_chan)

	var total uint = 0
	for r := range result_chan {
		if r == true {
			total += 1
		}
	}

	fmt.Printf("Total %d\n", total)
}

The time in Go:

>>> time gogrep /var/log/task.log DEBUG

Total 21089

real	0m0.156s
user	0m0.156s
sys	0m0.015s

The time in grep:

>>> time grep DEBUG /var/log/task.log | wc -l

21089

real	0m0.069s
user	0m0.046s
sys	0m0.064s

Answer 1

Score: 16

For an easily reproducible benchmark, I counted the lines that contain the text "and" in Shakespeare.

gogrep:

$ go build gogrep.go && time ./gogrep /home/peter/shakespeare.txt and
Total 21851
real 0m0.613s
user 0m0.651s
sys 0m0.068s

grep:

$ time grep and /home/peter/shakespeare.txt | wc -l
21851
real 0m0.108s
user 0m0.107s
sys 0m0.014s

petergrep:

$ go build petergrep.go && time ./petergrep /home/peter/shakespeare.txt and
Total 21851
real 0m0.098s
user 0m0.092s
sys 0m0.008s

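petergrep is written in Go. It's fast. It scans the file line by line in a single goroutine and tests each line with bytes.Contains, so it avoids the per-line goroutine, channel send, and string conversion that gogrep performs.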

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"log"
	"os"
)

func parse_args() (file, pat string) {
	if len(os.Args) < 3 {
		log.Fatal("usage: petergrep <file_name> <pattern>")
	}
	file = os.Args[1]
	pat = os.Args[2]
	return
}

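// grepFile returns the number of lines in file that contain pat.
func grepFile(file string, pat []byte) int64 {
	patCount := int64(0)
	f, err := os.Open(file)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// Scan line by line in a single goroutine. scanner.Bytes() returns a
	// slice into the scanner's buffer, so no string is allocated per line.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if bytes.Contains(scanner.Bytes(), pat) {
			patCount++
		}
	}
	// Scan stops on a read error or on a line too long for the scanner's
	// buffer; report it, since the count is then only partial.
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	return patCount
}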

func main() {
	file, pat := parse_args()
	total := grepFile(file, []byte(pat))
	fmt.Printf("Total %d\n", total)
}

Data: Shakespeare: pg100.txt

Answer 2

Score: 4

Go regular expressions are fully UTF-8 aware, and I think that has some overhead. They also have a different theoretical basis, which means they always run in time proportional to the length of the input. Go regexps are noticeably slower than the PCRE regexps used by other languages. If you look at the regexp test in the benchmarks game, you'll see what I mean.

If you want a bit more speed, though, you can always use the PCRE library directly.

Answer 3

Score: -1

A data point on the relevance of UTF-8 in regexp parsing: I have a long-used custom perl5 script for source grepping. I recently modified it to support UTF-8 so it could match fancy golang symbol names. It ran a full order of magnitude slower in repeated tests. So while golang regexps do pay a price for the predictability of their runtime, we also have to factor UTF-8 handling into the equation.
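
For literal patterns, the UTF-8 factor can be sidestepped in Go: because UTF-8 is self-synchronizing, a byte-wise substring search over valid UTF-8 finds non-ASCII identifiers without decoding any runes. The sketch below illustrates this with a hypothetical wrapper containsLiteral and made-up inputs; it says nothing about the Perl script described above.

```go
package main

import (
	"bytes"
	"fmt"
)

// containsLiteral is a hypothetical wrapper around bytes.Contains.
// It compares raw bytes only and never decodes UTF-8.
func containsLiteral(line, pat []byte) bool {
	return bytes.Contains(line, pat)
}

func main() {
	// A non-ASCII identifier is found by plain byte comparison.
	fmt.Println(containsLiteral([]byte("func größe() int"), []byte("größe"))) // prints "true"

	// "è" and "é" share their first byte (0xC3) but differ in the second,
	// so a byte-wise search does not confuse them.
	fmt.Println(containsLiteral([]byte("très"), []byte("é"))) // prints "false"
}
```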
