Improving performance of reading with bufio.NewScanner
Question
A simple program to serve one purpose:
- Read a script file line by line and build a single string, ignoring blank lines and comments (including the shebang), and adding a ';' at the end of a line if needed. (I know, I know: backslashes, ampersands, etc.)
My question is:
How can I improve the performance of this small program? In a different answer I read about using scanner.Bytes() instead of scanner.Text(), but this doesn't seem feasible since a string is what I want.
Sample code with test file: https://play.golang.org/p/gzSTLkP3BoB
Here is the simple program:
func main() {
	file, err := os.Open("./script.sh")
	if err != nil {
		log.Fatalln(err)
	}
	defer file.Close()

	var a strings.Builder
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		lines := scanner.Text()
		switch {
		case lines == "" || lines[:1] == "#":
			continue
		case lines[len(lines)-1:] != ";":
			a.WriteString(lines + "; ")
		default:
			a.WriteString(lines + " ")
		}
	}
	fmt.Println(a.String())
}
Answer 1
Score: 2
I used strings.Builder and ioutil.ReadAll to improve performance. Since you are dealing with small shell scripts, I assumed that reading the whole file at once with ioutil.ReadAll would not put pressure on memory. I also call Grow once so that the strings.Builder has sufficient capacity up front, which reduces allocations.
- doFast: faster implementation
- doSlow: slower implementation (what you've originally done)
Now, let's look at the benchmark results:
goos: darwin
goarch: amd64
pkg: test
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkDoFast-8 342602 3334 ns/op 1280 B/op 3 allocs/op
BenchmarkDoSlow-8 258896 4408 ns/op 4624 B/op 8 allocs/op
PASS
ok test 2.477s
We can see that doFast is not only faster but also makes fewer allocations. For all of these metrics, lower is better.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io/ioutil"
	"os"
	"strings"
)

func open(filename string) (*os.File, error) {
	return os.Open(filename)
}

func main() {
	fd, err := open("test.sh")
	if err != nil {
		panic(err)
	}
	defer fd.Close()

	outputA, err := doFast(fd)
	if err != nil {
		panic(err)
	}

	// Rewind so that doSlow reads the file from the start.
	if _, err := fd.Seek(0, 0); err != nil {
		panic(err)
	}
	outputB, err := doSlow(fd)
	if err != nil {
		panic(err)
	}

	fmt.Println(outputA)
	fmt.Println(outputB)
}

// doFast reads the whole file at once and splits it into lines in memory.
func doFast(fd *os.File) (string, error) {
	b, err := ioutil.ReadAll(fd)
	if err != nil {
		return "", err
	}

	var res strings.Builder
	res.Grow(len(b)) // reserve capacity once, up front

	bLines := bytes.Split(b, []byte("\n"))
	for i := range bLines {
		switch {
		case len(bLines[i]) == 0 || bLines[i][0] == '#':
			// skip blank lines and comments
		case bLines[i][len(bLines[i])-1] != ';':
			res.Write(bLines[i])
			res.WriteString("; ")
		default:
			res.Write(bLines[i])
			res.WriteByte(' ')
		}
	}
	return res.String(), nil
}

// doSlow is the original approach from the question.
func doSlow(fd *os.File) (string, error) {
	var a strings.Builder
	scanner := bufio.NewScanner(fd)
	for scanner.Scan() {
		lines := scanner.Text()
		switch {
		case lines == "" || lines[:1] == "#":
			continue
		case lines[len(lines)-1:] != ";":
			a.WriteString(lines + "; ")
		default:
			a.WriteString(lines + " ")
		}
	}
	// Report any read error the scanner encountered.
	return a.String(), scanner.Err()
}
Note: I didn't use bufio.NewScanner; is it required?
Answer 2
Score: 1
It is feasible to use scanner.Bytes(). Here's the code:
func main() {
	file, err := os.Open("./script.sh")
	if err != nil {
		log.Fatalln(err)
	}
	defer file.Close()

	var a strings.Builder
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		lines := scanner.Bytes()
		switch {
		case len(lines) == 0 || lines[0] == '#':
			continue
		case lines[len(lines)-1] != ';':
			a.Write(lines)
			a.WriteString("; ")
		default:
			a.Write(lines)
			a.WriteByte(' ')
		}
	}
	fmt.Println(a.String())
}
This program avoids the per-line string allocation in scanner.Text(). It may not be faster in practice if the program is I/O-bound.
If your goal is to write the result to stdout, then write to a bufio.Writer instead of a strings.Builder. This change replaces one or more allocations in strings.Builder with a single allocation in bufio.Writer.
func main() {
	file, err := os.Open("./script.sh")
	if err != nil {
		log.Fatalln(err)
	}
	defer file.Close()

	a := bufio.NewWriter(os.Stdout)
	defer a.Flush() // flush buffered data on return from main.

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		lines := scanner.Bytes()
		switch {
		case len(lines) == 0 || lines[0] == '#':
			continue
		case lines[len(lines)-1] != ';':
			a.Write(lines)
			a.WriteString("; ")
		default:
			a.Write(lines)
			a.WriteByte(' ')
		}
	}
}
Bonus improvement: use lines := bytes.TrimSpace(scanner.Bytes()) to handle whitespace before a '#' and after a ';'.
Answer 3
Score: 0
You may be able to improve performance by buffering the output as well.
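func main() {
	output := bufio.NewWriter(os.Stdout)
	defer output.Flush() // without Flush, buffered data is never written

	// ... build the strings.Builder a as in the question ...

	// instead of Printf, use
	fmt.Fprintf(output, "%s\n", a.String())
}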