What is the fastest way to rewrite a file with Go?
Question
I have a large file (too large to fit entirely in memory) containing strings of various sizes, one per line. I want to write these strings to another file, with each string converted to uppercase. What is the fastest way to achieve this in Go?
Here is the most efficient approach I could come up with. Any ideas on how to make it faster?
package main

import (
	"bufio"
	"log"
	"os"
	"strings"
)

func main() {
	inputFile, err := os.Open("input.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer inputFile.Close()

	outputFile, err := os.Create("output.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer outputFile.Close()

	// Read line by line and write through a buffered writer.
	scanner := bufio.NewScanner(inputFile)
	writer := bufio.NewWriter(outputFile)
	for scanner.Scan() {
		line := scanner.Text()
		capitalized := strings.ToUpper(line)
		_, err := writer.WriteString(capitalized + "\n")
		if err != nil {
			log.Fatal(err)
		}
	}

	// Flush whatever is still buffered before exiting.
	err = writer.Flush()
	if err != nil {
		log.Fatal(err)
	}
}
Answer 1
Score: 1
One place to start is to run benchmarks with Go's testing package.
For benchmark data I use a 275,502-word, largely lowercase, 3,077,701-byte Linux dictionary file: /usr/share/dict/brazilian. It's the best I could do given your vague description of your file. To keep disk I/O out of the benchmark, I use a bytes.Reader as the io.Reader and ioutil.Discard as the io.Writer.
The results for your code:
$ go test upper_so_test.go -run=! -benchmem -bench=.
BenchmarkSO-12 48 22765120 ns/op 8143216 B/op 550993 allocs/op
The results for Blunderific's code:
BenchmarkB-12 94 13061407 ns/op 3782866 B/op 275505 allocs/op
As a Proof of Concept (PoC), using the dictionary file, I wrote code which uses minimal CPU and memory. The results, so far, for my PoC code:
BenchmarkTU-12 182 6457334 ns/op 8240 B/op 3 allocs/op
Running my PoC code as a program, using SSD file storage for reading and writing the dictionary file, takes a few milliseconds:
$ time ./upper
real 0m0.031s
user 0m0.014s
sys 0m0.009s
Without even a small sample of your file, it is not possible to make concrete recommendations for performance improvement. However, using the dictionary file, my PoC benchmark results compared with yours (6,457,334 vs. 22,765,120 ns/op; 8,240 vs. 8,143,216 B/op; 3 vs. 550,993 allocs/op) make it likely that your profligate use of CPU and memory is hurting performance.
upper_so_test.go:
package main

import (
	"bufio"
	"bytes"
	"io"
	"io/ioutil"
	"os"
	"strings"
	"testing"
)

// SOToUpper is the code from the question, adapted to take
// an io.Reader and an io.Writer so it can be benchmarked.
func SOToUpper(r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	writer := bufio.NewWriter(w)
	for scanner.Scan() {
		line := scanner.Text()
		capitalized := strings.ToUpper(line)
		_, err := writer.WriteString(capitalized + "\n")
		if err != nil {
			return err
		}
	}
	err := writer.Flush()
	if err != nil {
		return err
	}
	return nil
}

// benchData holds the dictionary file, loaded once at start-up
// so that the benchmark loop does no disk I/O.
var benchData = func() []byte {
	data, err := os.ReadFile(`/usr/share/dict/brazilian`)
	if err != nil {
		panic(err)
	}
	return data
}()

func BenchmarkSO(b *testing.B) {
	for i := 0; i < b.N; i++ {
		r := bytes.NewReader(benchData)
		w := ioutil.Discard // the same writer as io.Discard (Go 1.16+)
		err := SOToUpper(r, w)
		if err != nil {
			b.Error(err)
		}
	}
}
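The PoC code behind the BenchmarkTU numbers is not shown in this answer. As an illustration only, here is a minimal sketch of one way to get a comparably small allocation profile: copy the input in fixed-size chunks and uppercase ASCII letters in place. The names upperASCII and upperASCIIString and the 64 KiB chunk size are assumptions made for this example, not the author's code. The sketch also assumes that only the letters a-z need converting: non-ASCII bytes, such as the accented letters in the dictionary file, pass through unchanged, so its output differs from strings.ToUpper on such input.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// upperASCII copies r to w in fixed-size chunks, uppercasing ASCII letters
// in place. ASSUMPTION: only a-z needs converting. Bytes >= 0x80 (non-ASCII
// UTF-8) are copied unchanged, unlike strings.ToUpper.
func upperASCII(r io.Reader, w io.Writer) error {
	buf := make([]byte, 64*1024) // single buffer, reused for every chunk
	for {
		n, err := r.Read(buf)
		chunk := buf[:n]
		for i, c := range chunk {
			if 'a' <= c && c <= 'z' {
				chunk[i] = c - ('a' - 'A')
			}
		}
		if n > 0 {
			if _, werr := w.Write(chunk); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// upperASCIIString is a hypothetical convenience wrapper for small inputs.
func upperASCIIString(s string) (string, error) {
	var out bytes.Buffer
	err := upperASCII(strings.NewReader(s), &out)
	return out.String(), err
}

func main() {
	got, err := upperASCIIString("hello\nGo 1.21\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(got)
}
```

Because newlines are copied like any other byte, this version never splits the input into lines, so it performs no per-line allocation and is not subject to the scanner's token size limit. Whether it is actually faster on a given file is something to confirm with a benchmark like the one above.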
Answer 2
Score: 0
Use []byte instead of string in the inner loop to avoid conversions from []byte to string.
The Scanner.Text() method copies the data to a new string.
The Scanner.Bytes() method returns a slice of the scanner's buffer without allocating. The slice is only valid until the next call to Scan.
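for scanner.Scan() {
	// Bytes does not copy: line points into the scanner's buffer.
	line := scanner.Bytes()
	// bytes.ToUpper returns a new slice and leaves line untouched.
	capitalized := bytes.ToUpper(line)
	_, err := writer.Write(capitalized)
	if err != nil {
		log.Fatal(err)
	}
	// Write the newline separately instead of concatenating strings.
	err = writer.WriteByte('\n')
	if err != nil {
		log.Fatal(err)
	}
}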
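For reference, here is a minimal sketch of that loop as a complete function, so it can be dropped into a benchmark or a program. The names upperLines, upperLinesString, and maxLine are hypothetical, and the 16 MiB line limit is an assumption to tune for the real data. The sketch also raises the scanner's default token limit (bufio.MaxScanTokenSize, 64 KiB), which would otherwise make Scan stop with bufio.ErrTooLong on a longer line, and it checks scanner.Err() after the loop.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// maxLine is an assumed upper bound on line length; tune it for the real data.
const maxLine = 16 * 1024 * 1024

// upperLines writes each line of r to w in uppercase, followed by '\n'.
func upperLines(r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	// Start with a 64 KiB buffer and let it grow up to maxLine.
	scanner.Buffer(make([]byte, 0, 64*1024), maxLine)
	writer := bufio.NewWriter(w)
	for scanner.Scan() {
		// Bytes aliases the scanner's buffer; use it before the next Scan.
		if _, err := writer.Write(bytes.ToUpper(scanner.Bytes())); err != nil {
			return err
		}
		if err := writer.WriteByte('\n'); err != nil {
			return err
		}
	}
	// Scan returns false on error as well as at EOF.
	if err := scanner.Err(); err != nil {
		return err
	}
	return writer.Flush()
}

// upperLinesString is a hypothetical convenience wrapper for small inputs.
func upperLinesString(s string) (string, error) {
	var out bytes.Buffer
	err := upperLines(strings.NewReader(s), &out)
	return out.String(), err
}

func main() {
	got, err := upperLinesString("first line\nsecond line")
	if err != nil {
		panic(err)
	}
	fmt.Print(got)
}
```

As with the original loop, the scanner strips line endings and the function writes a '\n' after every line, so the output always ends with a newline even when the input does not.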