Avoiding excessive memory allocation in golang when using an io.Writer
Question
I am working on a command line tool in Go called redis-mass that converts a bunch of redis commands into redis protocol format.
The first step was to port the node.js version, almost literally, to Go. I used ioutil.ReadFile(inputFileName) to get a string version of the file and then returned an encoded string as output.
When I ran this on a file with 2,000,000 redis commands, it took about 8 seconds, compared to about 16 seconds with the node version. I guessed that the reason it was only twice as fast was because it was reading the whole file into memory first, so I changed my encoding function to accept a pair (raw io.Reader, enc io.Writer), and it looks like this:
func EncodeStream(raw io.Reader, enc io.Writer) {
	var args []string
	var length int
	scanner := bufio.NewScanner(raw)
	for scanner.Scan() {
		command := strings.TrimSpace(scanner.Text())
		args = parse(command)
		length = len(args)
		if length > 0 {
			io.WriteString(enc, fmt.Sprintf("*%d\r\n", length))
			for _, arg := range args {
				io.WriteString(enc, fmt.Sprintf("$%d\r\n%s\r\n", len(arg), arg))
			}
		}
	}
}
However, this took 12 seconds on the 2 million line file, so I used github.com/pkg/profile to see how it was using memory, and it looks like the number of memory allocations is huge:
# Alloc = 3162912
# TotalAlloc = 1248612816
# Mallocs = 46001048
# HeapAlloc = 3162912
Can I constrain the io.Writer to use a fixed-size buffer and avoid all those allocations?
More generally, how can I avoid excessive allocations in this method? Here's the full source for more context.
Answer 1
Score: 1
Reduce allocations by working with []byte instead of strings, and fmt.Fprintf directly to the output instead of fmt.Sprintf plus io.WriteString:
func EncodeStream(raw io.Reader, enc io.Writer) {
	var args [][]byte // parse must be adapted to take and return []byte
	scanner := bufio.NewScanner(raw)
	for scanner.Scan() {
		command := bytes.TrimSpace(scanner.Bytes())
		args = parse(command)
		if len(args) > 0 {
			// Fprintf writes straight to enc, avoiding the temporary
			// string that Sprintf allocates on every call.
			fmt.Fprintf(enc, "*%d\r\n", len(args))
			for _, arg := range args {
				fmt.Fprintf(enc, "$%d\r\n%s\r\n", len(arg), arg)
			}
		}
	}
}