Fix memory consumption of a go program with goroutines
Question
I am working on a problem that involves a producer-consumer pattern. I have one producer that produces tasks and 'n' consumers that consume them. A consumer's task is to read some data from a file and then upload that data to S3. One consumer can read up to xMB (8/16/32) of data and then upload it to S3. Keeping all the data in memory caused more memory consumption than expected from the program, so I switched to reading the data from the file, writing it to a temporary file, and then uploading that file to S3. While this performed better in terms of memory, CPU took a hit. I wonder if there is any way to allocate a fixed amount of memory once and then use it among different goroutines?
What I would want is that if I have 4 goroutines, I can allocate 4 different arrays of xMB and then use the same array in each goroutine invocation, so that a goroutine doesn't allocate memory every time and also doesn't depend on the GC to free it.
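Roughly what I have in mind (a sketch only; readTask and upload below are placeholder functions standing in for my real block reads and S3 upload, and bufSize stands in for xMB):

package main

import "sync"

// bufSize is a placeholder for xMB (8/16/32 MiB).
const bufSize = 8 * 1024 * 1024

// worker reuses one pre-allocated buffer for every task it handles, so
// there is no per-task allocation and nothing for the GC to reclaim
// between tasks. readTask and upload are placeholders for the real
// block reads and the S3 upload.
func worker(buf []byte, tasks <-chan func([]byte) int, upload func([]byte), wg *sync.WaitGroup) {
    defer wg.Done()
    for readTask := range tasks {
        n := readTask(buf) // fill the reused buffer
        upload(buf[:n])
    }
}

func main() {
    tasks := make(chan func([]byte) int)
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go worker(make([]byte, bufSize), tasks, func(b []byte) { _ = b }, &wg)
    }
    // Producer: enqueue a few dummy tasks.
    for i := 0; i < 10; i++ {
        tasks <- func(b []byte) int { return copy(b, "example data") }
    }
    close(tasks)
    wg.Wait()
}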
Edit: Adding the crux of my code. My Go consumer looks like:
type Block struct {
    offset int64
    size   int64
}

func consumer(blocks []Block) {
    var dataArr []byte
    for _, block := range blocks {
        // read block.size bytes starting at block.offset (pseudocode)
        data := file.Read(block.offset, block.size)
        dataArr = append(dataArr, data...)
    }
    upload(dataArr)
}
I read the data from the file based on Blocks; a block can contain several small chunks limited by xMB, or one big chunk of xMB.
Edit 2: Tried sync.Pool based on suggestions in the comments, but I did not see any improvement in memory consumption. Am I doing something wrong?
var pool *sync.Pool

func main() {
    pool = &sync.Pool{
        New: func() interface{} {
            return make([]byte, 16777216)
        },
    }
    for i := 0; i < 4; i++ {
        // blocks is a 2-d array; each index contains an array of blocks.
        go consumer(blocks[i])
    }
}

func consumer(blocks []Block) {
    d := pool.Get().([]byte)
    for _, block := range blocks {
        // read block.size bytes at block.offset into the pooled buffer
        file.Read(block.offset, block.size, d[block.offset:block.size])
    }
    upload(d)
    pool.Put(d)
}
Answer 1

Score: 1
Take a look at SA6002 of StaticCheck, about sync.Pool. You can also use the pprof tool.
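To illustrate what SA6002 points out: putting a plain []byte into a sync.Pool copies the slice header into an interface{}, which itself allocates on every Put, so the usual pattern is to pool a *[]byte instead. Below is a minimal sketch of that pattern, not the asker's actual code; bufSize, fill, and upload are assumptions standing in for the 16 MiB buffer, the file reads, and the S3 upload from the question.

package main

import "sync"

// bufSize matches the 16777216 (16 MiB) buffer from the question.
const bufSize = 16 * 1024 * 1024

// Per SA6002, the pool holds *[]byte rather than []byte, so Get/Put do
// not allocate a new interface value for the slice header each time.
var pool = sync.Pool{
    New: func() interface{} {
        b := make([]byte, bufSize)
        return &b
    },
}

// consumer borrows a buffer, fills it, uploads it, and returns it to the
// pool. fill and upload are hypothetical stand-ins for the question's
// file reads and S3 upload.
func consumer(fill func([]byte) int, upload func([]byte)) {
    bp := pool.Get().(*[]byte)
    defer pool.Put(bp)

    buf := (*bp)[:bufSize] // reslice to full length before reuse
    n := fill(buf)
    upload(buf[:n])
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            consumer(
                func(b []byte) int { return copy(b, "example data") },
                func(b []byte) { _ = b },
            )
        }()
    }
    wg.Wait()
}

To confirm where the allocations actually come from, a heap profile via runtime/pprof or net/http/pprof inspected with go tool pprof (for example the alloc_space view) is the usual next step.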