concurrent memory allocation using `make`?
Question
I am going to read a large CSV file and return an array of structs. So I decided to split the large file into multiple smaller files of 1 million lines each and use goroutines to process them in parallel.

Inside each worker, I create a slice to hold the file's lines:

    for i := 0; i < 10; i++ {
        go func(index int) {
            // one slice per worker, sized for that worker's file
            lines := make([]MyStruct, 1000000)
        }(i)
    }

It seems like the goroutines wait for each other on this line. So if allocating the memory takes 1 second, 10 concurrent goroutines doing it take 10 seconds instead of 1 second!

Could you please help me understand why? If this is so, I guess I will allocate the memory before starting the goroutines and pass each of them a pointer to the array, plus the index of the element they should start at while reading lines and setting values.
Answer 1
Score: 3
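On Go versions before 1.5, you need to call `runtime.GOMAXPROCS(runtime.NumCPU())` or set the `GOMAXPROCS` environment variable for the program to actually use multiple cores. Since Go 1.5, `GOMAXPROCS` defaults to the number of available CPUs, so this step is usually unnecessary.

ref: http://golang.org/pkg/runtime/#GOMAXPROCS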
And to quote @siritinga:
> And of course, you need to do something with lines.
>
> Right now, they are allocated and then lost for the garbage collector.
A different approach is to preallocate the slice and then pass parts of it to the goroutines, for example:

    N := 1000000
    lines := make([]MyStruct, N*10)
    var wg sync.WaitGroup // from package "sync"
    for i := 0; i < 10; i++ {
        idx := i * N
        wg.Add(1)
        go func(lines []MyStruct) {
            defer wg.Done()
            // do stuff with lines
        }(lines[idx : idx+N])
    }
    wg.Wait() // block until every worker has finished
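
For reference, here is a self-contained sketch of the preallocation approach that can be run as-is. The `MyStruct` definition, its `ID` field, and the `fillChunks` helper are assumptions made for illustration; the real program would parse CSV lines where the placeholder assignment is.

```go
package main

import (
	"fmt"
	"sync"
)

// MyStruct is a hypothetical stand-in for the record type in the question.
type MyStruct struct {
	ID int
}

// fillChunks allocates one slice of workers*perWorker elements and lets each
// goroutine fill its own non-overlapping sub-slice, so no locking is needed.
func fillChunks(workers, perWorker int) []MyStruct {
	lines := make([]MyStruct, workers*perWorker)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		start := i * perWorker
		wg.Add(1)
		go func(offset int, chunk []MyStruct) {
			defer wg.Done()
			for j := range chunk {
				// Placeholder for parsing one CSV line into the struct.
				chunk[j].ID = offset + j
			}
		}(start, lines[start:start+perWorker])
	}
	wg.Wait() // all sub-slices are filled once this returns
	return lines
}

func main() {
	lines := fillChunks(10, 1000)
	fmt.Println(len(lines), lines[0].ID, lines[len(lines)-1].ID) // prints: 10000 0 9999
}
```

Because the sub-slices share the backing array of `lines` but never overlap, each goroutine writes to its own region and the result is available in the original slice after `wg.Wait()`.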