How do I translate the following threading model from C++ to Go?
Question
In my C++ project, I have a large (multiple GB) binary file on disk that I read into memory for read-only calculations.
My current C++ implementation involves reading the entire chunk into memory once and then spawning threads to read from the chunk in order to do various calculations (mutex-free and runs quickly). Technically, each thread really only needs a small part of the file at a time, so in the future, I may change this implementation to use mmap(), especially if the file gets too big. I've noticed this gommap lib so I think I should be covered going forward.
What approach should I take to translate my current C++ threading model (one large chunk of read-only memory) into a go threading model, keeping run-time efficiency in mind?
goroutines? alternatives?
Answer 1
Score: 3
I'm sure this answer will cop a lot of heat but here goes:
You won't get reduced running time by switching to Go, especially if your code is already mutex-free. Go doesn't guarantee efficient balancing of goroutines, and will not currently make best use of the available cores. The generated code is slower than C++. Go's current strengths are in clean abstractions and concurrency, not parallelism.
Reading the entire file up front isn't particularly efficient if you then have to go and backtrack through memory. Parts of the file you won't use again until much later will be dropped from the cache, only to be reloaded again later. You should consider memory mapping if your platform will allow it, so that pages are loaded from disk as they're required.
If there is any intense inter-routine communication, or there are dependencies between the data, you should try to make the algorithm single-threaded. It's difficult to say without knowing more about the routines you're applying to the data, but it does sound possible that you've pulled out threads prematurely in the hope of getting a magic performance boost.
If you're unable to rely on memory mapping due to file size, or other platform constraints, you should consider making use of the pread call, thereby reusing a single file descriptor, and only reading as required.
As always, the following rule applies to optimization. You must profile. You must check that the changes you make from a working solution are improving things. Very often you'll find that memory mapping, threading and other shenanigans have no noticeable effect on performance whatsoever. It's also an uphill battle if you're switching away from C or C++.
Also, you should spawn goroutines to handle each part of the file, and reduce the results of the calculations through a channel. Make sure to set GOMAXPROCS to an appropriate value.
Answer 2
Score: 1
This program sums all the bytes in a file in multiple goroutines (without worrying about overflow).
You'll want to reimplement processChunk and aggregateResults for your case. You may also want to change the channel type of the results channel. Depending on what you're doing, you may not even need to aggregate the results. The chunk size and the channel's buffer size are other knobs you can tweak.
package main

import (
	"fmt"
	"os"
)

func main() {
	// os.ReadFile replaces the deprecated ioutil.ReadFile.
	data, err := os.ReadFile("filename")
	if err != nil {
		// handle this error somehow
		panic(err)
	}
	// Adjust this to control the size of chunks.
	// I've chosen it arbitrarily.
	const chunkSize = 0x10000
	// This channel's unbuffered. Add a buffer for better performance.
	results := make(chan int64)
	chunks := 0
	for len(data) > 0 {
		size := chunkSize
		if len(data) < chunkSize {
			size = len(data)
		}
		go processChunk(data[:size], results)
		data = data[size:]
		chunks++
	}
	aggregateResults(results, chunks)
}

func processChunk(chunk []byte, results chan int64) {
	sum := int64(0)
	for _, b := range chunk {
		sum += int64(b)
	}
	results <- sum
}

func aggregateResults(results chan int64, chunks int) {
	sum := int64(0)
	for chunks > 0 {
		sum += <-results
		chunks--
	}
	fmt.Println("The sum of all bytes is", sum)
}