Question
I am diving into Golang and have a problem that I have been working on for a few days, and I just can't seem to grasp the concept of goroutines and how they are used.
Basically, I am trying to generate millions of random records. I have functions that make the random data and create a giant .CSV file containing it.
My question is whether it is possible to do this concurrently to speed things up and reduce execution time. It seems that no matter how I approach the problem, I get the same benchmark as if I did it without goroutines.
This is a sample of what I have so far:
func worker(c chan string) {
    for {
        c <- /* Generate random data using other functions here */
    }
    close(c)
}

func writer(s string) {
    csvfile.WriteString(s)
}

func main() {
    receive := make(chan string)
    for i := 0; i < 100; i++ {
        go worker(receive)
    }
    for i := 0; i < 10000; i++ {
        go writer(<-receive)
    }
}
Where I generate data, I am using tons and tons of function calls from https://github.com/Pallinder/go-randomdata. Do you think that could be where I am losing all this time?
Any help would be appreciated.
Answer 1
Score: 1
I don't think you should be trying to use a goroutine here. File writes are almost always atomic; you would want to make the mechanism that writes to your file concurrent, and that would require a complicated locking mechanism which ultimately probably won't improve application performance, because the write itself is still atomic.
If data generation were bottlenecking your program, then it would make sense to split that work off into goroutines and write from the one place where you receive all the data, for example:
for i := 0; i < 100; i++ {
    go worker(receive)
}

for {
    select {
    case item := <-receive:
        writer(item)
    case <-abort:
        cleanUp()
        return
    }
}
You can't just loop on some int while receiving from a channel and calling a function endlessly. You can receive from a channel in a select, though, or just do item := <-receive, which blocks until an item can be read. The pseudocode above demonstrates what your design should look like in this case: you need an abort channel so you can get out of your goroutines when you want to stop the application, and before returning it should finalize the write to your file and close it.
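To make that design concrete, here is a minimal runnable sketch of it, assuming a hypothetical randomRecord helper in place of the go-randomdata calls: many generator goroutines feed a channel, main is the single goroutine that owns the file, and closing the abort channel lets the workers exit.

package main

import (
    "bufio"
    "fmt"
    "math/rand"
    "os"
)

// randomRecord is a hypothetical stand-in for the go-randomdata calls.
func randomRecord() string {
    return fmt.Sprintf("%d,%d,%d\n", rand.Int(), rand.Int(), rand.Int())
}

// worker generates records until the abort channel is closed.
func worker(c chan<- string, abort <-chan struct{}) {
    for {
        select {
        case c <- randomRecord():
        case <-abort:
            return
        }
    }
}

func main() {
    f, err := os.Create("data.csv")
    if err != nil {
        panic(err)
    }
    w := bufio.NewWriter(f) // batch many small records into fewer disk writes

    receive := make(chan string, 1000)
    abort := make(chan struct{})

    for i := 0; i < 100; i++ {
        go worker(receive, abort)
    }

    // Single writer: only main touches the file, so no locking is needed.
    for i := 0; i < 1000000; i++ {
        w.WriteString(<-receive)
    }

    close(abort) // tell the workers to stop generating
    w.Flush()    // finalize the write to the file...
    f.Close()    // ...and close it before returning
}

Since only one goroutine ever writes, the file needs no lock, and the bufio.Writer keeps the disk from seeing a million tiny writes.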
Answer 2
Score: 0
Try using a buffered channel to alleviate the problem:
receive := make(chan string, 1000)
Write speed is limited by your disk, so there's only so much you can gain by writing concurrently, and from what you're telling us, generating the data concurrently doesn't help either.
Concurrency isn't the solution for everything that is slow; either accept that you're at the limit or optimize.
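Before optimizing further, it is worth measuring where the time actually goes. Below is a rough timing sketch (not a rigorous benchmark), assuming a hypothetical generateRecord helper in place of the go-randomdata calls: if the first number dominates, worker goroutines can help; if the second dominates, you are at the disk's limit.

package main

import (
    "bufio"
    "fmt"
    "math/rand"
    "os"
    "time"
)

// generateRecord is a hypothetical stand-in for the go-randomdata calls.
func generateRecord() string {
    return fmt.Sprintf("%d,%d\n", rand.Int(), rand.Int())
}

func main() {
    const n = 100000

    // Time data generation alone.
    start := time.Now()
    rows := make([]string, n)
    for i := range rows {
        rows[i] = generateRecord()
    }
    fmt.Println("generate:", time.Since(start))

    // Time the buffered disk writes alone.
    f, err := os.Create("bench.csv")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    w := bufio.NewWriter(f)

    start = time.Now()
    for _, row := range rows {
        w.WriteString(row)
    }
    w.Flush()
    fmt.Println("write:", time.Since(start))
}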