Lock when using io.Copy in a goroutine
Question
I have a slice (filesMeta) containing a large number of "FileMetadata" structs. I also have another slice (candidates) containing the indexes of some of those structs. What I'm trying to do is modify the filesMeta slice to add an md5 hash, but only for the elements whose indexes are in the candidates slice.
I'm using goroutines to parallelize the work, but the io.Copy part is causing a lock and I don't understand why.
This is the code:
for i := range candidates {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        filesMeta[candidates[i]].Hash = md5Hash(filesMeta[candidates[i]].FullPath)
    }(i)
}
wg.Wait()
func md5Hash(filePath string) string {
    file, err := os.Open(filePath)
    if err != nil {
        panic(err)
    }
    defer file.Close()
    hash := md5.New()
    if _, err := io.Copy(hash, file); err != nil {
        panic(err)
    }
    hashInBytes := hash.Sum(nil)
    return hex.EncodeToString(hashInBytes)
}
Thanks!
Edit: One more detail: it doesn't lock when the files being hashed are on my SSD, but it does when the files are on a file share.
Edit 2: I noticed I forgot to pass the wg; the code now looks like this (still getting the same error):
for i := range candidates {
    wg.Add(1)
    go func(i int, wg *sync.WaitGroup) {
        defer wg.Done()
        filesMeta[candidates[i]].Hash = md5Hash(filesMeta[candidates[i]].FullPath)
    }(i, &wg)
}
wg.Wait()
func md5Hash(filePath string) string {
    file, err := os.Open(filePath)
    if err != nil {
        panic(err)
    }
    defer file.Close()
    hash := md5.New()
    if _, err := io.Copy(hash, file); err != nil {
        panic(err)
    }
    hashInBytes := hash.Sum(nil)
    return hex.EncodeToString(hashInBytes)
}
Answer 1
Score: 0
MarcoLucidi was right: I was opening too many files at a time. I limited the number of concurrent goroutines and now it works fine.
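For anyone hitting the same problem, a common way to cap the number of goroutines is a buffered channel used as a semaphore. The sketch below only illustrates the idea on top of the code from the question; the limit of 8 concurrent hashes is an arbitrary example value, not something stated in the answer.
sem := make(chan struct{}, 8) // arbitrary example limit on files open/hashed at once
var wg sync.WaitGroup
for i := range candidates {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        sem <- struct{}{}        // acquire a slot before opening the file
        defer func() { <-sem }() // release the slot when done
        filesMeta[candidates[i]].Hash = md5Hash(filesMeta[candidates[i]].FullPath)
    }(i)
}
wg.Wait()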
Answer 2
Score: -2
When reading from external storage, especially over a network, the read can hang. I recommend that when you read files from a networked drive, you read just one at a time. I understand that this kills the ability to parallelize, but we cannot expect the same reliability from networked drives as we do from local ones.
Edit: I proposed the above solution because there are many network parameters that affect the performance of networked storage devices, such as traffic, transmission speed, etc. I remember once using a network drive to store a Unity project. One day Windows Explorer started to crash because Unity was using too many files, and I am sure there were not millions of them. Based on that, I supposed it was unlikely that this was caused by the high number of goroutines. I proposed processing one file at a time to cover the case where the files are big (over 50 GB), which may break the communication with the network storage provider.
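In practice this suggestion just means dropping the goroutines and hashing in the loop itself; a minimal sketch, reusing filesMeta, candidates, and md5Hash from the question:
for i := range candidates {
    // One file at a time: no goroutines, no WaitGroup needed.
    filesMeta[candidates[i]].Hash = md5Hash(filesMeta[candidates[i]].FullPath)
}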