Copy subdirectories using goroutines

Question

My program copies multiple files and directories from different parts of the computer to one place.

One of the directories is very big, so copying it takes about 20-30 seconds. For now, I have written this method, which copies that directory, and I start it as a goroutine:

func CopySpecificDirectory(source, dest string, quit chan int) (err error) {
	files, err := os.Open(source)
	if err != nil {
		fmt.Printf("Error opening directory %s: %s\n", source, err)
		return err
	}
	defer files.Close()

	file, err := files.Readdir(0)
	if err != nil {
		fmt.Printf("Error reading directory %s: %s\n", source, err)
		return err
	}

	for _, f := range file {
		if f.IsDir() {
			copy.CopyDir(source+"\\"+f.Name(), dest+"\\"+f.Name())
		} else {
			copy.CopyFile(source+"\\"+f.Name(), dest+"\\"+f.Name())
		}
	}

	quit <- 1

	return nil
}

Main:

quit := make(chan int)
go CopySpecificDirectory(config.Location+"\\Directory", config.Destination, quit)

This only speeds up my program by a few seconds. Inside my CopySpecificDirectory method (if this is the best way), I want to create a goroutine for each directory, perhaps something like this:

c := make(chan int)
for _, f := range file {
	if f.IsDir() {
		go func() {
			copy.CopyDir(source+"\\"+f.Name(), dest+"\\"+f.Name())
			c <- 1
		}()
	} else {
		copy.CopyFile(source+"\\"+f.Name(), dest+"\\"+f.Name())
	}
}

With this approach, I don't know where to wait (<- c) for the copy of each directory to finish.
Is this the best way? If anyone has other suggestions for the fastest way to copy a directory, I would love to hear them.

Edit:

I used the approach from the sync.WaitGroup example on the website.

for _, f := range file {
	if f.IsDir() {
		wg.Add(1)
		go func() {
			defer wg.Done()
			copy.CopyDir(source+"\\"+f.Name(), dest+"\\"+f.Name())
		}()
	}
	// more code
}

I have declared var wg sync.WaitGroup as global, and I do wg.Wait() in main right after I call CopySpecificDirectory.

But CopySpecificDirectory finishes before copying all the contents. What am I doing wrong? It looks like it is not waiting for the goroutines to finish.

Answer 1

Score: 2

Use a sync.WaitGroup instead of channels:

  1. Create a wait group object.
  2. Before spawning each goroutine, call Add(1) on it.
  3. When a goroutine is about to exit, have it call Done() on that object.
  4. In your main (waiting) code, call Wait() on that object. This function will return once all the goroutines "tracked" this way finish their execution.
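The four steps above can be sketched as follows. This is a minimal illustration, not the answerer's code: copyDir is a hypothetical stand-in for the question's copy.CopyDir and only sleeps to simulate work. Note that Add is called in the spawning goroutine, before the go statement; if Add ran inside the new goroutine, Wait could see a zero counter and return early. The loop values are passed as arguments, which sidesteps the loop-variable capture issue discussed in Answer 3.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// copyDir is a hypothetical placeholder for copy.CopyDir; it only sleeps
// to simulate I/O work.
func copyDir(src, dst string) {
	time.Sleep(10 * time.Millisecond)
}

// copyAll copies every directory concurrently and returns only after all
// copies have finished.
func copyAll(dirs []string, dest string) []string {
	var wg sync.WaitGroup             // 1. create a wait group
	done := make([]string, len(dirs)) // one slot per goroutine, so no locking is needed

	for i, d := range dirs {
		wg.Add(1) // 2. Add before the go statement, not inside the goroutine
		go func(i int, d string) {
			defer wg.Done() // 3. signal completion when the goroutine exits
			copyDir(d, dest)
			done[i] = d
		}(i, d)
	}

	wg.Wait() // 4. block until every tracked goroutine has called Done
	return done
}

func main() {
	fmt.Println(copyAll([]string{"a", "b", "c"}, "dest")) // prints [a b c]
}
```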

Note that your program is I/O-bound, not CPU-bound. You could save some time if your code needed to copy files from physically different devices to (other) physically different devices. If you're just shuffling files around on the same filesystem, or all your sources are on the same filesystem, or all your destinations are on the same filesystem, you won't gain much: your goroutines would just compete for a single shared resource (the storage device), and the end result won't be much different from executing the copy operations sequentially.

For example, the manual page for the /etc/fstab file, which describes mounted and mountable filesystems on classic Unix systems, says that filesystems on the same physical drive are never checked at the same time, only sequentially, whereas filesystems on different drives are checked in parallel. See the entry for the fs_passno parameter in that manual page.
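If you still want some overlap when everything lives on one device, one common option (a technique not mentioned in this answer) is to cap how many copies run at once, for example with a buffered channel used as a counting semaphore. The sketch below assumes a hypothetical copyFile placeholder that only sleeps, and the limit is an arbitrary value you would tune by measuring on your own hardware.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// copyFile is a hypothetical placeholder for a real file copy; it only
// sleeps to simulate I/O.
func copyFile(name string) {
	time.Sleep(5 * time.Millisecond)
}

// copyLimited copies all files with at most limit copies in flight and
// returns the highest number of copies it observed running at once.
// limit must be at least 1, otherwise the first acquire blocks forever.
func copyLimited(files []string, limit int) int {
	var (
		wg            sync.WaitGroup
		mu            sync.Mutex
		running, peak int
	)
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore

	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while limit copies are running
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot (runs before wg.Done)

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			copyFile(name)

			mu.Lock()
			running--
			mu.Unlock()
		}(f)
	}

	wg.Wait()
	return peak
}

func main() {
	files := []string{"a", "b", "c", "d", "e", "f"}
	fmt.Println("peak concurrent copies:", copyLimited(files, 2))
}
```

Whether any limit above 1 actually helps on a single disk is exactly the question raised above, so it is worth timing both versions.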

Answer 2

Score: 1

> With this approach I don't know where to wait for the copy to finish for every directory (<- c).

Instead of signaling on a channel, you could use a sync.WaitGroup to coordinate all your goroutines. Call wg.Add(1) for each goroutine you spawn, and have each one call wg.Done() when it's, well, done. Then call wg.Wait() after spawning all of them to wait until they all finish.
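For comparison, the channel approach from the question can also be made to work: count the goroutines you start and, after the loop, receive from the channel that many times. A sync.WaitGroup does this bookkeeping for you. The sketch below is illustrative only; entry and copyItem are hypothetical stand-ins for the os.FileInfo values and the copy.CopyDir / copy.CopyFile calls.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a hypothetical stand-in for an os.FileInfo returned by Readdir.
type entry struct {
	name  string
	isDir bool
}

// copyItem is a hypothetical placeholder for copy.CopyDir / copy.CopyFile;
// it only sleeps to simulate I/O.
func copyItem(name string) {
	time.Sleep(5 * time.Millisecond)
}

// copyEntries copies directories in goroutines and files inline, then waits
// for every directory goroutine by receiving once per goroutine it started.
// It returns the names of the directories that finished.
func copyEntries(entries []entry) []string {
	c := make(chan string)
	spawned := 0

	for _, e := range entries {
		if e.isDir {
			spawned++
			go func(name string) {
				copyItem(name)
				c <- name // signal completion
			}(e.name)
		} else {
			copyItem(e.name)
		}
	}

	// Wait here, after the loop: one receive per goroutine started above.
	done := make([]string, 0, spawned)
	for i := 0; i < spawned; i++ {
		done = append(done, <-c)
	}
	return done
}

func main() {
	done := copyEntries([]entry{{"docs", true}, {"a.txt", false}, {"img", true}})
	fmt.Println(len(done), "directories copied") // prints "2 directories copied"
}
```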

As for how to speed up copying in general, there is no definite answer. It depends on a lot of factors (OS probably, filesystem, hard disk, load, etc.).

Answer 3

Score: 0

Thanks to both @kostix and @justinas for helping out. I followed their solution; the only remaining problem was that all the goroutines in my for loop captured the same loop variable f, so by the time a goroutine ran, f could already refer to a later entry.

So I had to add f := f, which declares a new f for each iteration. This works now:

for _, f := range file {
	f := f // new variable per iteration, so each goroutine sees its own entry
	if f.IsDir() {
		wg.Add(1)
		go func() {
			defer wg.Done() // deferred first so Done runs however the goroutine exits
			copy.CopyDir(source+"\\"+f.Name(), dest+"\\"+f.Name())
		}()
	} else {
		copy.CopyFile(source+"\\"+f.Name(), dest+"\\"+f.Name())
	}
}
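An equivalent and commonly used alternative to f := f is to pass the value into the goroutine as an argument, so each goroutine receives its own copy. Since Go 1.22, for modules whose go.mod declares go 1.22 or later, each loop iteration already gets a fresh variable, so f := f is no longer required there (it is harmless). The sketch below is a minimal illustration; the names slice is a hypothetical stand-in for the directory entries.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect starts one goroutine per name and returns the names the goroutines
// actually saw. Passing name as an argument gives each goroutine its own
// copy, regardless of Go version.
func collect(names []string) []string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen []string
	)
	for _, name := range names {
		wg.Add(1)
		go func(n string) { // n is a per-goroutine copy of the loop variable
			defer wg.Done()
			mu.Lock()
			seen = append(seen, n)
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	sort.Strings(seen) // goroutines finish in any order; sort for stable output
	return seen
}

func main() {
	fmt.Println(collect([]string{"docs", "img", "src"})) // prints [docs img src]
}
```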

huangapple
  • Posted on January 30, 2014 22:40:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/21459457.html