Why are goroutines with network I/O being blocked?

Question

I'm using Go 1.1 devel on Ubuntu 13.04:

go version devel +ebe8bca920ad Wed May 15 15:34:47 2013 +1000 linux/386

According to http://golang.org/doc/faq#goroutines

> When a coroutine blocks, such as by calling a blocking system call,
> the run-time automatically moves other coroutines on the same
> operating system thread to a different, runnable thread so they won't
> be blocked.

I'm trying to write a downloader that can fetch a large file in chunks using goroutines,
and this is the best goroutine I've come up with:

func download(uri string, chunks chan int, offset int, file *os.File) {
    for current := range chunks {

        fmt.Println("downloading range: ", current, "-", current+offset)

        client := &http.Client{}
        req, _ := http.NewRequest("GET", uri, nil)
        req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset))
        resp, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        file.Write(body)
    }
}

The full script is available at https://github.com/tuxcanfly/godown/blob/master/godown.go

Even though the file is downloaded and saved correctly, I can see that the second chunk starts
only when the first finishes.

Shouldn't the chunked downloads run in parallel, or is there something I'm doing wrong?

Answer 1

Score: 17

You only have one goroutine downloading chunks.

Line 64:

go download(*download_url, chunks, offset, file)

What you probably want is:

for i := 0; i < *threads; i++ {
    go download(*download_url, chunks, offset, file)
}

This will download *threads chunks at once.
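As a minimal sketch of that pattern, the program below starts several workers that all receive from the same channel and waits for them with a `sync.WaitGroup`. The names `runWorkers` and `fetch` are hypothetical and not part of the original script; `fetch` stands in for the HTTP request made inside `download`.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n goroutines that all receive chunk offsets from the
// same channel and call fetch for each one. fetch is a hypothetical
// stand-in for the HTTP request made by download in the question.
func runWorkers(n int, chunks <-chan int, fetch func(offset int)) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for offset := range chunks {
				fetch(offset)
			}
		}()
	}
	// Returns once chunks is closed and every worker has finished.
	wg.Wait()
}

func main() {
	chunks := make(chan int)
	go func() {
		defer close(chunks)
		for offset := 0; offset < 5000; offset += 1000 {
			chunks <- offset
		}
	}()

	var mu sync.Mutex
	var seen []int
	runWorkers(3, chunks, func(offset int) {
		mu.Lock()
		seen = append(seen, offset)
		mu.Unlock()
	})
	fmt.Println("chunks fetched:", len(seen)) // chunks fetched: 5
}
```

Closing the channel is what lets each worker's `range` loop end, so the producer should close it once the last offset has been sent.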


After you have concurrency working, you will probably notice that line 29 (the file.Write(body) call) doesn't work the way you intend. If chunk 2 finishes before chunk 1, the parts will be written out of order. You may want to use http://golang.org/pkg/os/#File.WriteAt instead.


You also have two problems with your Range header.

  1. You don't download the remainder. If the file size is 3002 and you have 3 threads, it will request 0-1000, 1000-2000, 2000-3000, so the end of the file is never downloaded: byte 3001 as written, and the last 2 bytes once the ranges are made non-overlapping.
  2. Byte ranges are inclusive. That means you are (as you can see in the previous example) downloading some bytes twice: bytes 1000 and 2000 are each requested twice. Of course, as long as you write to the correct locations, you shouldn't have too much of a problem.

Number two is easy enough to fix by changing line 19 from

req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset))

to this

req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", current, current+offset-1))

Note that Header.Set takes only the field name, so the key should be "Range" without the trailing colon and space.

For more information on the Range header, I suggest reading Section 14.35 of RFC 2616.
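Both Range issues can be handled when the ranges are computed. The sketch below is one possible way to do it, not the original script's logic: the type `byteRange` and the functions `splitRanges` and `headerValue` are hypothetical names. It produces inclusive, non-overlapping ranges and lets the last range absorb the remainder.

```go
package main

import "fmt"

// byteRange is an inclusive range, matching the semantics of the HTTP
// Range header.
type byteRange struct{ start, end int64 }

// splitRanges divides size bytes into n inclusive, non-overlapping ranges.
// The last range absorbs any remainder.
func splitRanges(size int64, n int) []byteRange {
	if size <= 0 || n <= 0 {
		return nil
	}
	chunk := size / int64(n)
	if chunk == 0 {
		// Fewer bytes than workers: fall back to a single range.
		return []byteRange{{0, size - 1}}
	}
	ranges := make([]byteRange, 0, n)
	for i := 0; i < n; i++ {
		start := int64(i) * chunk
		end := start + chunk - 1
		if i == n-1 {
			end = size - 1
		}
		ranges = append(ranges, byteRange{start, end})
	}
	return ranges
}

// headerValue formats the range as the value of a "Range" request header.
func (r byteRange) headerValue() string {
	return fmt.Sprintf("bytes=%d-%d", r.start, r.end)
}

func main() {
	for _, r := range splitRanges(3002, 3) {
		fmt.Println(r.headerValue())
	}
	// Prints:
	// bytes=0-999
	// bytes=1000-1999
	// bytes=2000-3001
}
```

Each value would then be set with `req.Header.Set("Range", r.headerValue())`, and `r.start` doubles as the offset to pass to `WriteAt`.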

huangapple
  • Posted on May 20, 2013 at 12:50:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/16642799.html