Posted 2013-05-20 12:50:37 · go

Why are goroutines with network i/o being blocked?

Question

I'm using Go 1.1 devel on Ubuntu 13.04:

go version devel +ebe8bca920ad Wed May 15 15:34:47 2013 +1000 linux/386

According to http://golang.org/doc/faq#goroutines:

> When a coroutine blocks, such as by calling a blocking system call,
> the run-time automatically moves other coroutines on the same
> operating system thread to a different, runnable thread so they won't
> be blocked.

I'm trying to write a downloader which can download a large file in chunks using goroutines
and this is the best goroutine I've come up with:

func download(uri string, chunks chan int, offset int, file *os.File) {
    for current := range chunks {
        fmt.Println("downloading range: ", current, "-", current+offset)
        client := &http.Client{}
        req, _ := http.NewRequest("GET", uri, nil)
        req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset))
        resp, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        file.Write(body)
    }
}

The full script is available at https://github.com/tuxcanfly/godown/blob/master/godown.go

Even though the files are being downloaded and saved correctly, I can see that the second chunk starts
only when the first finishes.

Shouldn't the chunked downloads run in parallel, or is there something I'm doing wrong?

Answer 1

Score: 17

You only have one goroutine downloading chunks.

Line 64:

go download(*download_url, chunks, offset, file)

What you probably want is:

for i := 0; i < *threads; i++ {
    go download(*download_url, chunks, offset, file)
}

This will download *threads chunks at once.

After you have concurrency working, you will probably notice that line 29 doesn't work how you intend. If chunk 1 finishes before chunk 2, the parts will be written out of order. You may want to instead use http://golang.org/pkg/os/#File.WriteAt.
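To show why `WriteAt` helps here, the sketch below writes the second chunk before the first and still ends up with the bytes in the right place. The helpers `writeChunk` and `assemble` and the temporary file are made up for the example; `os.CreateTemp` and `os.ReadFile` assume Go 1.16 or later.

```go
package main

import (
	"fmt"
	"os"
)

// writeChunk writes data at its absolute offset in the file, so the order in
// which chunks arrive does not affect the final contents.
func writeChunk(f *os.File, offset int64, data []byte) error {
	_, err := f.WriteAt(data, offset)
	return err
}

// assemble writes two chunks out of order into a temporary file and returns
// the resulting file contents.
func assemble() (string, error) {
	f, err := os.CreateTemp("", "chunks-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// The second chunk "finishes" first.
	if err := writeChunk(f, 5, []byte("world")); err != nil {
		return "", err
	}
	if err := writeChunk(f, 0, []byte("hello")); err != nil {
		return "", err
	}

	out, err := os.ReadFile(f.Name())
	return string(out), err
}

func main() {
	s, err := assemble()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints "helloworld"
}
```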

You also have two problems with your Range header.

1. You don't download the remainder. If the file size is 3002 and you have 3 threads, it will request 0-1000, 1000-2000, 2000-3000, and the last 2 bytes will never be downloaded.
2. Byte ranges are inclusive. That means you are (as you can see in the previous example) downloading some bytes twice: bytes 1000 and 2000 are each requested twice. Of course, as long as you write to the correct locations, you shouldn't have too much of a problem.

Number two is easy enough to fix by changing line 19 from

req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset))

to this

req.Header.Set("Range: ", fmt.Sprintf("bytes=%d-%d", current, current+offset-1))
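Both range problems can also be handled when the ranges are computed. The following is only a sketch under stated assumptions: `byteRange` and `splitRanges` are hypothetical names, the file size is assumed to be known in advance (for example from a `Content-Length` header), and giving the remainder to the last range is one possible choice among several.

```go
package main

import "fmt"

// byteRange is an inclusive range of bytes, matching the semantics of the
// HTTP Range header.
type byteRange struct{ start, end int64 }

// splitRanges divides size bytes into at most `parts` non-overlapping
// inclusive ranges. The last range absorbs any remainder.
func splitRanges(size int64, parts int) []byteRange {
	if size <= 0 || parts <= 0 {
		return nil
	}
	if int64(parts) > size {
		parts = int(size) // never create empty ranges
	}
	chunk := size / int64(parts)
	ranges := make([]byteRange, 0, parts)
	for i := 0; i < parts; i++ {
		start := int64(i) * chunk
		end := start + chunk - 1 // inclusive end, so no byte is requested twice
		if i == parts-1 {
			end = size - 1 // pick up the remainder
		}
		ranges = append(ranges, byteRange{start, end})
	}
	return ranges
}

func main() {
	for _, r := range splitRanges(3002, 3) {
		fmt.Printf("bytes=%d-%d\n", r.start, r.end)
	}
	// Prints:
	// bytes=0-999
	// bytes=1000-1999
	// bytes=2000-3001
}
```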

For more information on the Range header, I suggest reading Section 14.35 of RFC 2616.
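For experimenting with inclusive ranges without a real server, something like the sketch below may help. It is not the original script: `fetchRange` and `demo` are hypothetical helpers, the local test server built with `httptest` and `http.ServeContent` is an assumption made for the example, and `io.ReadAll` assumes Go 1.16 or later. The sketch passes the header name as `"Range"`, without a trailing colon or space, because `Header.Set` takes only the field name. It also closes the response body when the helper returns rather than deferring inside a loop.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange requests bytes start..end (inclusive) of uri and returns them.
func fetchRange(uri string, start, end int64) ([]byte, error) {
	req, err := http.NewRequest("GET", uri, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close() // runs when fetchRange returns, once per chunk

	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("expected 206 Partial Content, got %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo serves a 10-byte body from a local test server and fetches bytes 2..5.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ServeContent handles Range requests for us.
		http.ServeContent(w, r, "data.txt", time.Time{}, strings.NewReader("0123456789"))
	}))
	defer srv.Close()

	b, err := fetchRange(srv.URL, 2, 5)
	return string(b), err
}

func main() {
	s, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints "2345": four bytes, because both ends are inclusive
}
```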
