Why does `nbytes, err := io.Copy(ioutil.Discard, resp.Body)` always return 0?

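Question

I've written a simple program that fetches a list of URLs and stores each page in a file; in this example, the pages are Google and Gmail. I run the same command against different versions of the program. The compiled binary is named goFetchAll.

> 0.23s 0 http://www.google.com
> 1.15s 0 http://www.gmail.com

The second number should be the number of bytes of content, but it is always 0.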

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"os"
	"strings"
	"time"
)

func main() {
	start := time.Now()

	ch := make(chan string)

	for _, url := range os.Args[1:] {
		go fetch(url, ch)
	}

	for range os.Args[1:] {
		fmt.Println(<-ch)
	}

	secs := time.Since(start).Seconds()
	fmt.Sprintf("%.2fs elapsed\n", secs)
}

func fetch(url string, ch chan<- string) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		ch <- fmt.Sprint(err)
		return
	}

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		ch <- fmt.Sprintf("Can't fetch content")
		return
	}

	nbytes, err := io.Copy(ioutil.Discard, resp.Body)
	defer resp.Body.Close()
	if err != nil {
		ch <- fmt.Sprintf("while reading %s: %v", url, err)
		return
	}

	secs := time.Since(start).Seconds()
	ch <- fmt.Sprintf("%.2fs  %7d  %s", secs, nbytes, url)

	// save to file
	filename := string(url)
	filename = strings.Replace(filename, ":", "", -1)
	filename = strings.Replace(filename, "//", "-", -1)
	filename = strings.Replace(filename, "/", "", -1)
	filename = strings.Replace(filename, ".", "-", -1)
	filename = "downloads/" + filename + ".html"

	f, err := os.Create(filename)
	f.Write(body)
	defer f.Close()
	if err != nil {
		ch <- fmt.Sprintf("while writing %s: %v", url, err)
		return
	}
}

I also have an older version of the same program that actually works:

> 0.25s 10363 http://www.google.com
> 0.89s 66576 http://www.gmail.com

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"os"
	"time"
)

func main() {
	start := time.Now()

	ch := make(chan string)

	for _, url := range os.Args[1:] {
		go fetch(url, ch)
	}

	for range os.Args[1:] {
		fmt.Println(<-ch)
	}

	fmt.Println("%.2fs elapsed\n", time.Since(start).Seconds())
}

func fetch(url string, ch chan<- string) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		ch <- fmt.Sprint(err)
		return
	}

	nbytes, err := io.Copy(ioutil.Discard, resp.Body)
	resp.Body.Close()
	if err != nil {
		ch <- fmt.Sprintf("while reading %s: %v", url, err)
		return
	}

	secs := time.Since(start).Seconds()
	ch <- fmt.Sprintf("%.2fs  %7d  %s", secs, nbytes, url)
}

Can someone explain why the newest version always reports 0 bytes?

---

My partial solution is below: I simply call `http.Get(url)` a second time.

```go
resp, err := http.Get(url)
nbytes, err := io.Copy(ioutil.Discard, resp.Body)
defer resp.Body.Close() // don't leak resources
if err != nil {
	ch <- fmt.Sprintf("while reading %s: %v", url, err)
	return
}

resp, err = http.Get(url)
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
	ch <- fmt.Sprintf("Can't fetch content")
	return
}
```

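Answer 1

Score: 1

This happens because the response body has already been read by the time of that call. The earlier ReadAll consumed the whole stream, so the second read gets 0 bytes. With the error checks removed, the relevant calls are: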

resp, err := http.Get(url)
body, err := ioutil.ReadAll(resp.Body)
nbytes, err := io.Copy(ioutil.Discard, resp.Body)

Note the ReadAll call on the second line.

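One more small suggestion (unrelated to the actual question): place the defer call right after the stream is initialised and its error has been checked. For instance:

resp, err := http.Get(url)
if err != nil {
	ch <- fmt.Sprint(err)
	return
}
defer resp.Body.Close()

Although it is not stated as an explicit rule, this can be inferred from the defer section of Effective Go. Quoting:

> Second, it means that the close sits near the open, which is much clearer than placing it at the end of the function.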
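
Putting the diagnosis and the defer advice together, one possible shape for the newer `fetch` is sketched below. It reads the body once and reports `len(body)` as the byte count. The helper name `fileNameFor`, the `0o644`/`0o755` permissions, the `os.MkdirAll` call, and the use of `io.ReadAll` and `os.WriteFile` (Go 1.16+) are choices made for this illustration, not part of the original program:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// fileNameFor applies the same replacements as the question's code,
// in the same order, to turn a URL into a file name.
func fileNameFor(url string) string {
	name := strings.ReplaceAll(url, ":", "")
	name = strings.ReplaceAll(name, "//", "-")
	name = strings.ReplaceAll(name, "/", "")
	name = strings.ReplaceAll(name, ".", "-")
	return "downloads/" + name + ".html"
}

func fetch(url string, ch chan<- string) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		ch <- fmt.Sprint(err)
		return
	}
	defer resp.Body.Close() // the close sits right next to the open

	// Read the body exactly once; its length is the byte count.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		ch <- fmt.Sprintf("while reading %s: %v", url, err)
		return
	}

	// Write the file and check the error before reporting success,
	// so that each fetch sends exactly one message on ch.
	if err := os.WriteFile(fileNameFor(url), body, 0o644); err != nil {
		ch <- fmt.Sprintf("while writing %s: %v", url, err)
		return
	}

	secs := time.Since(start).Seconds()
	ch <- fmt.Sprintf("%.2fs  %7d  %s", secs, len(body), url)
}

func main() {
	start := time.Now()
	// Assumption: the downloads directory may not exist yet.
	if err := os.MkdirAll("downloads", 0o755); err != nil {
		fmt.Println(err)
		return
	}
	ch := make(chan string)
	for _, url := range os.Args[1:] {
		go fetch(url, ch)
	}
	for range os.Args[1:] {
		fmt.Println(<-ch)
	}
	fmt.Printf("%.2fs elapsed\n", time.Since(start).Seconds())
}
```

Because the body is read only once and kept in memory, a second `http.Get` for the same URL should not be needed.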


huangapple
  • Posted on January 14, 2017 at 16:58:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/41648308.html