Golang: downloading a file instead of an HTML page

Question

I would like to download a PGN text file from this URL: http://www.chess.com/echess/download_pgn?lid=1222621131.
I have the following (redacted) code, which is supposed to do this, but it downloads an HTML page instead. What could I be doing wrong?

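package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	url := "http://www.chess.com/echess/download_pgn?lid=1222621131"
	filename := "game.pgn"
	// Fetch the URL; http.Get follows redirects automatically.
	resp, err := http.Get(url)
	...

	// Create the local file that will receive the response body.
	file, err := os.Create(filename)
	defer file.Close()

	...

	// Stream the response body into the file.
	size, err := io.Copy(file, resp.Body)
}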

Answer 1

Score: 2

My first guess is that your request is missing the auth, cookies, and headers that a browser session would normally supply. As an experiment, open Chrome in Incognito mode, open the developer tools, and request the same URL in that window. Then look at the first GET in the Network tab; the request and response details I got are shown below. Note the 302 response code, which means the resource was found but you are being redirected, and then look at the Location header: it reads '/login'. I suspect this is the very page your code is downloading, because http.Get follows redirects automatically and your Go program does not have the login session cookies for this site that your browser does.

Browsers do a lot of work to navigate a website, and coding that up from scratch can be a bit of work: you have to pay attention to cookies, authentication, headers, redirects, and more.

<pre>
Remote Address: 174.35.7.172:80
Request URL: http://www.chess.com/echess/download_pgn?lid=1222621131
Request Method: GET
Status Code: 302 Found

Response Headers
HTTP/1.1 302 Found
Date: Sat, 25 Jul 2015 20:49:43 GMT
Server: PWS/8.1.20.22
X-Px: ms h0-s1027.p12-sjc ( origin)
P3P: CP="ALL DSP COR LAW CURa ADMa DEVa TAIa OUR BUS IND ONL UNI COM NAV DEM CNT"
Cache-Control: private
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Content-Length: 0
Content-Type: text/html; charset=utf-8
Location: /login
Connection: keep-alive
Set-Cookie: PHPSESSID=pach18her77q4asgsq2heohvj1; path=/; domain=.chess.com; HttpOnly

Request Headers
GET /echess/download_pgn?lid=1222621131 HTTP/1.1
Host: www.chess.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,es;q=0.6

Query String Parameters
lid: 1222621131
</pre>

huangapple
  • Posted on 2015-07-26 04:38:50
  • When reposting, please keep this link: https://go.coder-hub.com/31630896.html