Go encoding transform issue
Question
I have the following code in Go:
```go
import (
	"log"
	"net/http"

	"code.google.com/p/go.text/encoding/charmap"
	"code.google.com/p/go.text/transform"
)

...

res, err := http.Get(url)
if err != nil {
	log.Println("Cannot read", url)
	log.Println(err)
	continue
}
defer res.Body.Close()
```
The page I load contains non-UTF-8 characters, so I tried to use `transform`:
```go
utfBody := transform.NewReader(res.Body, charmap.Windows1251.NewDecoder())
```
But it returns an error even in this simple scenario:
```go
bytes, err := ioutil.ReadAll(utfBody)
log.Println(err)
if err == nil {
	log.Println(bytes)
}
```
`transform: short destination buffer`
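It does actually fill `bytes` with some data, but in my real code I use `goquery`:
```go
doc, err := goquery.NewDocumentFromReader(utfBody)
```
which sees the error and fails, returning no data.
I tried passing "chunks" of `res.Body` to `transform.NewReader` and figured out that as long as `res.Body` contains no non-UTF-8 data, it works well. When it contains a non-UTF-8 byte, it fails with the error above.
I'm quite new to Go and don't really understand what's going on or how to deal with this.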
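# Answer 1
**Score**: 7
Without the whole code and an example URL, it's hard to tell exactly what is going wrong here.
That said, I can recommend the [`golang.org/x/net/html/charset`](https://godoc.org/golang.org/x/net/html/charset) package for this, as it supports both *charset guessing* and conversion to UTF-8.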
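```go
import (
	"io"
	"net/http"

	"golang.org/x/net/html/charset"
)

// fetchUtf8Bytes downloads url and returns its body converted to UTF-8.
func fetchUtf8Bytes(url string) ([]byte, error) {
	res, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close() // release the connection once the body is read

	// Optional: the Content-Type header improves charset detection.
	contentType := res.Header.Get("Content-Type")
	utf8reader, err := charset.NewReader(res.Body, contentType)
	if err != nil {
		return nil, err
	}
	return io.ReadAll(utf8reader)
}
```
Complete example: http://play.golang.org/p/olcBM9ughv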
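To make it concrete what a Windows-1251 decoder has to do, here is a minimal sketch that uses only the standard library. The helper `decodeWindows1251Letters` is hypothetical (it is not part of any package) and covers only ASCII, the letters А..я, and Ё/ё; every other byte is replaced with U+FFFD. It shows that raw Windows-1251 text is not valid UTF-8, and that each Cyrillic byte grows to two bytes once decoded. It is meant as an illustration only, not as a diagnosis of the error in the question; for real pages, a full decoder such as the `charset` package is the better choice.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// decodeWindows1251Letters is a hypothetical helper for illustration only.
// It handles ASCII, the letters А..я (bytes 0xC0..0xFF) and Ё/ё (0xA8, 0xB8).
// Any other byte becomes U+FFFD; a real decoder maps those bytes properly.
func decodeWindows1251Letters(src []byte) string {
	var b strings.Builder
	b.Grow(2 * len(src)) // each Cyrillic byte becomes two UTF-8 bytes
	for _, c := range src {
		switch {
		case c < 0x80:
			b.WriteByte(c) // ASCII is identical in both encodings
		case c >= 0xC0:
			// 0xC0..0xFF map linearly onto U+0410..U+044F (А..я).
			b.WriteRune(rune(0x0410) + rune(c-0xC0))
		case c == 0xA8:
			b.WriteRune('Ё')
		case c == 0xB8:
			b.WriteRune('ё')
		default:
			b.WriteRune(utf8.RuneError)
		}
	}
	return b.String()
}

func main() {
	raw := []byte{0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2} // "Привет" in Windows-1251

	fmt.Println(utf8.Valid(raw)) // false: the raw bytes are not valid UTF-8

	decoded := decodeWindows1251Letters(raw)
	fmt.Println(decoded, len(raw), len(decoded)) // Привет 6 12
}
```

The output is twice as long as the input here because every Cyrillic letter needs two bytes in UTF-8, so a streaming decoder generally cannot assume that the destination is the same size as the source.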