How to decompress gzip-format []byte content that gives an error when unmarshaling
Question
I'm making a request to an API and reading the response body into a []byte with ioutil.ReadAll(resp.Body). I'm trying to unmarshal this content, but it doesn't seem to be UTF-8 encoded, because json.Unmarshal returns an error. This is what I'm trying:
package main

import (
	"encoding/json"
	"fmt"

	"some/api"
)

func main() {
	content := api.SomeAPI.SomeRequest() // []byte variable
	var data interface{}
	err := json.Unmarshal(content, &data)
	if err != nil {
		panic(err.Error())
	}
	fmt.Println("Data from response", data)
}
The error I get is invalid character '\x1f' looking for beginning of value. For the record, the response headers include Content-Type:[application/json; charset=utf-8].
How can I decode content so that these invalid characters don't break unmarshaling?
Edit
This is the hexdump of content: play.golang.org/p/oJ5mqERAmj
Answer 1
Score: 13
Judging by your hex dump, you are receiving gzip-encoded data (the leading 0x1f byte in the error is the first byte of the gzip magic number, 0x1f 0x8b), so you'll need to decompress it with compress/gzip first.
Try something like this:
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"

	"some/api"
)

func main() {
	content := api.SomeAPI.SomeRequest() // []byte variable

	// Wrap the compressed bytes in a gzip reader, which is an io.Reader
	// that yields the decompressed data.
	reader, err := gzip.NewReader(bytes.NewReader(content))
	if err != nil {
		panic(err)
	}
	defer reader.Close()

	// Use the stream interface to decode JSON from the io.Reader.
	var data interface{}
	dec := json.NewDecoder(reader)
	err = dec.Decode(&data)
	// io.EOF here means the decompressed body was empty.
	if err != nil && err != io.EOF {
		panic(err)
	}
	fmt.Println("Data from response", data)
}
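If the API only sometimes compresses its responses, one option is to check for the gzip magic number before decompressing. The following is a minimal sketch, not part of the original answer: isGzip and maybeGunzip are hypothetical helper names, and the payload is built locally to stand in for the API response. It uses io.ReadAll, which assumes Go 1.16 or later (ioutil.ReadAll behaves the same on older versions).

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// isGzip reports whether b starts with the two-byte gzip magic number (0x1f 0x8b).
func isGzip(b []byte) bool {
	return len(b) >= 2 && b[0] == 0x1f && b[1] == 0x8b
}

// maybeGunzip (hypothetical helper) returns the decompressed payload when b
// looks like gzip data, and returns b unchanged otherwise.
func maybeGunzip(b []byte) ([]byte, error) {
	if !isGzip(b) {
		return b, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Build a gzip-compressed JSON payload to stand in for the API response.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(`{"ok":true}`)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	plain, err := maybeGunzip(buf.Bytes())
	if err != nil {
		panic(err)
	}
	var data interface{}
	if err := json.Unmarshal(plain, &data); err != nil {
		panic(err)
	}
	fmt.Println(data) // prints: map[ok:true]
}
```

Checking the response's Content-Encoding header, when it is available, is a more direct signal than sniffing the first bytes; the magic-number check is a fallback for when only the []byte is at hand.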
Previous
Character \x1f is the unit separator control character in ASCII and UTF-8. It never appears as part of a multi-byte UTF-8 sequence, but it can be used to mark off different pieces of text. A string containing a raw \x1f can be valid UTF-8 but, as far as I know, is not valid JSON.
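A quick way to see the difference is to run the same bytes through utf8.Valid and json.Valid. This is an illustrative sketch only; check is a hypothetical helper name, and the sample strings are made up for the comparison.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// check (hypothetical helper) reports whether b is valid UTF-8 and valid JSON.
func check(b []byte) (validUTF8, validJSON bool) {
	return utf8.Valid(b), json.Valid(b)
}

func main() {
	raw := []byte("\"hello\x1fGoodbye\"")     // raw 0x1f byte inside a JSON string
	escaped := []byte(`"hello\u001fGoodbye"`) // same character, JSON-escaped

	fmt.Println(check(raw))     // prints: true false
	fmt.Println(check(escaped)) // prints: true true
}
```

JSON requires control characters inside strings to be escaped, which is why the raw byte is rejected while the \u001f form is accepted.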
I think you need to read the API specification closely to find out what they are using the \x1f markers for, but in the meantime you could try replacing them with spaces and see what happens, e.g.:
package main

import (
	"bytes"
	"fmt"
)

func main() {
	b := []byte("hello\x1fGoodbye")
	fmt.Printf("b was %q\n", b)
	// Replace every 0x1f byte with a space (-1 means no limit on replacements).
	b = bytes.Replace(b, []byte{0x1f}, []byte{' '}, -1)
	fmt.Printf("b is now %q\n", b)
}
Prints
b was "hello\x1fGoodbye"
b is now "hello Goodbye"