How to decompress []byte content in gzip format that gives an error when unmarshaling

Question

I'm making a request to an API and getting a []byte out of the response (via ioutil.ReadAll(resp.Body)). I'm trying to unmarshal this content, but it doesn't seem to be encoded as UTF-8, because Unmarshal returns an error. This is what I'm trying:

package main

import (
	"encoding/json"
	"fmt"

	"some/api"
)

func main() {
	content := api.SomeAPI.SomeRequest() // []byte variable
	var data interface{}
	err := json.Unmarshal(content, &data)
	if err != nil {
		panic(err.Error())
	}
	fmt.Println("Data from response", data)
}

The error I get is: invalid character '\x1f' looking for beginning of value. For the record, the response headers include Content-Type:[application/json; charset=utf-8].

How can I decode content to avoid these invalid characters when unmarshaling?

Edit

This is the hexdump of content: play.golang.org/p/oJ5mqERAmj

Answer 1

Score: 13

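Judging by your hex dump, you are receiving gzip-encoded data, so you'll need to decompress it with compress/gzip first. A gzip stream always begins with the bytes 0x1f 0x8b, which is why the JSON decoder complains about '\x1f' at the very start of the input.

Try something like this: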

package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"

	"some/api"
)

func main() {
	content := api.SomeAPI.SomeRequest() // []byte variable

	// Wrap the compressed bytes in a gzip reader, which decompresses as it is read
	reader, err := gzip.NewReader(bytes.NewReader(content))
	if err != nil {
		panic(err)
	}
	defer reader.Close()

	// Use the stream interface to decode JSON from the io.Reader
	var data interface{}
	dec := json.NewDecoder(reader)
	err = dec.Decode(&data)
	if err != nil && err != io.EOF {
		panic(err)
	}
	fmt.Println("Data from response", data)
}
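If it is not known in advance whether the API compresses its responses, one option is to check for the gzip magic number before decompressing. The following is only a minimal sketch under that assumption: gunzipIfNeeded and gzipBytes are hypothetical helper names, not part of the original answer or of any library, and the sample payload stands in for the real API response. It uses io.ReadAll, which needs Go 1.16 or later (ioutil.ReadAll does the same job on older versions).

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// gunzipIfNeeded is a hypothetical helper. It returns the decompressed
// bytes when content starts with the gzip magic number 0x1f 0x8b, and
// returns content unchanged otherwise.
func gunzipIfNeeded(content []byte) ([]byte, error) {
	if len(content) < 2 || content[0] != 0x1f || content[1] != 0x8b {
		return content, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(content))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes compresses raw so the example has input without a real API call.
func gzipBytes(raw []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	// Stand-in for the []byte returned by the API.
	content := gzipBytes([]byte(`{"greeting":"hello"}`))

	plain, err := gunzipIfNeeded(content)
	if err != nil {
		panic(err)
	}

	var data map[string]string
	if err := json.Unmarshal(plain, &data); err != nil {
		panic(err)
	}
	fmt.Println(data["greeting"])
}
```

Returning a plain []byte keeps the original json.Unmarshal call usable as-is, at the cost of holding the whole decompressed body in memory; the streaming json.NewDecoder approach above avoids that for large responses.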

Previous answer

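Character \x1f is the unit separator control character in ASCII, and it has the same single-byte encoding in UTF-8. It never appears as part of a multi-byte UTF-8 sequence, but it can be used to mark off different bits of text. A string containing \x1f can be valid UTF-8, but it is not valid JSON: JSON requires control characters inside strings to be escaped, and \x1f cannot begin a value.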
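The difference between valid UTF-8 and valid JSON can be checked with the standard library. This is a small sketch only; check is a hypothetical helper name, and the sample strings are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// check reports whether b is valid UTF-8 and whether it is valid JSON.
func check(b []byte) (validUTF8, validJSON bool) {
	return utf8.Valid(b), json.Valid(b)
}

func main() {
	// A JSON string literal containing a raw 0x1f control byte.
	raw := []byte("\"hello\x1fGoodbye\"")
	// The same character written as a JSON escape sequence.
	escaped := []byte(`"hello\u001fGoodbye"`)

	u, j := check(raw)
	fmt.Println(u, j) // true false
	u, j = check(escaped)
	fmt.Println(u, j) // true true
}
```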

I think you need to read the API specification closely to find out what they are using the \x1f markers for, but in the meantime you could try replacing them and see what happens, e.g.:

package main

import (
	"bytes"
	"fmt"
)

func main() {
	b := []byte("hello\x1fGoodbye")
	fmt.Printf("b was %q\n", b)
	b = bytes.Replace(b, []byte{0x1f}, []byte{' '}, -1)
	fmt.Printf("b is now %q\n", b)
}

Prints:

b was "hello\x1fGoodbye"
b is now "hello Goodbye"

Playground link

huangapple
  • Posted on October 7, 2013, 23:17:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/19228514.html