如何使用GORM将包含转义码的JSON插入到PostgreSQL的JSONB列中?

huangapple go评论95阅读模式
英文:

How to insert JSON containing escape codes into a JSONB column in PostgreSQL using GORM

问题

我正在尝试将JSON字节存储到PostgreSQL中,但出现了问题。

> 无法将\u0000转换为文本。

如下所示,JSON包含诸如\u0000之类的转义序列,似乎PostgreSQL将其解释为Unicode字符,而不是JSON字符串。

err := raws.SaveRawData(data, url)
// 如果字节中存在"\u0000"
if err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
	// 尝试移除\u0000,但不起作用
    data = bytes.Trim(data, "\u0000")
	e := raws.SaveRawData(data, url) // 再次保存数据
	if e != nil {
		return e // 返回相同的错误
	}
	return nil
}

原始API数据可以从这里访问。其中包含\u0000:

{
  "code": 0,
  "message": "0",
  "ttl": 1,
  "data": {
    "bvid": "BV1jb411C7m3",
    "aid": 42443484,
    "videos": 1,
    "tid": 172,
    "tname": "手机游戏",
    "copyright": 1,
    "pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
    "title": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
    "pubdate": 1549100438,
    "ctime": 1549100438,
    "desc": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
    "desc_v2": [
      {
        "raw_text": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
        "type": 1,
        "biz_id": 0
      }
    ],
    "state": 0,
    "duration": 265,
    "rights": {
      "bp": 0,
      "elec": 0,
      "download": 1,
      "movie": 0,
      "pay": 0,
      "hd5": 0,
      "no_reprint": 1,
      "autoplay": 1,
      "ugc_pay": 0,
      "is_cooperation": 0,
      "ugc_pay_preview": 0,
      "no_background": 0,
      "clean_mode": 0,
      "is_stein_gate": 0
    },
    "owner": {
      "mid": 39699039,
      "name": "明眸-雅望",
      "face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
    },
    "stat": {
      "aid": 42443484,
      "view": 1107,
      "danmaku": 7,
      "reply": 22,
      "favorite": 5,
      "coin": 4,
      "share": 0,
      "now_rank": 0,
      "his_rank": 0,
      "like": 10,
      "dislike": 0,
      "evaluation": "",
      "argue_msg": ""
    },
    "dynamic": "#崩坏3#",
    "cid": 74479750,
    "dimension": {
      "width": 1280,
      "height": 720,
      "rotate": 0
    },
    "no_cache": false,
    "pages": [
      {
        "cid": 74479750,
        "page": 1,
        "from": "vupload",
        "part": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
        "duration": 265,
        "vid": "",
        "weblink": "",
        "dimension": {
          "width": 1280,
          "height": 720,
          "rotate": 0
        }
      }
    ],
    "subtitle": {
      "allow_submit": false,
      "list": []
    },
    "user_garb": {
      "url_image_ani_cut": ""
    }
  }
}

用于保存的结构体是:

type RawJSONData struct {
	ID        uint64         `gorm:"primarykey" json:"id"`
	CreatedAt time.Time      `json:"-"`
	DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
	Data      datatypes.JSON `json:"data"`
	URL       string         `gorm:"index" json:"url"`
}

datatypes.JSON来自gorm.io/datatypes。它似乎就是json.RawMessage,它是一个[]byte

我使用PostgreSQL的JSONB类型来存储这些数据。

表格:

create table raw_json_data
(
    id         bigserial not null constraint raw_json_data_pke primary key,
    created_at timestamp with time zone,
    deleted_at timestamp with time zone,
    data       jsonb,
    url        text
);
英文:

I'm trying to store the JSON bytes to PostgreSQL, but there's a problem.

> \u0000 cannot be converted to text.

As you can see below, the JSON contains escape sequences such as \u0000, which it seems PostgreSQL is interpreting as unicode characters, not JSON strings.

err := raws.SaveRawData(data, url)
// if there is "\u0000" in the bytes
if err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
	// try to remove \u0000, but not work
    data = bytes.Trim(data, "\u0000")
	e := raws.SaveRawData(data, url) // save data again
	if e != nil {
		return e // return the same error
	}
	return nil
}

Origin API data can be access form Here. There is \u0000 in it:

{
  "code": 0,
  "message": "0",
  "ttl": 1,
  "data": {
    "bvid": "BV1jb411C7m3",
    "aid": 42443484,
    "videos": 1,
    "tid": 172,
    "tname": "手机游戏",
    "copyright": 1,
    "pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
    "title": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
    "pubdate": 1549100438,
    "ctime": 1549100438,
    "desc": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
    "desc_v2": [
      {
        "raw_text": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
        "type": 1,
        "biz_id": 0
      }
    ],
    "state": 0,
    "duration": 265,
    "rights": {
      "bp": 0,
      "elec": 0,
      "download": 1,
      "movie": 0,
      "pay": 0,
      "hd5": 0,
      "no_reprint": 1,
      "autoplay": 1,
      "ugc_pay": 0,
      "is_cooperation": 0,
      "ugc_pay_preview": 0,
      "no_background": 0,
      "clean_mode": 0,
      "is_stein_gate": 0
    },
    "owner": {
      "mid": 39699039,
      "name": "明眸-雅望",
      "face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
    },
    "stat": {
      "aid": 42443484,
      "view": 1107,
      "danmaku": 7,
      "reply": 22,
      "favorite": 5,
      "coin": 4,
      "share": 0,
      "now_rank": 0,
      "his_rank": 0,
      "like": 10,
      "dislike": 0,
      "evaluation": "",
      "argue_msg": ""
    },
    "dynamic": "#崩坏3#",
    "cid": 74479750,
    "dimension": {
      "width": 1280,
      "height": 720,
      "rotate": 0
    },
    "no_cache": false,
    "pages": [
      {
        "cid": 74479750,
        "page": 1,
        "from": "vupload",
        "part": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
        "duration": 265,
        "vid": "",
        "weblink": "",
        "dimension": {
          "width": 1280,
          "height": 720,
          "rotate": 0
        }
      }
    ],
    "subtitle": {
      "allow_submit": false,
      "list": []
    },
    "user_garb": {
      "url_image_ani_cut": ""
    }
  }
}

The struct for save is:

type RawJSONData struct {
	ID        uint64         `gorm:"primarykey" json:"id"`
	CreatedAt time.Time      `json:"-"`
	DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
	Data      datatypes.JSON `json:"data"`
	URL       string         `gorm:"index" json:"url"`
}

datatypes.JSON is from gorm.io/datatypes. It seems just is json.RawMessage, it is (extend from?) a []byte.

I use PostgreSQL's JSONB type for storage this data.

Table:

create table raw_json_data
(
    id         bigserial not null constraint raw_json_data_pke primary key,
    created_at timestamp with time zone,
    deleted_at timestamp with time zone,
    data       jsonb,
    url        text
);

答案1

得分: 5

Unicode转义序列\u0000在Postgres的TEXTJSONB列中是不支持的

> jsonb类型也拒绝\u0000(因为它无法表示为PostgreSQL的文本类型)

你可以将列类型更改为JSON

create table Foo (test JSON);
insert into Foo (test) values ('{"text": "明天再水\u0000绿塔的"}');
-- 可行

> json数据类型存储输入文本的精确副本

这样做的好处是保持数据与从API接收到的数据完全相同,以防转义序列具有需要保留的某种含义。

它还允许您使用Postgres的JSON运算符(例如->>)进行查询,尽管将带有\u0000的JSON字段转换为文本仍将失败:

select test->>'text' from Foo
-- ERROR:  unsupported Unicode escape sequence

<hr>

BYTEA类型的列也接受任何字节序列,而无需操作数据。在Gorm中,使用type:bytea标签:

type RawJSONData struct {
    // ... 其他字段
    Data      string `gorm:"type:bytea" json:"data"`
}

<hr>

如果上述任何方法都不可接受,请对输入字符串进行清理处理...

英文:

The Unicode escape sequence \u0000 is simply not supported in Postgres TEXT and JSONB columns:

> The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type)

You can change the column type to JSON:

create table Foo (test JSON);
insert into Foo (test) values (&#39;{&quot;text&quot;: &quot;明天再水\u0000绿塔的&quot;}&#39;);
-- works

> The json data type stores an exact copy of the input text

This has the advantage of keeping the data identical to what you received from the API, in case the escape sequence has some meaning that you need to preserve.

It'll also allow you to query using Postgres JSON operators (e.g. -&gt;&gt;), albeit converting a JSON field with \u0000 to text will still fail:

select test-&gt;&gt;&#39;text&#39; from Foo
-- ERROR:  unsupported Unicode escape sequence

<hr>

Columns of type BYTEA also accept any byte sequence without having to manipulate the data. In Gorm, use type:bytea tag:

type RawJSONData struct {
// ... other fields
Data      string `gorm:&quot;type:bytea&quot; json:&quot;data&quot;`
}

<hr>

If any of the above is not acceptable for you, then you must sanitize the input string...

huangapple
  • 本文由 发表于 2021年7月8日 22:14:30
  • 转载请务必保留本文链接:https://go.coder-hub.com/68303357.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定