英文:
How to insert JSON containing escape codes into a JSONB column in PostgreSQL using GORM
问题
我正在尝试将JSON字节存储到PostgreSQL中,但出现了问题。
> 无法将\u0000转换为文本。
如下所示,JSON包含诸如\u0000
之类的转义序列,似乎PostgreSQL将其解释为Unicode字符,而不是JSON字符串。
err := raws.SaveRawData(data, url)
// 如果字节中存在"\u0000"
if err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
// 尝试移除\u0000,但不起作用
data = bytes.Trim(data, "\u0000")
e := raws.SaveRawData(data, url) // 再次保存数据
if e != nil {
return e // 返回相同的错误
}
return nil
}
原始API数据可以从这里访问。其中包含\u0000:
{
"code": 0,
"message": "0",
"ttl": 1,
"data": {
"bvid": "BV1jb411C7m3",
"aid": 42443484,
"videos": 1,
"tid": 172,
"tname": "手机游戏",
"copyright": 1,
"pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
"title": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
"pubdate": 1549100438,
"ctime": 1549100438,
"desc": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
"desc_v2": [
{
"raw_text": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
"type": 1,
"biz_id": 0
}
],
"state": 0,
"duration": 265,
"rights": {
"bp": 0,
"elec": 0,
"download": 1,
"movie": 0,
"pay": 0,
"hd5": 0,
"no_reprint": 1,
"autoplay": 1,
"ugc_pay": 0,
"is_cooperation": 0,
"ugc_pay_preview": 0,
"no_background": 0,
"clean_mode": 0,
"is_stein_gate": 0
},
"owner": {
"mid": 39699039,
"name": "明眸-雅望",
"face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
},
"stat": {
"aid": 42443484,
"view": 1107,
"danmaku": 7,
"reply": 22,
"favorite": 5,
"coin": 4,
"share": 0,
"now_rank": 0,
"his_rank": 0,
"like": 10,
"dislike": 0,
"evaluation": "",
"argue_msg": ""
},
"dynamic": "#崩坏3#",
"cid": 74479750,
"dimension": {
"width": 1280,
"height": 720,
"rotate": 0
},
"no_cache": false,
"pages": [
{
"cid": 74479750,
"page": 1,
"from": "vupload",
"part": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
"duration": 265,
"vid": "",
"weblink": "",
"dimension": {
"width": 1280,
"height": 720,
"rotate": 0
}
}
],
"subtitle": {
"allow_submit": false,
"list": []
},
"user_garb": {
"url_image_ani_cut": ""
}
}
}
用于保存的结构体是:
type RawJSONData struct {
ID uint64 `gorm:"primarykey" json:"id"`
CreatedAt time.Time `json:"-"`
DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
Data datatypes.JSON `json:"data"`
URL string `gorm:"index" json:"url"`
}
datatypes.JSON
来自gorm.io/datatypes
。它似乎就是json.RawMessage
,它是一个[]byte
。
我使用PostgreSQL的JSONB
类型来存储这些数据。
表格:
create table raw_json_data
(
id bigserial not null constraint raw_json_data_pke primary key,
created_at timestamp with time zone,
deleted_at timestamp with time zone,
data jsonb,
url text
);
英文:
I'm trying to store the JSON bytes to PostgreSQL, but there's a problem.
> \u0000 cannot be converted to text.
As you can see below, the JSON contains escape sequences such as \u0000
, which it seems PostgreSQL is interpreting as unicode characters, not JSON strings.
err := raws.SaveRawData(data, url)
// if there is "\u0000" in the bytes
if err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
// try to remove \u0000, but not work
data = bytes.Trim(data, "\u0000")
e := raws.SaveRawData(data, url) // save data again
if e != nil {
return e // return the same error
}
return nil
}
Origin API data can be access form Here. There is \u0000 in it:
{
"code": 0,
"message": "0",
"ttl": 1,
"data": {
"bvid": "BV1jb411C7m3",
"aid": 42443484,
"videos": 1,
"tid": 172,
"tname": "手机游戏",
"copyright": 1,
"pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
"title": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
"pubdate": 1549100438,
"ctime": 1549100438,
"desc": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
"desc_v2": [
{
"raw_text": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
"type": 1,
"biz_id": 0
}
],
"state": 0,
"duration": 265,
"rights": {
"bp": 0,
"elec": 0,
"download": 1,
"movie": 0,
"pay": 0,
"hd5": 0,
"no_reprint": 1,
"autoplay": 1,
"ugc_pay": 0,
"is_cooperation": 0,
"ugc_pay_preview": 0,
"no_background": 0,
"clean_mode": 0,
"is_stein_gate": 0
},
"owner": {
"mid": 39699039,
"name": "明眸-雅望",
"face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
},
"stat": {
"aid": 42443484,
"view": 1107,
"danmaku": 7,
"reply": 22,
"favorite": 5,
"coin": 4,
"share": 0,
"now_rank": 0,
"his_rank": 0,
"like": 10,
"dislike": 0,
"evaluation": "",
"argue_msg": ""
},
"dynamic": "#崩坏3#",
"cid": 74479750,
"dimension": {
"width": 1280,
"height": 720,
"rotate": 0
},
"no_cache": false,
"pages": [
{
"cid": 74479750,
"page": 1,
"from": "vupload",
"part": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
"duration": 265,
"vid": "",
"weblink": "",
"dimension": {
"width": 1280,
"height": 720,
"rotate": 0
}
}
],
"subtitle": {
"allow_submit": false,
"list": []
},
"user_garb": {
"url_image_ani_cut": ""
}
}
}
The struct for save is:
type RawJSONData struct {
ID uint64 `gorm:"primarykey" json:"id"`
CreatedAt time.Time `json:"-"`
DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
Data datatypes.JSON `json:"data"`
URL string `gorm:"index" json:"url"`
}
datatypes.JSON
is from gorm.io/datatypes
. It seems just is json.RawMessage
, it is (extend from?) a []byte
.
I use PostgreSQL's JSONB
type for storage this data.
Table:
create table raw_json_data
(
id bigserial not null constraint raw_json_data_pke primary key,
created_at timestamp with time zone,
deleted_at timestamp with time zone,
data jsonb,
url text
);
答案1
得分: 5
Unicode转义序列\u0000
在Postgres的TEXT
和JSONB
列中是不支持的:
> jsonb类型也拒绝\u0000(因为它无法表示为PostgreSQL的文本类型)
你可以将列类型更改为JSON
:
create table Foo (test JSON);
insert into Foo (test) values ('{"text": "明天再水\u0000绿塔的"}');
-- 可行
> json数据类型存储输入文本的精确副本
这样做的好处是保持数据与从API接收到的数据完全相同,以防转义序列具有需要保留的某种含义。
它还允许您使用Postgres的JSON运算符(例如->>
)进行查询,尽管将带有\u0000
的JSON字段转换为文本仍将失败:
select test->>'text' from Foo
-- ERROR: unsupported Unicode escape sequence
<hr>
BYTEA
类型的列也接受任何字节序列,而无需操作数据。在Gorm中,使用type:bytea
标签:
type RawJSONData struct {
// ... 其他字段
Data string `gorm:"type:bytea" json:"data"`
}
<hr>
如果上述任何方法都不可接受,请对输入字符串进行清理处理...
英文:
The Unicode escape sequence \u0000
is simply not supported in Postgres TEXT
and JSONB
columns:
> The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type)
You can change the column type to JSON
:
create table Foo (test JSON);
insert into Foo (test) values ('{"text": "明天再水\u0000绿塔的"}');
-- works
> The json data type stores an exact copy of the input text
This has the advantage of keeping the data identical to what you received from the API, in case the escape sequence has some meaning that you need to preserve.
It'll also allow you to query using Postgres JSON operators (e.g. ->>
), albeit converting a JSON field with \u0000
to text will still fail:
select test->>'text' from Foo
-- ERROR: unsupported Unicode escape sequence
<hr>
Columns of type BYTEA
also accept any byte sequence without having to manipulate the data. In Gorm, use type:bytea
tag:
type RawJSONData struct {
// ... other fields
Data string `gorm:"type:bytea" json:"data"`
}
<hr>
If any of the above is not acceptable for you, then you must sanitize the input string...
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论