Tcl, database insertion mysqlexec/db server: Incorrect string value:
Question
I have had a weird issue for several years now. Here's the thing.
I run Rocky Linux (it also happens on CentOS) with Apache 2.4.53 and MariaDB (mysql Ver 8.0.30 for Linux on x86_64 (Source distribution)).
I have a Tcl script that executes "curl" to retrieve data from another site. The data comes back as JSON, which I then parse (using the JSON package). I then insert the data into a database, for example:
insert into table set name='Mário Flores';
As you can see, there is a UTF-8 character (á). The database uses the utf8mb4 charset, everything is set correctly, and the system locale is "en_US.UTF-8".
If I run the script directly on my Linux box, there are no issues.
If I go through my website instead, I click a button that sends a POST to my web server (index.cgi), which then runs the "curl" to get the data, parses the JSON and inserts into the database. The code is the same and is called the same way, but I get this error:
Error: mysqlexec/db server: Incorrect string value: '\xE1rio...' for column 'name' at row 1
What could be the issue here? The only way I can solve the problem is to do this when the script is run from the web:
set name [encoding convertto utf-8 $name]
And then insert into the DB.
I tried it both directly on Linux and via the web, with different results. I expected everything to be UTF-8 already, with no conversion needed.
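A minimal Tcl sketch of what the workaround changes at the byte level. It rests on an assumption the post does not confirm: that the database library encodes the SQL text with Tcl's system encoding, and that the CGI process under Apache ends up with a non-UTF-8 system encoding (iso8859-1 here) while the shell gets utf-8. Printing [encoding system] in both runs would confirm or rule this out. The sqlBytes proc is a hypothetical helper that hex-dumps a string in a given encoding.

```tcl
# Hypothetical helper: hex dump of the bytes $str becomes in encoding $enc.
proc sqlBytes {str enc} {
    binary scan [encoding convertto $enc $str] H* hex
    return $hex
}

set name "M\u00e1rio"

# ASSUMPTION: under CGI the SQL text is encoded as iso8859-1.
# The plain string then goes out as 4d e1 72 69 6f, i.e. the '\xE1rio' in the error.
puts "web, raw:          [sqlBytes $name iso8859-1]"

# The workaround first turns the string into UTF-8 bytes (one Tcl char per byte),
# which the Latin-1 step passes through unchanged: 4d c3 a1 72 69 6f.
puts "web, workaround:   [sqlBytes [encoding convertto utf-8 $name] iso8859-1]"

# ASSUMPTION: in the shell the encoding is utf-8, so the plain string is already
# correct, and applying the workaround there would encode it twice.
puts "shell, raw:        [sqlBytes $name utf-8]"
puts "shell, workaround: [sqlBytes [encoding convertto utf-8 $name] utf-8]"
```

If the assumption holds, the same workaround that fixes the web run would produce double-encoded text (c3 83 c2 a1) when run from the shell.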
Answer 1
Score: 1
\xE1 sounds like latin1, definitely not utf8. When connecting, set the charset encoding of the client. Alternatively, use SET NAMES latin1; after connecting.
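To make "definitely not utf8" concrete: in UTF-8 a byte of the form 1110xxxx, such as 0xE1, starts a three-byte sequence and must be followed by two continuation bytes of the form 10xxxxxx. In '\xE1rio' the next byte is 'r' (0x72), so the server has to reject the value. The sketch below is a simplified structural check written for illustration (looksLikeUtf8 is a hypothetical name, not a MySQL or Tcl API); unlike a full validator it does not reject overlong forms or surrogate code points.

```tcl
# Simplified structural UTF-8 check (illustration only): each lead byte must be
# followed by the number of 10xxxxxx continuation bytes it announces.
# Overlong forms and surrogate code points are NOT rejected.
proc looksLikeUtf8 {bytes} {
    binary scan $bytes c* signed
    # binary scan c yields signed values; mask them to 0..255.
    set vals [lmap v $signed {expr {$v & 0xFF}}]
    set n [llength $vals]
    set i 0
    while {$i < $n} {
        set b [lindex $vals $i]
        if {$b < 0x80} {
            set need 0              ;# 0xxxxxxx: ASCII
        } elseif {($b & 0xE0) == 0xC0} {
            set need 1              ;# 110xxxxx: two-byte sequence
        } elseif {($b & 0xF0) == 0xE0} {
            set need 2              ;# 1110xxxx: three-byte sequence (0xE1 is one)
        } elseif {($b & 0xF8) == 0xF0} {
            set need 3              ;# 11110xxx: four-byte sequence
        } else {
            return 0                ;# stray continuation byte or invalid lead byte
        }
        for {set k 1} {$k <= $need} {incr k} {
            set c [lindex $vals [expr {$i + $k}]]
            if {$c eq "" || ($c & 0xC0) != 0x80} {
                return 0            ;# truncated, or not a continuation byte
            }
        }
        incr i [expr {$need + 1}]
    }
    return 1
}

puts [looksLikeUtf8 "\xe1rio"]        ;# bytes from the error message: prints 0
puts [looksLikeUtf8 "\xc3\xa1rio"]    ;# same text encoded as UTF-8: prints 1
```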
E1 is the hex for á in any of these: cp1250, dec8, latin1, latin2, latin5.
C3A1 is the encoding of the same character in utf8 / utf8mb4.
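Both byte values can be checked with Tcl's own encoding tables. Tcl names its encodings differently from MySQL, so this sketch assumes iso8859-1, iso8859-2, cp1250 and utf-8 as stand-ins for latin1, latin2, cp1250 and utf8mb4 (strictly, MySQL's latin1 is cp1252, which also maps á to E1).

```tcl
# Bytes of "á" (U+00E1) in a few encodings, as lowercase hex.
# Tcl names: iso8859-1 ~ MySQL latin1, iso8859-2 ~ latin2, utf-8 ~ utf8mb4.
array set hexOf {}
foreach enc {iso8859-1 iso8859-2 cp1250 utf-8} {
    binary scan [encoding convertto $enc "\u00e1"] H* hexOf($enc)
    puts [format "%-10s %s" $enc $hexOf($enc)]
}
```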
As to "whether the data in the DB is..."...
- Using utf8mb4 in the database allows characters from all of the world's character sets, including Emoji, to be represented.
- With the correct configuration, MySQL is happy to convert to/from UTF-8 when INSERTing/SELECTing. The target charset (in the client) can be essentially any encoding. Latin1 is common; it has about 120 extra characters (accented letters and common symbols) in addition to ordinary ASCII letters, digits, and simple punctuation.
The column definitions control what is stored in the database.
The connection parameters specify what the client's charset is.
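The two settings can be modelled in a few lines: the server decodes the incoming bytes with the charset declared for the connection, then re-encodes the characters for the column. This is a sketch of that idea, not MySQL's implementation; serverStores is a hypothetical name, and iso8859-1 stands in for MySQL's latin1. It shows why the E1 byte is accepted once the connection says latin1 (for example via SET NAMES latin1), while real UTF-8 bytes need no conversion.

```tcl
# Model of the server-side conversion, not MySQL code:
# decode with the connection charset, then re-encode for a utf8mb4 column.
proc serverStores {bytes connectionEnc} {
    set chars [encoding convertfrom $connectionEnc $bytes]
    binary scan [encoding convertto utf-8 $chars] H* hex
    return $hex
}

# The bytes the web run sent, declared as latin1 (as with SET NAMES latin1):
puts [serverStores "M\xe1rio" iso8859-1]
# Proper UTF-8 bytes declared as utf8mb4 are stored unchanged:
puts [serverStores "M\xc3\xa1rio" utf-8]
```

Both calls end with the same stored bytes (4d c3 a1 72 69 6f); what matters is that the declared client charset matches the bytes actually sent.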