英文:
How to persist text containing html emoji in mysql through hibernate
问题
我正在使用它们的HTML代码(示例:👆)通过JavaScript函数将表情符号添加到contenteditable的HTML元素中(该元素显示表情符号)。
然后,我通过以下方式获取元素的内容:
var content = el.innerText;
我将内容添加到JSON中:
var myjson = {"content" : content};
我使用content-type为application/json
通过ajax发送JSON数据。
在后端,我使用Gson将JSON转换为POJO,然后使用Hibernate将其持久化到数据库中。
在Servlet中,使用System.out.println
打印内容时,数据库中的内容显示为ð,选择查询将其显示为ð(矩形内有小数字)。
我花了很多时间搜索,将数据库中的字符集设置为utf8mb4,还有Hibernate配置文件中的设置,类似以下链接中所述:
- https://stackoverflow.com/questions/46140583/caused-by-java-sql-sqlexception-unsupported-character-encoding-utf8mb4
- https://stackoverflow.com/questions/39463134/how-to-store-emoji-character-in-mysql-database
我对此有些困惑。我的需求是持久化包含表情符号的文本,并能够检索它(通过Hibernate或简单的JDBC),然后在网页中显示它。
哪种方法是最佳的?
英文:
I am adding the emojis using their html codes (example: & # 1 2 8 0 7 0 ; ) using a javascript function in a contenteditable html element which displays the emoji (example: 👆)
I then retrieve the content of the element with
var content = el.innerText;
I add the content in a json
var myjson = {"content" : content};
I send the json through ajax with content-type being
application/json
In, the backend i convert the json to a pojo using Gson, then persist it in the database using hibernate
In the servlet, using system.out.println, the content displays as ð in the db, a select query displays it as ð (with small numbers inside the rectangles)
I spent hours searching, setting the character set as utf8mb4 in the DB as well as the hibernate config file...example as described in the following links
https://stackoverflow.com/questions/39463134/how-to-store-emoji-character-in-mysql-database
I'm a bit lost with it. My need is to persist a text containing emoji and be able to retrieve it(through hibernate or simple jdbc) and display it in a webpage.
What would be the best approach?
答案1
得分: 1
在数据库层面尝试以下操作:
修改数据库、表格和列
将数据库、表格和列的字符集和排序规则属性从utf8更改为utf8mb4。
对于每个数据库:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
对于每个表格:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
对于每个列:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
检查列和索引键的最大长度
show VARIABLES like 'ver%'
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
====结果应为==== character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8 character_set_filesystem binary character_set_results utf8mb4 character_set_server utf8 character_set_system utf8 collation_connection utf8mb4_general_ci collation_database utf8_general_ci collation_server utf8_general_ci
修复和优化所有表格
对于每个表格
REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases
在数据库层面尝试这些操作。你的问题可能与数据库有关。
英文:
Try this at database level:
Modify databases, tables, and columns
Change the character set and collation properties of the databases, tables, and columns to use utf8mb4 instead of utf8.
For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Check the maximum length of columns and index keys
show VARIABLES like 'ver%'
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
====result should be====
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8
character_set_system utf8
collation_connection utf8mb4_general_ci
collation_database utf8_general_ci
collation_server utf8_general_ci
Repair and optimize all tables
For each table
REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases
Try this at database level. Your problem might be linked to a database.
答案2
得分: 0
以下是翻译好的内容:
这里,我因为过去有过一段时间没有使用过的Java-Script而感到困扰…… 在这个页面上玩一下,看看Emoji's
是如何表示的。
CodePoint
不是表示上的改变,而是解释**上的改变…… 在一个单一的Stack Overflow回答中根本没有足够的空间来解释转义序列和字符表示的所有内容…… 这将突出显示从HTML页面检索Emoji
时的主要问题。
<HTML>
<HEAD>
<TITLE>Emoji</TITLE>
<SCRIPT TYPE='text/javascript'>
function show()
{
// 打印 EMOJI ITSELF... (我无法在这里输入)
var s = document.getElementById("thediv").innerHTML;
alert(s);
// 打印字符: � �
//
// 注意:UNICODE 字符 (55357 和 56390) 是不可打印的
// 因此会打印问号 (�)
var s2 = " ";
for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
alert(s2);
// 打印字符:55357 56390
//
// 注意:这肯定是一个介于 1 和 65,535 之间的数字
// 这是 2^16 - UNICODE 是一个 16 位空间。
var s3 = " ";
for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
alert(s3);
// 打印字符:128070 56390
//
// 注意:在这里,第二个数字是不相关的... 它被“合并”到第一个数字中。
// 如前所述,连续解释两个字符,使用 CODE-POINT 意味着,第二个数字和第一个数字被“一起处理”。
// 第二个数字称为“高代理项”
var s4 = " ";
for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
alert(s4);
// 打印字符串:“&#128070;”
//
// 你可能应该将这个发送到你的数据库...
var s5 = "&#" + s.codePointAt(0) + ";";
alert(s5);
}
</SCRIPT>
</HEAD>
<BODY>
<H1>分隔线</H1>
<DIV ID="thediv">&#128070;</DIV>
<BR /><BR /><BUTTON onclick="show();">点击我</BUTTON>
</BODY>
</HTML>
上述代码将触发以下 Java-Script 提示:
document.getElementById("thediv").innerHTML;
"&#" + s.codePointAt(0) + ";";
英文:
Here, i got flustered because of the Java-Script which I have gotten away from in the past while... Play with this page, to see how Emoji's
are represented.
CodePoint
is not a change in representation, it is a change in interpretation... There is simply not enough space to explain eveything about escape sequences and character representation in a single Stack Oveflow answer... This is going to HiLite the major issues when retrieving an Emoji
from an HTML Page.
<HTML>
<HEAD>
<TITLE>Emoji</TITLE>
<SCRIPT TYPE='text/javascript'>
function show()
{
// Prints the EMOJI ITSELF... (I can't enter that here)
var s = document.getElementById("thediv").innerHTML;
alert(s);
// Prints characters: � �
//
// NOTE: The UNICODE characters (55357 and 56390) are unprintable
// Therefore the question mark (�) is printed
var s2 = " ";
for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
alert(s2);
// Prints Characters: 55357 56390
//
// NOTE: This is guaranteed to be a number between 1 and 65,535
// which 2^16 - and UNICODE is a 16 bit space.
var s3 = " ";
for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
alert(s3);
// Prints Characters: 128070 56390
//
// NOTE: here, the SECOND NUMBER is IRRELEVANT... It is "incorporated" in the first..
// As mentioned, interpreting two characters in a row using
// CODE-POINT means, the second number and the first are "Chomped" together.
// The second number is called the "High Surrogate"
var s4 = " ";
for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
alert(s4);
// Prints the String "&#128070;"
//
// YOU SHOULD PROBABLY BE SENDING THIS TO YOUR DATABASE...
var s5 = "&#" + s.codePointAt(0) + ";";
alert(s5);
}
</SCRIPT>
</HEAD>
<BODY>
<H1>Divider</H1>
<DIV ID="thediv">&#128070;</DIV>
<BR /><BR /><BUTTON onclick="show();">Click Me</BUTTON>
</BODY>
</HTML>
The above code will send the following java-script alerts:
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论