如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

huangapple go评论77阅读模式
英文:

How to persist text containing html emoji in mysql through hibernate

问题

我正在使用它们的HTML代码(示例:👆)通过JavaScript函数将表情符号添加到contenteditable的HTML元素中(该元素显示表情符号)。

然后,我通过以下方式获取元素的内容:

var content = el.innerText;

我将内容添加到JSON中:

var myjson = {"content" : content};

我使用content-type为application/json通过ajax发送JSON数据。

在后端,我使用Gson将JSON转换为POJO,然后使用Hibernate将其持久化到数据库中。

在Servlet中,使用System.out.println打印内容时,数据库中的内容显示为👆,选择查询将其显示为👆(矩形内有小数字)。

我花了很多时间搜索,将数据库中的字符集设置为utf8mb4,还有Hibernate配置文件中的设置,类似以下链接中所述:

我对此有些困惑。我的需求是持久化包含表情符号的文本,并能够检索它(通过Hibernate或简单的JDBC),然后在网页中显示它。

哪种方法是最佳的?

英文:

I am adding the emojis using their html codes (example: & # 1 2 8 0 7 0 ; ) using a javascript function in a contenteditable html element which displays the emoji (example: 👆)

I then retrieve the content of the element with

var content = el.innerText;

I add the content in a json

var myjson = {"content" : content};

I send the json through ajax with content-type being

application/json

In, the backend i convert the json to a pojo using Gson, then persist it in the database using hibernate

In the servlet, using system.out.println, the content displays as 👆 in the db, a select query displays it as 👆 (with small numbers inside the rectangles)

I spent hours searching, setting the character set as utf8mb4 in the DB as well as the hibernate config file...example as described in the following links

https://stackoverflow.com/questions/46140583/caused-by-java-sql-sqlexception-unsupported-character-encoding-utf8mb4

https://stackoverflow.com/questions/39463134/how-to-store-emoji-character-in-mysql-database

I'm a bit lost with it. My need is to persist a text containing emoji and be able to retrieve it(through hibernate or simple jdbc) and display it in a webpage.

What would be the best approach?

答案1

得分: 1

在数据库层面尝试以下操作:

修改数据库、表格和列

将数据库、表格和列的字符集和排序规则属性从utf8更改为utf8mb4。

对于每个数据库:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

对于每个表格:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

对于每个列:

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

检查列和索引键的最大长度

show VARIABLES like 'ver%'
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

====结果应为==== character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8 character_set_filesystem binary character_set_results utf8mb4 character_set_server utf8 character_set_system utf8 collation_connection utf8mb4_general_ci collation_database utf8_general_ci collation_server utf8_general_ci

修复和优化所有表格
对于每个表格

REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases

在数据库层面尝试这些操作。你的问题可能与数据库有关。

英文:

Try this at database level:

Modify databases, tables, and columns
Change the character set and collation properties of the databases, tables, and columns to use utf8mb4 instead of utf8.

For each database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

For each column:

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Check the maximum length of columns and index keys

show VARIABLES like 'ver%'

SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

====result should be====
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8
character_set_system utf8
collation_connection utf8mb4_general_ci
collation_database utf8_general_ci
collation_server utf8_general_ci

Repair and optimize all tables
For each table

REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases

Try this at database level. Your problem might be linked to a database.

答案2

得分: 0

以下是翻译好的内容:

这里,我因为过去有过一段时间没有使用过的Java-Script而感到困扰…… 在这个页面上玩一下,看看Emoji's是如何表示的。

CodePoint不是表示上的改变,而是解释**上的改变…… 在一个单一的Stack Overflow回答中根本没有足够的空间来解释转义序列和字符表示的所有内容…… 这将突出显示从HTML页面检索Emoji时的主要问题。

<HTML>
<HEAD>
<TITLE>Emoji</TITLE>
<SCRIPT TYPE='text/javascript'>
function show()
{
    // 打印 EMOJI ITSELF... (我无法在这里输入)
    var s = document.getElementById("thediv").innerHTML;
    alert(s);

    // 打印字符: � � 
    //
    // 注意:UNICODE 字符 (55357 和 56390) 是不可打印的
    //       因此会打印问号 (�)
    var s2 = " ";
    for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
    alert(s2);
    
    // 打印字符:55357 56390 
    //
    // 注意:这肯定是一个介于 1 和 65,535 之间的数字
    //       这是 2^16 - UNICODE 是一个 16 位空间。
    var s3 = " ";
    for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
    alert(s3);
    
    // 打印字符:128070 56390 
    //
    // 注意:在这里,第二个数字是不相关的... 它被“合并”到第一个数字中。
    //       如前所述,连续解释两个字符,使用 CODE-POINT 意味着,第二个数字和第一个数字被“一起处理”。
    // 第二个数字称为“高代理项”
    var s4 = " ";
    for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
    alert(s4);

    // 打印字符串:“👆”
    //
    // 你可能应该将这个发送到你的数据库...
    var s5 = "&#" + s.codePointAt(0) + ";";
    alert(s5);
}
</SCRIPT>
</HEAD>
<BODY>

<H1>分隔线</H1>
<DIV ID="thediv">👆</DIV>
<BR /><BR /><BUTTON onclick="show();">点击我</BUTTON>
</BODY>
</HTML>

上述代码将触发以下 Java-Script 提示:

document.getElementById("thediv").innerHTML;
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charCodeAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.codePointAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

"&#" + s.codePointAt(0) + ";";
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

英文:

Here, i got flustered because of the Java-Script which I have gotten away from in the past while... Play with this page, to see how Emoji's are represented.

CodePoint is not a change in representation, it is a change in interpretation... There is simply not enough space to explain eveything about escape sequences and character representation in a single Stack Oveflow answer... This is going to HiLite the major issues when retrieving an Emoji from an HTML Page.

<HTML>
<HEAD>
<TITLE>Emoji</TITLE>
<SCRIPT TYPE='text/javascript'>
function show()
{
    // Prints the EMOJI ITSELF... (I can't enter that here)
    var s = document.getElementById("thediv").innerHTML;
    alert(s);

    // Prints characters:  � � 
    //
    // NOTE: The UNICODE characters (55357 and 56390) are unprintable
    //       Therefore the question mark (�) is printed
    var s2 = " ";
    for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
    alert(s2);
    
    // Prints Characters: 55357 56390 
    //
    // NOTE: This is guaranteed to be a number between 1 and 65,535
    //       which 2^16 - and UNICODE is a 16 bit space.
    var s3 = " ";
    for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
    alert(s3);
    
    // Prints Characters: 128070 56390 
    //
    // NOTE: here, the SECOND NUMBER is IRRELEVANT... It is "incorporated" in the first..
    //       As mentioned, interpreting two characters in a row using
    //       CODE-POINT means, the second number and the first are "Chomped" together.
    // The second number is called the "High Surrogate"
    var s4 = " ";
    for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
    alert(s4);

    // Prints the String "👆"
    //
    // YOU SHOULD PROBABLY BE SENDING THIS TO YOUR DATABASE...
    var s5 = "&#" + s.codePointAt(0) + ";";
    alert(s5);
}
</SCRIPT>
</HEAD>
<BODY>

<H1>Divider</H1>
<DIV ID="thediv">👆</DIV>
<BR /><BR /><BUTTON onclick="show();">Click Me</BUTTON>
</BODY>
</HTML>

The above code will send the following java-script alerts:

document.getElementById("thediv").innerHTML;
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charCodeAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.codePointAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

"&#" + s.codePointAt(0) + ";";
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

huangapple
  • 本文由 发表于 2020年10月6日 20:40:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/64226004.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定