如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

huangapple go评论107阅读模式
英文:

How to persist text containing html emoji in mysql through hibernate

问题

我正在使用它们的HTML代码(示例:👆)通过JavaScript函数将表情符号添加到contenteditable的HTML元素中(该元素显示表情符号)。

然后,我通过以下方式获取元素的内容:

  1. var content = el.innerText;

我将内容添加到JSON中:

  1. var myjson = {"content" : content};

我使用content-type为application/json通过ajax发送JSON数据。

在后端,我使用Gson将JSON转换为POJO,然后使用Hibernate将其持久化到数据库中。

在Servlet中,使用System.out.println打印内容时,数据库中的内容显示为👆,选择查询将其显示为👆(矩形内有小数字)。

我花了很多时间搜索,将数据库中的字符集设置为utf8mb4,还有Hibernate配置文件中的设置,类似以下链接中所述:

我对此有些困惑。我的需求是持久化包含表情符号的文本,并能够检索它(通过Hibernate或简单的JDBC),然后在网页中显示它。

哪种方法是最佳的?

英文:

I am adding the emojis using their html codes (example: & # 1 2 8 0 7 0 ; ) using a javascript function in a contenteditable html element which displays the emoji (example: 👆)

I then retrieve the content of the element with

  1. var content = el.innerText;

I add the content in a json

  1. var myjson = {"content" : content};

I send the json through ajax with content-type being

  1. application/json

In, the backend i convert the json to a pojo using Gson, then persist it in the database using hibernate

In the servlet, using system.out.println, the content displays as 👆 in the db, a select query displays it as 👆 (with small numbers inside the rectangles)

I spent hours searching, setting the character set as utf8mb4 in the DB as well as the hibernate config file...example as described in the following links

https://stackoverflow.com/questions/46140583/caused-by-java-sql-sqlexception-unsupported-character-encoding-utf8mb4

https://stackoverflow.com/questions/39463134/how-to-store-emoji-character-in-mysql-database

I'm a bit lost with it. My need is to persist a text containing emoji and be able to retrieve it(through hibernate or simple jdbc) and display it in a webpage.

What would be the best approach?

答案1

得分: 1

在数据库层面尝试以下操作:

修改数据库、表格和列

将数据库、表格和列的字符集和排序规则属性从utf8更改为utf8mb4。

对于每个数据库:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

对于每个表格:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

对于每个列:

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

检查列和索引键的最大长度

show VARIABLES like 'ver%'
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

====结果应为==== character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8 character_set_filesystem binary character_set_results utf8mb4 character_set_server utf8 character_set_system utf8 collation_connection utf8mb4_general_ci collation_database utf8_general_ci collation_server utf8_general_ci

修复和优化所有表格
对于每个表格

REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases

在数据库层面尝试这些操作。你的问题可能与数据库有关。

英文:

Try this at database level:

Modify databases, tables, and columns
Change the character set and collation properties of the databases, tables, and columns to use utf8mb4 instead of utf8.

For each database:

  1. ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

For each table:

  1. ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

For each column:

  1. ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Check the maximum length of columns and index keys

  1. show VARIABLES like 'ver%'

SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

====result should be====
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8
character_set_system utf8
collation_connection utf8mb4_general_ci
collation_database utf8_general_ci
collation_server utf8_general_ci

Repair and optimize all tables
For each table

  1. REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p auto-repair optimize all-databases

Try this at database level. Your problem might be linked to a database.

答案2

得分: 0

以下是翻译好的内容:

这里,我因为过去有过一段时间没有使用过的Java-Script而感到困扰…… 在这个页面上玩一下,看看Emoji's是如何表示的。

CodePoint不是表示上的改变,而是解释**上的改变…… 在一个单一的Stack Overflow回答中根本没有足够的空间来解释转义序列和字符表示的所有内容…… 这将突出显示从HTML页面检索Emoji时的主要问题。

  1. <HTML>
  2. <HEAD>
  3. <TITLE>Emoji</TITLE>
  4. <SCRIPT TYPE='text/javascript'>
  5. function show()
  6. {
  7. // 打印 EMOJI ITSELF... (我无法在这里输入)
  8. var s = document.getElementById("thediv").innerHTML;
  9. alert(s);
  10. // 打印字符: � �
  11. //
  12. // 注意:UNICODE 字符 (55357 和 56390) 是不可打印的
  13. // 因此会打印问号 (�)
  14. var s2 = " ";
  15. for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
  16. alert(s2);
  17. // 打印字符:55357 56390
  18. //
  19. // 注意:这肯定是一个介于 1 和 65,535 之间的数字
  20. // 这是 2^16 - UNICODE 是一个 16 位空间。
  21. var s3 = " ";
  22. for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
  23. alert(s3);
  24. // 打印字符:128070 56390
  25. //
  26. // 注意:在这里,第二个数字是不相关的... 它被“合并”到第一个数字中。
  27. // 如前所述,连续解释两个字符,使用 CODE-POINT 意味着,第二个数字和第一个数字被“一起处理”。
  28. // 第二个数字称为“高代理项”
  29. var s4 = " ";
  30. for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
  31. alert(s4);
  32. // 打印字符串:“👆”
  33. //
  34. // 你可能应该将这个发送到你的数据库...
  35. var s5 = "&#" + s.codePointAt(0) + ";";
  36. alert(s5);
  37. }
  38. </SCRIPT>
  39. </HEAD>
  40. <BODY>
  41. <H1>分隔线</H1>
  42. <DIV ID="thediv">👆</DIV>
  43. <BR /><BR /><BUTTON onclick="show();">点击我</BUTTON>
  44. </BODY>
  45. </HTML>

上述代码将触发以下 Java-Script 提示:

document.getElementById("thediv").innerHTML;
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charCodeAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.codePointAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

"&#" + s.codePointAt(0) + ";";
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

英文:

Here, i got flustered because of the Java-Script which I have gotten away from in the past while... Play with this page, to see how Emoji's are represented.

CodePoint is not a change in representation, it is a change in interpretation... There is simply not enough space to explain eveything about escape sequences and character representation in a single Stack Oveflow answer... This is going to HiLite the major issues when retrieving an Emoji from an HTML Page.

  1. <HTML>
  2. <HEAD>
  3. <TITLE>Emoji</TITLE>
  4. <SCRIPT TYPE='text/javascript'>
  5. function show()
  6. {
  7. // Prints the EMOJI ITSELF... (I can't enter that here)
  8. var s = document.getElementById("thediv").innerHTML;
  9. alert(s);
  10. // Prints characters: � �
  11. //
  12. // NOTE: The UNICODE characters (55357 and 56390) are unprintable
  13. // Therefore the question mark (�) is printed
  14. var s2 = " ";
  15. for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
  16. alert(s2);
  17. // Prints Characters: 55357 56390
  18. //
  19. // NOTE: This is guaranteed to be a number between 1 and 65,535
  20. // which 2^16 - and UNICODE is a 16 bit space.
  21. var s3 = " ";
  22. for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
  23. alert(s3);
  24. // Prints Characters: 128070 56390
  25. //
  26. // NOTE: here, the SECOND NUMBER is IRRELEVANT... It is "incorporated" in the first..
  27. // As mentioned, interpreting two characters in a row using
  28. // CODE-POINT means, the second number and the first are "Chomped" together.
  29. // The second number is called the "High Surrogate"
  30. var s4 = " ";
  31. for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
  32. alert(s4);
  33. // Prints the String "👆"
  34. //
  35. // YOU SHOULD PROBABLY BE SENDING THIS TO YOUR DATABASE...
  36. var s5 = "&#" + s.codePointAt(0) + ";";
  37. alert(s5);
  38. }
  39. </SCRIPT>
  40. </HEAD>
  41. <BODY>
  42. <H1>Divider</H1>
  43. <DIV ID="thediv">👆</DIV>
  44. <BR /><BR /><BUTTON onclick="show();">Click Me</BUTTON>
  45. </BODY>
  46. </HTML>

The above code will send the following java-script alerts:

document.getElementById("thediv").innerHTML;
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.charCodeAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

s.codePointAt(i)
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

"&#" + s.codePointAt(0) + ";";
如何通过Hibernate将包含HTML表情的文本持久化到MySQL。

huangapple
  • 本文由 发表于 2020年10月6日 20:40:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/64226004.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定