Jsoup Element.text() not correctly encoding UTF-8

Question

I am working on my project in Eclipse with JDK 1.8.
My client recently asked for saving and retrieving Arabic text as well. I have added useUnicode=true&characterEncoding=UTF-8
to the JDBC URL. Saving the data now works correctly, and I get the response in UTF-8 encoded form. To achieve that, I added

path = "/v2",consumes="application/json;charset=UTF-8", produces = "application/json;charset=UTF-8"

to all my controllers. I have an API that generates labels, in which I use jsoup to edit an HTML template and then convert it to PDF using the wkhtmltopdf library. This function works correctly if I use English:

org.jsoup.nodes.Document doc = Jsoup.parse(template, "UTF-8", "");
Element customerName = doc.getElementById("name");
customerName.text(orderAddress.getName());

If orderAddress.getName() is in Arabic, I get ?????. Printing it to the console gives the same result: logger.debug("Name:"+orderAddress.getName());
Eclipse is configured to use UTF-8.
I also tried this:

customerName.text(new String(orderAddress.getName().getBytes(),"UTF-8"));
logger.debug("Name:"+new String(orderAddress.getName().getBytes(),"UTF-8"));

The result is the same.
In my unit test I tried customerName.text("فاسيلة"); which works correctly and generates exactly the PDF I need.

I have seen a few similar questions, but none of them solved my issue. Since GET works fine, I am sure that retrieving data from the DB is not the problem. Since the unit test works, the encoding on that side is fine too. So I must be missing something related to jsoup.
What am I missing in my attempt?
If anyone knows, please help.
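
The new String(name.getBytes(), "UTF-8") attempt above can be reproduced in isolation. String.getBytes() without an argument encodes with the platform default charset, and characters that charset cannot represent are replaced with '?' before any decoding happens, so the original text is already lost. The sketch below is a minimal illustration, not the asker's code: the class name RoundTripDemo is hypothetical, and ISO-8859-1 is only an assumed stand-in for whatever the default charset is on the affected machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {

    // Encodes the text with the given charset, then decodes the bytes as UTF-8,
    // mirroring new String(s.getBytes(), "UTF-8") from the question.
    static String roundTrip(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith); // unmappable characters become '?'
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The six Arabic letters of the test name, written as escapes so the
        // example does not depend on the source file encoding.
        String name = "\u0641\u0627\u0633\u064A\u0644\u0629";

        // Assumed default charset (ISO-8859-1): every letter is replaced by '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).equals("??????")); // prints true

        // Encoding and decoding with the same charset (UTF-8) is lossless.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints true
    }
}
```

If this matches what happens on the affected machine, the string itself may be intact and only the places where it is converted to bytes (console, log file, output file) need an explicit charset.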

Answer 1

Score: 2

Replace UTF-8 with ISO-8859-9:

Jsoup.parse(template, "ISO-8859-9", "");

In most cases UTF-8 covers the language, but some languages are not supported in UTF-8.

Notes on ISO-8859-9: https://en.wikipedia.org/wiki/ISO/IEC_8859-9

Answer 2

Score: 2

I solved it by specifying UTF-8 when writing the string to the output file:

FileUtils.writeStringToFile(tempHTML, doc.outerHtml(), "UTF-8");
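
FileUtils.writeStringToFile in the line above comes from Apache Commons IO. The same idea can be sketched with the JDK alone: encode the HTML with an explicit charset when writing, instead of relying on the platform default. This is a minimal sketch under assumptions: the class name ExplicitCharsetWrite, the temp file prefix, and the sample HTML are hypothetical and stand in for the real template output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExplicitCharsetWrite {

    // Writes the HTML to a temporary file as UTF-8 bytes and reads it back as
    // UTF-8, so the result does not depend on the platform default charset.
    static String writeAndReadBack(String html) {
        try {
            Path tempHtml = Files.createTempFile("label", ".html");
            try {
                Files.write(tempHtml, html.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tempHtml), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tempHtml);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for doc.outerHtml(); the Arabic name is written as escapes.
        String html = "<p id=\"name\">\u0641\u0627\u0633\u064A\u0644\u0629</p>";
        System.out.println(html.equals(writeAndReadBack(html))); // prints true
    }
}
```

Naming the charset on both the write and the read side is what keeps the round trip lossless here; any tool that later reads the file also has to treat it as UTF-8.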

There is no need to change the encoding to "ISO-8859-9";
keep it as Jsoup.parse(template, "UTF-8", "");
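
Whether a charset can represent a given string can be checked directly with CharsetEncoder.canEncode from the JDK. UTF-8 can encode any Unicode text, while ISO-8859-9 (Latin-5) is a single-byte charset for Turkish and contains no Arabic letters. The sketch below is only an illustration; the class name CharsetCheck is hypothetical, and the ISO-8859-9 check is guarded because that charset is not one of the six that every Java runtime is required to provide.

```java
import java.nio.charset.Charset;

final class CharsetCheck {

    // Returns true if every character of the text can be encoded
    // in the named charset without replacement.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The Arabic test name from the question, written as escapes.
        String name = "\u0641\u0627\u0633\u064A\u0644\u0629";

        System.out.println(canEncode("UTF-8", name));      // prints true
        System.out.println(canEncode("ISO-8859-1", name)); // prints false

        // ISO-8859-9 is optional in a Java runtime, so check before using it.
        if (Charset.isSupported("ISO-8859-9")) {
            System.out.println(canEncode("ISO-8859-9", name)); // prints false
        }
    }
}
```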

Posted by huangapple on 2020-10-27 14:06:17
Original link: https://go.coder-hub.com/64549061.html