Jsoup Element.text() not correctly encoding UTF-8
Question
I am working on my project in Eclipse with JDK 1.8.
My client recently asked for support for saving and retrieving Arabic text as well. I added useUnicode=true&characterEncoding=UTF-8
to the JDBC URL. Saving the data now works correctly, and the response comes back UTF-8 encoded. For that, I added
path = "/v2",consumes="application/json;charset=UTF-8", produces = "application/json;charset=UTF-8"
to all my controllers. I have an API that generates labels; it uses jsoup to edit an HTML template and then converts the result to PDF with the wkhtmltopdf library. This function works correctly with English text:
org.jsoup.nodes.Document doc = Jsoup.parse(template, "UTF-8", "");
Element customerName = doc.getElementById("name");
customerName.text(orderAddress.getName());
If orderAddress.getName()
is in Arabic, I get ?????
Printing it to the console gives the same result:
logger.debug("Name:"+orderAddress.getName());
Eclipse is configured to use UTF-8.
I also tried this:
customerName.text(new String(orderAddress.getName().getBytes(),"UTF-8"));
logger.debug("Name:"+new String(orderAddress.getName().getBytes(),"UTF-8"));
but the result is the same.
In my unit test I tried customerName.text("فاسيلة");
which works correctly and generates exactly the PDF I need.
I have seen a few similar questions, but none of them solved my issue. Since GET works fine, I am sure that retrieving data from the DB is not the problem. Since the unit test works, the encoding on that side is fine too. So I must be missing something related to jsoup.
What am I missing in my attempt?
If anyone knows, please help.
Answer 1
Score: 2
Replace UTF-8 with ISO-8859-9:
Jsoup.parse(template, "ISO-8859-9", "");
In most cases UTF-8 covers the language, but some languages are not supported in UTF-8.
Notes on ISO-8859-9: https://en.wikipedia.org/wiki/ISO/IEC_8859-9
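Whether a given charset covers a piece of text can be checked directly with the JDK. The following is a minimal sketch, not part of the original answer: the class name CharsetCheck and the helper canEncode are hypothetical, and the sample string is the Arabic name from the question written with Unicode escapes. Keep in mind that UTF-8 can encode every Unicode character, while ISO-8859-9 is the Latin-5 (Turkish) alphabet and contains no Arabic letters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharsetCheck {

    // Returns true if every character of text can be represented in charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The sample name from the question, as Unicode escapes so the
        // source file encoding does not matter.
        String arabic = "\u0641\u0627\u0633\u064A\u0644\u0629";

        System.out.println("UTF-8 can encode: "
                + canEncode(arabic, StandardCharsets.UTF_8));

        // ISO-8859-9 is not one of the charsets every JVM must provide,
        // so check for it before looking it up.
        if (Charset.isSupported("ISO-8859-9")) {
            System.out.println("ISO-8859-9 can encode: "
                    + canEncode(arabic, Charset.forName("ISO-8859-9")));
        }
    }
}
```

Running a check like this on the actual customer names before choosing a charset avoids guessing which encoding can represent them.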
Answer 2
Score: 2
I used UTF-8 while writing the string to the output file, and that solved it:
FileUtils.writeStringToFile(tempHTML, doc.outerHtml(), "UTF-8");
There is no need to change the encoding to "ISO-8859-9";
keep it as Jsoup.parse(template, "UTF-8", "");
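The effect of the write charset can be reproduced with the JDK alone. The sketch below is an illustration under stated assumptions, not the code from the question: it uses java.nio.file.Files instead of Commons IO FileUtils, the class WriteHtmlDemo and the method roundTrip are hypothetical names, and ISO-8859-1 stands in for a platform default charset that has no Arabic letters (the real default on the asker's machine is unknown). It also suggests why new String(name.getBytes(), "UTF-8") cannot help in that situation: once getBytes() has replaced unmappable characters with '?', the original text is gone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, for illustration only.
class WriteHtmlDemo {

    // Writes html to a temporary file using writeCharset, then reads the
    // file back as UTF-8 (the encoding the next tool is assumed to expect).
    static String roundTrip(String html, Charset writeCharset) {
        try {
            Path tmp = Files.createTempFile("label", ".html");
            try {
                Files.write(tmp, html.getBytes(writeCharset));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample markup with the Arabic name from the question as escapes.
        String html = "<span id=\"name\">\u0641\u0627\u0633\u064A\u0644\u0629</span>";

        // Explicit UTF-8: the text survives the write unchanged.
        System.out.println(html.equals(roundTrip(html, StandardCharsets.UTF_8)));

        // Stand-in for a non-Unicode default charset: each Arabic letter is
        // replaced by '?' during encoding and cannot be recovered afterwards.
        System.out.println(roundTrip(html, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the charset explicitly at every boundary where text becomes bytes (file writes, getBytes calls, process input) keeps the result independent of the platform default.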