Special characters appearing in Java code
Question
I am fetching a JSON string as a response and converting it to a JSON object.
In the image above, the description string shows a weird ? character highlighted with a color. I checked in the debugger, and the problem appears after the JSON string is converted to a JsonObject. This is the code (mm is the JSON string):
JsonObject con = getCon(mm);
private JsonObject getCon(String mm) {
    var file = new String(mm.getBytes(), StandardCharsets.UTF_8);
    return new GsonBuilder().create().fromJson(file, JsonObject.class).getAsJsonObject("dict").getAsJsonObject("con");
}
I changed the first line to:
var file = new String(mm.getBytes("UTF-8"), StandardCharsets.UTF_8);
After this, the description string looks like the last line in the attached image, which is really confusing. I am not sure what could be going wrong here. The actual string in the JSON looks like: Post Approval - Completed, Post Approval - Pending
There are a lot of description attributes in the JSON string, and this happens only for a few of them. How can I debug this further?
Answer 1
Score: 0
Gson works only on chars, for example in the form of a String or a Reader. So any encoding issues you encounter most likely happen before Gson is called.
The reason why new String(mm.getBytes(), StandardCharsets.UTF_8) causes encoding issues is that String.getBytes() uses the platform default charset. On Java 17 and earlier this is derived from your OS settings, is quite possibly not UTF-8, and might not even support all Unicode characters (since Java 18 the default is UTF-8 unless it is overridden). Decoding those bytes as UTF-8 then produces incorrect results. There is normally never a good reason to use String.getBytes() without a Charset parameter, and code analysis tools often flag it as a warning. The Policeman's Forbidden API Checker might be useful for you; it detects usage of error-prone methods like this one.
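The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the asker's actual data: the sample text "café" and the choice of ISO-8859-1 as a stand-in for a non-UTF-8 platform default are assumptions, and the class and method names are hypothetical. The charset is passed explicitly so the effect shows up even on a JVM whose default is already UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetPitfall {

    // Encodes with one charset, then decodes the bytes as UTF-8. This mimics
    // new String(mm.getBytes(), StandardCharsets.UTF_8) on a machine whose
    // platform default charset is `encodeWith`.
    static String encodeThenDecodeAsUtf8(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the real description values are not known.
        String original = "caf\u00e9";
        String broken = encodeThenDecodeAsUtf8(original, StandardCharsets.ISO_8859_1);

        System.out.println("default charset on this JVM: " + Charset.defaultCharset());
        System.out.println("unchanged: " + broken.equals(original));
        System.out.println("contains U+FFFD: " + (broken.indexOf('\uFFFD') >= 0));
    }
}
```

In ISO-8859-1 the é becomes the single byte 0xE9, which is not valid UTF-8 on its own, so decoding replaces it with U+FFFD, the character many tools render as a highlighted ?. Pure ASCII text passes through unchanged, which might explain why only a few of the description values are affected.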
Your adjusted code new String(mm.getBytes("UTF-8"), StandardCharsets.UTF_8) is effectively a no-op: it first converts the String to a byte[] using UTF-8 and then reverses that conversion. (The only effect it might have is that incomplete surrogate pairs are replaced.)
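A small sketch of why the UTF-8 round trip changes nothing for well-formed text, and of the one exception mentioned in the parentheses. The sample strings and the class name Utf8RoundTrip are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {

    // Same shape as the adjusted line in the question: encode to UTF-8 bytes,
    // then decode those bytes as UTF-8 again.
    static String roundTrip(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Well-formed text, including an accented letter and a full surrogate pair.
        String wellFormed = "Post Approval - Completed \u00e9 \uD83D\uDE00";
        // A high surrogate without its low half cannot be encoded as UTF-8.
        String loneSurrogate = "abc\uD83D";

        System.out.println("well-formed survives: " + roundTrip(wellFormed).equals(wellFormed));
        System.out.println("lone surrogate survives: " + roundTrip(loneSurrogate).equals(loneSurrogate));
    }
}
```

If the text is already wrong before this line runs, the round trip cannot repair it; it only preserves whatever chars mm already contains.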
To debug this further, you would have to check where the value of mm comes from and at which point (if any) it still has the correct value. If you are reading it from a file, make sure you specify the correct encoding. The file is possibly not UTF-8; editors such as VS Code and Notepad++ can automatically detect the encoding and show it.
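One way to check at which point mm still has the correct value is to list the non-ASCII code points of a suspicious description at each stage, since a debugger or console may render them ambiguously. This is only a sketch: the helper name nonAsciiCodePoints is hypothetical, and the sample value with a non-breaking space is an assumption, not the asker's real data.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointDump {

    // Returns one "index N: U+XXXX" entry for every code point above ASCII.
    static List<String> nonAsciiCodePoints(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint > 0x7F) {
                found.add("index " + i + ": U+" + String.format("%04X", codePoint));
            }
            // Advance by 2 chars for supplementary code points, 1 otherwise.
            i += Character.charCount(codePoint);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical value: a description containing a non-breaking space.
        String description = "Post Approval\u00A0- Completed";
        nonAsciiCodePoints(description).forEach(System.out::println);
    }
}
```

Calling this on the raw input and again on the value extracted from the JsonObject shows whether a character such as U+FFFD is already present before Gson runs.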
If the value comes from an HTTP response, verify that you are respecting the charset specified by the server in the Content-Type header. While the latest JSON specification says UTF-8 must be used, maybe the server is specifying a different encoding for whatever reason.
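As a rough illustration of respecting the Content-Type charset, the sketch below decodes raw response bytes with the charset named in the header and falls back to UTF-8 when none is given. The helper names fromContentType and decodeBody are hypothetical, the header parsing is deliberately simplified, and how the raw bytes and header are obtained depends on the HTTP client in use, which the question does not show.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ResponseCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "application/json; charset=ISO-8859-1". Falls back to UTF-8 when the
    // parameter is missing or names an unknown charset.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw body bytes exactly once, with the declared charset.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType));
    }

    public static void main(String[] args) {
        // Simulated response: the server sends ISO-8859-1 bytes and says so.
        byte[] body = "{\"description\":\"caf\u00e9\"}".getBytes(StandardCharsets.ISO_8859_1);
        String json = decodeBody(body, "application/json; charset=ISO-8859-1");
        System.out.println(json.contains("caf\u00e9"));
    }
}
```

The resulting String can be passed to Gson as-is; no further getBytes and new String conversion is needed once the bytes have been decoded with the right charset.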