Problems with JAXB and UTF-16 encoding

Question

Hi, I have a small app that reads content from an XML file and puts it into a corresponding Java object.

Here is the XML:

<?xml version="1.0" encoding="UTF-16"?>
<Marker>
<TimePosition>2700</TimePosition>
<SamplePosition>119070</SamplePosition>
</Marker>

Here is the corresponding Java code:

JAXBContext jaxbContext = JAXBContext.newInstance(MarkerDto.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
InputStream inputStream = new FileInputStream("D:/marker.xml");
Reader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_16.toString());
MarkerDto markerDto = (MarkerDto) jaxbUnmarshaller.unmarshal(reader);

If I run this code, I get a "Content is not allowed in prolog." exception. If I run the same with UTF-8, everything works fine. Does anyone have a clue what the problem might be?

Answer 1

Score: 3

There are several things wrong here (ranging from slightly suboptimal to potentially very wrong). In increasing order of likelihood of causing the problem:

  1. When constructing an InputStreamReader, there's no need to call toString() on the Charset, because that class has a constructor that takes a Charset, so simply remove the .toString():

    Reader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_16);
    

    This is a tiny nitpick and has no effect on functionality.

  2. Don't construct a Reader at all! XML is a format that's self-describing when it comes to encoding: Valid XML files can be parsed without knowing the encoding up-front. So instead of creating a Reader, simply pass the InputStream directly into your XML-handling code. Delete the line that creates the Reader and change the next one to this:

    MarkerDto markerDto = (MarkerDto) jaxbUnmarshaller.unmarshal(inputStream);
    

    This may or may not fix your problem, depending on whether the input is well-formed.

  3. Your XML file might have encoding="UTF-16" in the header and not actually be UTF-16 encoded. If that's the case, then it is malformed and a conforming parser will decline to parse it. Verify this by opening the file with the advanced text editor of your choice (I suggest Notepad++ on Windows; Linux users probably know what their preference is) and checking whether it shows "UTF-16" as the encoding (and the content is readable).

    > If I run the same with UTF-8 everything works fine.

    This line suggests that this is what's actually happening here: the XML file is mislabeling itself. This needs to be fixed at the point where the XML file is created.

    Notably, this demo code produces exactly the same "Content is not allowed in prolog." exception message that is reported in the question:

    String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<foo />";
    JAXBContext jaxbContext = JAXBContext.newInstance();
    Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
    InputStream inputStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    jaxbUnmarshaller.unmarshal(inputStream);
    

    Note that the XML encoding attribute claims UTF-16, but the actual data handed to the XML parser is UTF-8 encoded.
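
If you'd rather check the file programmatically than in an editor, something along these lines might help. It is only a rough sketch: the class `EncodingCheck` and its methods `sniff` and `rootName` are made-up names for illustration, the byte patterns are the UTF-16 ones described in the XML specification's appendix on encoding autodetection, and the JDK's built-in DOM parser stands in for JAXB so the example runs without extra dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingCheck {

    // Hypothetical helper: guesses from the first two bytes whether the data
    // is UTF-16. UTF-32 is ignored for brevity, so a UTF-32 file could be
    // misreported here.
    static String sniff(byte[] b) {
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF;
            int b1 = b[1] & 0xFF;
            // Byte order mark: FE FF (big endian) or FF FE (little endian).
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return "UTF-16 (BOM)";
            }
            // No BOM: '<' encoded as 00 3C (big endian) or 3C 00 (little endian).
            if ((b0 == 0x00 && b1 == 0x3C) || (b0 == 0x3C && b1 == 0x00)) {
                return "UTF-16 (no BOM)";
            }
        }
        return "not UTF-16";
    }

    // Parses straight from an InputStream, so the parser works out the
    // encoding itself from the BOM and the XML declaration.
    static String rootName(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getNodeName();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n"
                + "<Marker><TimePosition>2700</TimePosition></Marker>";

        byte[] reallyUtf16 = xml.getBytes(StandardCharsets.UTF_16);
        byte[] mislabeled = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println(sniff(reallyUtf16)); // bytes start with FE FF
        System.out.println(sniff(mislabeled));  // bytes start with 3C 3F
        System.out.println(rootName(reallyUtf16));
    }
}
```

For a real file you would read the first few bytes from a FileInputStream and pass them to `sniff`. If it reports "not UTF-16" while the declaration says UTF-16, the file is mislabeled as described in point 3.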

huangapple
  • Posted on February 14, 2023, 19:52:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75447446.html