StAX: START_DOCUMENT on empty XML file
Question
I am trying to understand the StAX design with regard to the START_DOCUMENT event. The typical while loop is:
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    try {
        XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
        while (xmlEventReader.hasNext()) {
            XMLEvent xmlEvent = xmlEventReader.nextEvent();
            switch (xmlEvent.getEventType()) {
            [...]
With this loop there is no way to distinguish between an empty XML file and an XML file that contains only the XML declaration. For example:
% test -s empty.xml || echo empty
empty
% cat start.xml
<?xml version="1.0" encoding="UTF-8"?>
The two files above produce exactly the same series of StAX events (a single START_DOCUMENT). Is this behavior documented somewhere? Why would anyone want a START_DOCUMENT event for an empty file?
Answer 1
Score: 1
If you're parsing a file and the file doesn't contain well-formed XML, then the only thing you can be sure of is that an error will be reported. Neither of the two cases you describe (an empty file, and a file containing only an XML declaration) is well-formed, so you can't rely on anything except the error.
Having said that, if I recall correctly there are differences between StAX parsers in the sequence of events they report, even in cases that are well-formed. It's worth testing your code with more than one.
Answer 2
Score: 1
Both files are equally parseable, as the XML declaration is optional.
Neither one is well-formed (because a well-formed XML document must have a root element), but from the perspective of an event parser like StAX in particular, they are the same thing.
After the START_DOCUMENT event, the next hasNext call should throw an XMLStreamException indicating that the document isn't well-formed.
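Here is a minimal sketch of how the advice in both answers might be applied: treat any parse error, or the absence of a root element, as "not a usable document" rather than relying on the exact event sequence. The class name StaxWellFormedCheck and the method hasRootElement are hypothetical names chosen for illustration. Note that XMLEventReader.hasNext() does not declare XMLStreamException, so depending on the implementation the error may surface from nextEvent() or from createXMLEventReader instead; the sketch therefore wraps the whole loop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of the StAX API.
public class StaxWellFormedCheck {

    /**
     * Returns true only if the input is parsed to the end without an
     * XMLStreamException and at least one START_ELEMENT was seen.
     */
    public static boolean hasRootElement(byte[] xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        boolean sawRoot = false;
        try {
            XMLEventReader reader =
                    factory.createXMLEventReader(new ByteArrayInputStream(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        sawRoot = true;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption: any parse error means the input is unusable.
            return false;
        }
        return sawRoot;
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        byte[] declOnly = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] minimal = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("empty:    " + hasRootElement(empty));
        System.out.println("declOnly: " + hasRootElement(declOnly));
        System.out.println("minimal:  " + hasRootElement(minimal));
    }
}
```

Because the check depends only on "error or no root element", it should not matter whether a given parser reports a START_DOCUMENT before failing, which is the point on which implementations may differ.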