How do I allow #PCDATA without forcing it and how do I deny #PCDATA?
Question
I would like to get an XML structure like:
<root>
allow<back>no#PCDATA</back>
allow<front>allow#PCDATA</front>
</root>
I have:
<!ELEMENT root (back?,front?)>
<!ELEMENT back (js*)>
<!ELEMENT front (para*)>
Answer 1
Score: 0
Using XML DTDs, the best you can get is
<!DOCTYPE root [
<!ELEMENT root (#PCDATA|back|front)*>
<!ELEMENT back (js*)>
<!ELEMENT front (#PCDATA|para)*>
]>
<root>
allow<back><!-- no#PCDATA --></back>
allow<front>allow#PCDATA</front>
</root>
since XML DTDs restrict how the #PCDATA content token can be used: according to the XML specification, it has to be the first member of a choice group (a group of element names separated by the | connector), and that group must carry the * occurrence indicator.
You can check this example using Libxml2 (the xmllint --valid command line utility).
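To make the restriction concrete, here is a minimal Python sketch that tests whether a content-model string has the shape XML permits for mixed content. It assumes a simplified, ASCII-only notion of element names, and the function name is_valid_mixed is made up for illustration; it is not a replacement for a validating parser such as xmllint.

```python
import re

# XML 1.0 "Mixed" production:
#   Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
# Assumption: names are simplified to ASCII here; real XML names allow more characters.
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
MIXED = re.compile(
    rf"\(\s*#PCDATA(?:\s*\|\s*{NAME})*\s*\)\*"  # (#PCDATA | a | b)*
    rf"|\(\s*#PCDATA\s*\)"                      # (#PCDATA)
)

def is_valid_mixed(model: str) -> bool:
    """Return True if `model` has the shape XML allows for mixed content."""
    return MIXED.fullmatch(model.strip()) is not None

print(is_valid_mixed("(#PCDATA|back|front)*"))           # accepted by XML
print(is_valid_mixed("(#PCDATA,back?,#PCDATA,front?)"))  # SGML-style, rejected by XML
```

The two declarations from the XML example pass this check, while any model that puts #PCDATA in a sequence, or anywhere but first in the choice group, does not.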
SGML, on the other hand, on which XML is based and of which XML DTDs are designed to be a subset, doesn't have this restriction and allows #PCDATA to occur multiple times:
<!DOCTYPE root [
<!-- NOTE: this is SGML not XML -->
<!ELEMENT root - - (#PCDATA,((back,#PCDATA,front?)|(front?)))>
<!ELEMENT back - - (js*)>
<!ELEMENT front - - (#PCDATA|para)*>
]>
<root>
allow<back><!-- no#PCDATA --></back>
allow<front>allow#PCDATA</front></root>
You can check these SGML examples using OpenSP (the osgmlnorm command line utility) or sgmljs (the sgmlproc command line utility). However, there are restrictions with SGML in this context as well:
- You will have noticed that the </root> end-element tag is put at the end of the line. This is because SGML would interpret a newline as character data unless it occurs after a line containing only a single element with start- and end-element tags, in which case it considers that newline to be solely for formatting purposes.
- A content model such as (#PCDATA,back?,#PCDATA,front?) is ambiguous and thus disallowed: if the optional back element isn't present, text content could be attributed to either of the two #PCDATA tokens.
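The ambiguity in the last point can be illustrated with a small Python sketch that counts how many ways a sequence of children can be assigned to the tokens of a flat sequence model. This only conveys the intuition; it is not SGML's formal ambiguity rule. The helper count_assignments and the (name, skippable) token encoding are assumptions made for this example, with #PCDATA treated as skippable because it may match empty text.

```python
from typing import List, Tuple

Token = Tuple[str, bool]  # (name, may_be_skipped)

def count_assignments(model: List[Token], children: List[str]) -> int:
    """Count the ways `children` can be assigned, in order, to the tokens of `model`."""
    def walk(t: int, c: int) -> int:
        if t == len(model):
            return 1 if c == len(children) else 0
        name, skippable = model[t]
        ways = 0
        if skippable:
            ways += walk(t + 1, c)      # this token matches nothing
        if c < len(children) and children[c] == name:
            ways += walk(t + 1, c + 1)  # this token consumes the current child
        return ways
    return walk(0, 0)

# (#PCDATA,back?,#PCDATA,front?): both #PCDATA tokens may be empty, back is optional.
AMBIGUOUS = [("#PCDATA", True), ("back", True), ("#PCDATA", True), ("front", True)]
# Same sequence with back required, so back separates the two text tokens.
BACK_REQUIRED = [("#PCDATA", True), ("back", False), ("#PCDATA", True), ("front", True)]

# Text followed directly by <front>: the text fits either #PCDATA token.
print(count_assignments(AMBIGUOUS, ["#PCDATA", "front"]))                         # 2
print(count_assignments(BACK_REQUIRED, ["#PCDATA", "back", "#PCDATA", "front"]))  # 1
```

With back absent, the single text run has two possible homes, which is the situation the answer's SGML content model avoids by splitting the cases with and without back into separate branches.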