How do I allow #PCDATA without forcing it, and how do I deny #PCDATA?

Question


I would like to get an XML structure like this:

<root>
	allow<back>no#PCDATA</back>
	allow<front>allow#PCDATA</front>
</root>

I have:

<!ELEMENT root (back?,front?)>
<!ELEMENT back (js*)>
<!ELEMENT front (para*)>

Answer 1

Score: 0


Using XML DTDs, the best you can get is:

<!DOCTYPE root [
  <!ELEMENT root (#PCDATA|back|front)*>
  <!ELEMENT back (js*)>
  <!ELEMENT front (#PCDATA|para)*>
]>
<root>
  allow<back><!-- no#PCDATA --></back>
  allow<front>allow#PCDATA</front>
</root>

This is because XML DTDs restrict how the #PCDATA content token can be used: according to the XML specification, it may only appear in a choice group, where it must be the first member of a group of element names separated by the | connector, and the group as a whole must be followed by *.

You can check this example using Libxml2 (the xmllint --valid command line utility).
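
To make the restriction on #PCDATA concrete, here is a small sketch of which declaration shapes XML accepts for mixed content. The element name note is hypothetical and used only for illustration; back, front, and para come from the question.

```dtd
<!-- Accepted: #PCDATA comes first, members are joined by |, and the group ends in )* -->
<!ELEMENT front (#PCDATA | para)*>

<!-- Accepted: text-only content (note is a hypothetical element) -->
<!ELEMENT note (#PCDATA)>

<!-- Rejected in XML: #PCDATA may not appear in a comma-separated sequence, e.g.
     ELEMENT root (#PCDATA, back?, front?)
     and it may not appear anywhere but first in the choice group, e.g.
     ELEMENT front (para | #PCDATA)*  -->
```

One consequence of the mixed-content form is that (#PCDATA|back|front)* no longer enforces the order or the number of back and front children, so the XML declaration for root is only an approximation of the original (back?,front?) model.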

SGML, on the other hand, on which XML is based and of which XML DTDs are designed to be a subset, doesn't have this restriction and allows #PCDATA to occur multiple times:

<!DOCTYPE root [
  <!-- NOTE: this is SGML not XML -->
  <!ELEMENT root - - (#PCDATA,((back,#PCDATA,front?)|(front?)))>
  <!ELEMENT back - - (js*)>
  <!ELEMENT front - - (#PCDATA|para)*>
]>
<root>
  allow<back><!-- no#PCDATA --></back>
  allow<front>allow#PCDATA</front></root>

You can check these SGML examples using OpenSP (the osgmlnorm command line utility) or sgmljs (the sgmlproc command line utility). However, SGML has restrictions of its own in this context:

  • You will have noticed that the </root> end-element tag is placed at the end of the line. This is because SGML interprets a newline as character data unless it follows a line containing only a single element with its start- and end-element tags, in which case the newline is treated as being there solely for formatting purposes.

  • A content model such as (#PCDATA,back?,#PCDATA,front?) is ambiguous and thus disallowed: if the optional back element isn't present, text content could be attributed to either of the two #PCDATA tokens.
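
The ambiguity in the last point can be seen with a short instance. This is only a sketch in SGML syntax, reusing the disallowed content model quoted above; the text content in the instance is made up for illustration.

```dtd
<!-- NOTE: SGML syntax, and deliberately a model that SGML rejects -->
<!ELEMENT root - - (#PCDATA,back?,#PCDATA,front?)>

<!-- Instance without the optional back element:

     <root>some text<front>more text</front></root>

     "some text" could be matched by the first #PCDATA token, or, with back
     skipped, by the second one; the parser cannot tell which. -->
```

The working SGML model in the answer avoids this by making the second #PCDATA token reachable only after a back element has actually been seen.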

huangapple
  • Posted on May 14, 2023 at 07:47:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76245289.html