Entity translation to customized entity

Question

There are some user-defined entities in the XML data. To write those characters back out as entity references, we are using the code below:

<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method="xml" omit-xml-declaration="no" use-character-maps="mdash"/>
  <xsl:character-map name="mdash">
    <xsl:output-character character="&#x2014;" string="&amp;mdash;"/>
    <xsl:output-character character="&amp;" string="&amp;amp;"/>
    <xsl:output-character character="&quot;" string="&amp;quot;"/>
    <xsl:output-character character="&apos;" string="&amp;apos;"/>
    <xsl:output-character character="&#167;" string="&amp;sect;"/>
    <xsl:output-character character="&#36;" string="&amp;dollar;"/>
    <xsl:output-character character="&#47;" string="&amp;sol;"/>
    <xsl:output-character character="&#45;" string="&amp;hyphen;"/>
  </xsl:character-map>
  <!--=================================================================-->
  <xsl:template match="@* | node()">
    <!--=================================================================-->
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

But there is a special case where &sect; appears twice in a row in the data, for example:

Input: The number &sect;&sect; 1234

The above example should be converted to a special user-defined entity, i.e.

Output: The number &multisect; 1234

That is, &sect;&sect; should be converted to &multisect;

Answer 1

Score: 1

You can't achieve this directly in the serializer, as you can with single characters. You will either have to recognise "§§" in the transformation proper (perhaps converting it to some private-use-area character, which is then picked up by xsl:output-character), or you could do it by post-processing the output at the character-stream level.
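
The post-processing route mentioned above could look roughly like the following. This is only a minimal sketch in Python, assuming the serialized output is available as a string and that a plain textual replacement is acceptable; the helper name collapse_sect_pairs is hypothetical and not part of any library.

```python
import re

# Matches two adjacent "&sect;" entity references in already-serialized XML text.
SECT_PAIR = re.compile(r"&sect;&sect;")


def collapse_sect_pairs(serialized: str) -> str:
    """Replace each '&sect;&sect;' pair with a single '&multisect;' reference.

    Hypothetical helper: it works purely on text, so it would also rewrite
    pairs that happen to sit inside comments or CDATA sections.
    """
    return SECT_PAIR.sub("&multisect;", serialized)


if __name__ == "__main__":
    sample = "<text>The number &sect;&sect; 1234</text>"
    print(collapse_sect_pairs(sample))
    # prints: <text>The number &multisect; 1234</text>
```

Note that a run of three references is handled left to right, so "&sect;&sect;&sect;" becomes "&multisect;&sect;"; whether that is the desired result depends on the data.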

Answer 2

Score: 1

If you want to use a character map, you first need to process the text nodes where you expect the two sect characters to occur and replace the pair with a single character that you don't expect to be used elsewhere; the map can then convert that character to the string &multisect;. For example, the stylesheet

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:fn="http://www.w3.org/2005/xpath-functions"
	exclude-result-prefixes="#all"
	expand-text="yes"
	version="3.0">

  <!-- Placeholder character that stands in for the "§§" pair -->
  <xsl:param name="multisect-sub" static="yes" as="xs:string" select="'«'"/>

  <!-- On serialization, the placeholder is written out as &multisect; -->
  <xsl:character-map name="sub">
    <xsl:output-character _character="{$multisect-sub}" string="&amp;multisect;"/>
  </xsl:character-map>

  <!-- Copy everything that no other template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes" use-character-maps="sub"/>

  <!-- Split each text node into matches and non-matches of "§§" -->
  <xsl:template match="text()">
    <xsl:apply-templates mode="analyze" select="analyze-string(., '&#xA7;&#xA7;')"/>
  </xsl:template>

  <!-- Each matched pair becomes the placeholder; other text is copied by the built-in rules -->
  <xsl:template mode="analyze" match="fn:match">
    <xsl:text>{$multisect-sub}</xsl:text>
  </xsl:template>

</xsl:stylesheet>

transforms the input

<!DOCTYPE text [
  <!ENTITY sect "&#xA7;">
]>
<text>&sect;&sect; 1234</text>

into the output

<?xml version="1.0" encoding="UTF-8"?>
<text>&multisect; 1234</text>

Note that I used '«' primarily as an example; you might want to use a private-use character or some other character you are sure doesn't occur in your input/output data.

If you want the result to be well-formed, you would also need to add a doctype to the output, e.g. with xsl:output doctype-system="some.dtd", where you ensure that some.dtd declares e.g. <!ENTITY multisect "&#xA7;&#xA7;">
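
To see why the entity declaration matters, the check below parses the output once without and once with a declaration for multisect. It is a rough sketch using Python's standard xml.etree.ElementTree; as an assumption made only to keep the example self-contained, the entity is declared in an internal subset rather than in an external some.dtd, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

# Output of the stylesheet with no declaration for the entity available.
undeclared = "<text>&multisect; 1234</text>"

# Same output with the entity declared. An internal subset is used here
# (an assumption for this sketch) instead of the external some.dtd.
declared = (
    '<!DOCTYPE text [<!ENTITY multisect "&#xA7;&#xA7;">]>'
    "<text>&multisect; 1234</text>"
)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(undeclared))  # the entity is undefined here
    print(is_well_formed(declared))    # the entity is declared here
```

With the declaration in place the parser expands &multisect; back to the two section signs, so downstream consumers see the original characters.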

huangapple
  • Posted on July 6, 2023, 14:16:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76625997.html