How to select elements containing special characters using XSL?

Question

I have an ASCII-encoded XML file (in which the various special characters are encoded as &#x..;). Here is a simplified example:

<?xml version="1.0" encoding="ascii"?>
<data>
    <element1>Some regular text</element1>
    <element2>Text containing special characters: 1&#xba;-2&#xaa;</element2>
    <element3>Again regular text, but with the special character prefix: #x</element3>
</data>

Now what I want to do is to pick all the leaf elements containing special characters. The output should look like this:

The following elements in the input file contain special characters:
<element2>Text containing special characters: 1º-2ª</element2>

I tried with this XSL:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:template match="/">
        <xsl:text>The following elements in the input file contain special characters:
        </xsl:text>
        <xsl:for-each select="//*">
            <xsl:if test="not(*) and contains(., '&amp;#x')">
                <xsl:copy-of select="."></xsl:copy-of>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

But it only gives me:

The following elements in the input file contain special characters:

If I try to search for just "#x" with this XSL:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:template match="/">
        <xsl:text>The following elements in the input file contain special characters:
        </xsl:text>
        <xsl:for-each select="//*">
            <xsl:if test="not(*) and contains(., '#x')">
                <xsl:copy-of select="."></xsl:copy-of>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

I get:

The following elements in the input file contain special characters:
        <element3>Again regular text, but with the special character prefix: #x</element3>

So the question is: is there any way to find those elements which contain special characters encoded as "&#x..;"?

I know I can do this with grep etc:

grep '&#x' simpletest.xml
    <element2>Text containing special characters: 1&#xba;-2&#xaa;</element2>

but the ultimate goal is to generate a pretty output with information about parent elements etc that can be sent as email notification, and using XSLT would make that part so much easier.

Answer 1

Score: 2

In XSLT/XPath you can't tell whether a Unicode character appeared literally in the input document or as a character reference, because the XML parser resolves references before the stylesheet sees the data. In XSLT 2 or 3, however, you can use matches with Unicode block escapes to check whether certain characters occur (e.g. \P{IsBasicLatin} matches any character outside the Basic Latin block, i.e. anything that is not ASCII):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:template match="/">
        <xsl:text>The following elements in the input file contain special characters:
        </xsl:text>
        <xsl:for-each select="//*[not(*) and matches(., '\P{IsBasicLatin}')]">
            <xsl:copy-of select="."></xsl:copy-of>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Output:

The following elements in the input file contain special characters:
    <element2>Text containing special characters: 1º-2ª</element2>
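
Since the stated goal is a report that includes information about parent elements, the same predicate could be extended along the following lines. This is only a sketch, assuming an XSLT 2.0 or later processor; the plain-text layout (parent name, element name, then code points) is an illustrative choice and not something taken from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!-- Plain-text report; the line layout below is an illustrative assumption -->
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:for-each select="//*[not(*) and matches(., '\P{IsBasicLatin}')]">
            <!-- Parent and element name show where the characters were found -->
            <xsl:value-of select="name(..), '/', name(), ': '" separator=""/>
            <!-- Decimal code points of the characters above U+007F -->
            <xsl:value-of select="string-to-codepoints(.)[. gt 127]" separator=", "/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input this should produce the line data/element2: 186, 170, which matches the two characters shown in the output above.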

Answer 2

Score: 0

As Martin said, character references like &#xaa; are resolved by the XML parser, so by the time the XML is passed to your XSLT they have already been converted to regular Unicode characters, with no sign that they were encoded specially.

If you want to find characters which are "special" in some way (i.e. Unicode characters with particular code points), then Martin's solution using regular expressions is what you want. It will find those characters regardless of whether they were written as character references.
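
To narrow the search to particular code points rather than everything outside ASCII, a character class can be used in place of the block escape. The following is a sketch, assuming the characters of interest are U+00A1 through U+00FF (a range picked purely for illustration). Note that the character references written in the stylesheet are themselves resolved by the parser before the regular expression is compiled.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:template match="/">
        <!-- The class below reaches the regex engine as the literal range U+00A1 to U+00FF -->
        <xsl:copy-of select="//*[not(*) and matches(., '[&#xA1;-&#xFF;]')]"/>
    </xsl:template>
</xsl:stylesheet>
```

Both characters in the sample (U+00BA and U+00AA) fall inside this range, so element2 would still be selected.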

However, if you are actually trying to find the character references themselves, then your XSLT would need to read the XML file as plain text (without parsing it as XML), e.g. using the unparsed-text XPath function. Note, though, that if you do that, you won't be able to see which particular XML elements contain the characters, because the XML element markup will not have been parsed either.
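
A rough sketch of the plain-text approach follows. It assumes an XSLT 3.0 processor and uses unparsed-text-lines, a sibling of unparsed-text that returns the file one line at a time. The input-uri parameter is a hypothetical name introduced here, defaulting to the simpletest.xml file name from the question's grep command, and the stylesheet is meant to be started from its initial template rather than with a source document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="text"/>
    <!-- Hypothetical parameter: URI of the XML file to scan as raw text -->
    <xsl:param name="input-uri" select="'simpletest.xml'"/>
    <xsl:template name="xsl:initial-template">
        <xsl:for-each select="unparsed-text-lines($input-uri)">
            <!-- Keep only lines holding a hexadecimal character reference -->
            <xsl:if test="matches(., '&amp;#x[0-9A-Fa-f]+;')">
                <!-- position() is the line number within the file -->
                <xsl:value-of select="position(), ': ', ." separator=""/>
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

As with grep, this reports matching lines and line numbers only; it has no knowledge of element boundaries or parents.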

  • Posted by huangapple on 2023-02-08 17:38:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75383797.html