May 21, 2023 16:55:55


Grouping adjacent nodes and processing mixed content in XSLT3

Question

Given this (simplified) XML:

<?xml version="1.0" encoding="UTF-8"?>
<text>
    <p>TOKEN1 some other text.</p>
    <p>TOKEN2 }</p>
    <p>TOKEN3    } combo text <i>and potentially something else</i>.</p>
    <p>TOKEN4 }</p>
    <p>TOKEN5 some other text.</p>
    <p>TOKEN6 some other text.</p>
    <p>TOKEN7 }</p>
    <p>TOKEN8 }</p>
    <p>TOKEN9    } some other <b>combo</b> text.</p>
    <p>TOKEN10 }</p>
    <p>TOKEN11 some <i>other</i> text.</p>
    <p>TOKEN12 x.</p>
    <p>TOKEN13 y.</p>
    <p>TOKEN14 z.</p>
</text>

My goal is to arrive at:

<?xml version="1.0" encoding="UTF-8"?>
<text>
   <p>TOKEN1 some other text.</p>
   <p>TOKEN2 } TOKEN3 } TOKEN4 } combo text <i>and potentially something else</i>.</p>
   <p>TOKEN5 some other text.</p>
   <p>TOKEN6 some other text.</p>
   <p>TOKEN7 } TOKEN8 } TOKEN9 } TOKEN10 } some other <b>combo</b> text.</p>
   <p>TOKEN11 some <i>other</i> text.</p>
   <p>TOKEN12 x.</p>
   <p>TOKEN13 y.</p>
   <p>TOKEN14 z.</p>
</text>

In other words, I would like to merge adjacent paragraphs that have a curly bracket in them by:

1. merging the text content up to and including the curly bracket, followed by
2. anything that might follow the curly bracket.

The mixed content after the curly bracket occurs in only one of the paragraphs to be merged, but neither the number of paragraphs to merge nor the position of the paragraph with mixed content after the bracket is known in advance.
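As a cross-check of the merge rule itself (not an XSLT solution), here is a minimal Python analogue using only the standard library. itertools.groupby plays the role of group-adjacent, and ElementTree's text/tail model carries the mixed content along. The function names are hypothetical, and the sketch assumes, as in the sample, that the } always sits in the paragraph's leading text rather than inside a child element.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

SAMPLE = """<text>
    <p>TOKEN1 some other text.</p>
    <p>TOKEN2 }</p>
    <p>TOKEN3    } combo text <i>and potentially something else</i>.</p>
    <p>TOKEN4 }</p>
    <p>TOKEN5 some other text.</p>
    <p>TOKEN6 some other text.</p>
    <p>TOKEN7 }</p>
    <p>TOKEN8 }</p>
    <p>TOKEN9    } some other <b>combo</b> text.</p>
    <p>TOKEN10 }</p>
    <p>TOKEN11 some <i>other</i> text.</p>
    <p>TOKEN12 x.</p>
    <p>TOKEN13 y.</p>
    <p>TOKEN14 z.</p>
</text>"""


def has_bracket(p):
    # Like contains(., '}') in XPath: test the paragraph's full string value.
    return "}" in "".join(p.itertext())


def merge_bracket_paragraphs(root):
    """Merge each run of adjacent bracket paragraphs into a single <p>.

    Assumption: the '}' is in the paragraph's leading text (p.text).
    """
    out = ET.Element(root.tag)
    for bracketed, run in itertools.groupby(list(root), key=has_bracket):
        run = list(run)
        if not bracketed:
            # Keep every paragraph of a non-bracket run, not just the first.
            out.extend([copy.deepcopy(p) for p in run])
            continue
        merged = ET.SubElement(out, "p")
        tokens, remainder = [], ""
        for p in run:
            head = p.text or ""
            cut = head.rfind("}") + 1  # up to and including the last '}'
            tokens.append(" ".join(head[:cut].split()))  # collapse inner whitespace
            remainder += head[cut:]  # text after the bracket, e.g. " combo text "
            # Child elements (with their tails) are the mixed content after the bracket.
            merged.extend([copy.deepcopy(child) for child in p])
        merged.text = " ".join(tokens) + remainder
    return out


result = merge_bracket_paragraphs(ET.fromstring(SAMPLE))
for p in result:
    print("".join(p.itertext()))
```

Because groupby only groups consecutive items with the same key, two bracket runs separated by ordinary paragraphs (TOKEN2 to TOKEN4 and TOKEN7 to TOKEN10) stay separate, which is exactly the group-adjacent behaviour.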

The following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" expand-text="true" version="3.0">

    <xsl:output method="xml" indent="true"></xsl:output>
    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:template match="text">
        <xsl:copy>
            <xsl:for-each-group select="p" group-adjacent="exists(text()[matches(., '\}')])">
                <xsl:choose>
                    <xsl:when test="exists(text()[matches(., '\}')])">
                        <xsl:copy>
                            <xsl:for-each select="current-group()">
                                <xsl:variable name="text" select="normalize-space(text()[1])"/>
                                <xsl:copy-of select="substring-before($text, '}')"/>
                                <xsl:text>}} </xsl:text>
                            </xsl:for-each>
                        </xsl:copy>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:copy>
                            <xsl:apply-templates/>
                        </xsl:copy>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

will get me as far as:

<?xml version="1.0" encoding="UTF-8"?>
<text>
   <p>TOKEN1 some other text.</p>
   <p>TOKEN2 } TOKEN3 } TOKEN4 } </p>
   <p>TOKEN5 some other text.</p>
   <p>TOKEN7 } TOKEN8 } TOKEN9 } TOKEN10 } </p>
   <p>TOKEN11 some <i>other</i> text.</p>
</text>

But there are two problems with it:

1. It only takes care of point 1 above.
2. I'm missing some paragraphs in the output (those containing TOKEN6, TOKEN12, TOKEN13 and TOKEN14). I don't understand why this happens, or why it doesn't happen to the paragraphs containing TOKEN1 and TOKEN5.
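The missing paragraphs come from the otherwise branch. Inside xsl:for-each-group the context item is the first item of the current group, so <xsl:copy><xsl:apply-templates/></xsl:copy> copies only that first p and its children; the other members of the group (TOKEN6, and TOKEN12 to TOKEN14) are never processed. Applying templates to select="current-group()" instead outputs every member. The Python sketch below simulates both behaviours, with itertools.groupby standing in for group-adjacent; the paragraph strings (markup dropped) and the function name are illustrative stand-ins, not part of the original stylesheet.

```python
import itertools

# String values of the 14 <p> elements from the sample input (markup dropped).
PARAS = [
    "TOKEN1 some other text.",
    "TOKEN2 }",
    "TOKEN3    } combo text and potentially something else.",
    "TOKEN4 }",
    "TOKEN5 some other text.",
    "TOKEN6 some other text.",
    "TOKEN7 }",
    "TOKEN8 }",
    "TOKEN9    } some other combo text.",
    "TOKEN10 }",
    "TOKEN11 some other text.",
    "TOKEN12 x.",
    "TOKEN13 y.",
    "TOKEN14 z.",
]


def non_bracket_output(paras, whole_group):
    """Return the leading token of each paragraph emitted for non-bracket groups.

    whole_group=False mimics <xsl:copy><xsl:apply-templates/></xsl:copy>,
    which only sees the context item, i.e. the first paragraph of the group.
    whole_group=True mimics <xsl:apply-templates select="current-group()"/>.
    """
    emitted = []
    for has_bracket, run in itertools.groupby(paras, key=lambda s: "}" in s):
        run = list(run)
        if has_bracket:
            continue  # bracket groups are handled by the xsl:when branch
        chosen = run if whole_group else run[:1]
        emitted.extend(s.split()[0] for s in chosen)
    return emitted


print(non_bracket_output(PARAS, whole_group=False))
# ['TOKEN1', 'TOKEN5', 'TOKEN11']
print(non_bracket_output(PARAS, whole_group=True))
# ['TOKEN1', 'TOKEN5', 'TOKEN6', 'TOKEN11', 'TOKEN12', 'TOKEN13', 'TOKEN14']
```

TOKEN1 and TOKEN5 survive only because each happens to be the first paragraph of its group; TOKEN11 survives for the same reason.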

I'll be most grateful for your help.

Answer 1

Score: 1

I think that, after grouping, you need to wrap your tokens (including the }) in an element (e.g. token). Then you can simply process the token wrappers first, followed by the remaining grouped nodes that are not tokens:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output indent="yes"/>

  <xsl:template match="text">
    <xsl:copy>
      <xsl:for-each-group select="p" group-adjacent="contains(., '}')">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:copy>
              <xsl:variable name="splitted" as="node()*">
                <xsl:apply-templates select="current-group()/node()" mode="split"/>
              </xsl:variable>
              <xsl:apply-templates select="$splitted[self::token]/text(), $splitted[not(self::token)]"/>
            </xsl:copy>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:mode name="split" on-no-match="shallow-copy"/>

  <xsl:template match="text()[contains(., '}')]" mode="split">
    <xsl:apply-templates select="analyze-string(., '.*\}')" mode="wrap"/>
  </xsl:template>

  <xsl:template match="*:match" mode="wrap">
    <token>{.}</token>
  </xsl:template>

</xsl:stylesheet>
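The split mode relies on the greedy pattern '.*\}': within one text node it matches everything up to and including the last }, and whatever follows becomes a non-match. The *:match template wraps each match in token, while the fn:non-match text falls through the undeclared wrap mode, whose built-in behaviour (text-only-copy) copies it unchanged. The following Python approximation of fn:analyze-string shows how a text node is cut; the function name is hypothetical, and Python's re and XPath regular expressions differ in details such as flags, so treat it as an illustration only.

```python
import re

# Same pattern as analyze-string(., '.*\}'): greedy, so it runs to the last '}'.
TOKEN_RE = re.compile(r".*\}")


def analyze(text):
    """Split text into ("match" | "non-match", substring) pieces in document order,
    loosely mirroring the fn:match / fn:non-match children of fn:analyze-string."""
    pieces, pos = [], 0
    for m in TOKEN_RE.finditer(text):
        if m.start() > pos:
            pieces.append(("non-match", text[pos:m.start()]))
        pieces.append(("match", m.group()))
        pos = m.end()
    if pos < len(text):
        pieces.append(("non-match", text[pos:]))
    return pieces


print(analyze("TOKEN3    } combo text "))
# [('match', 'TOKEN3    }'), ('non-match', ' combo text ')]
print(analyze("TOKEN2 }"))
# [('match', 'TOKEN2 }')]
```

In the stylesheet, the match pieces end up as token elements that are output first, and the non-match text (" combo text ") stays in its original position among the non-token nodes, followed by the i element and the final ".".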

If you need to normalize whitespace when outputting the tokens, replace <xsl:apply-templates select="$splitted[self::token]/text(), $splitted[not(self::token)]"/> with, for example:

          <xsl:value-of select="$splitted[self::token]/normalize-space()" separator=" "/>
          <xsl:apply-templates select="$splitted[not(self::token)]"/>
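Why the normalization matters: when the token text nodes are copied into the new p, adjacent text nodes are merged with no separator, so the first version yields TOKEN2 }TOKEN3    }TOKEN4 }. The xsl:value-of variant normalizes each token and joins them with a single space. A small Python illustration of the two results for the first group follows; normalize_space is a hypothetical helper that approximates XPath normalize-space() (Python's str.split() treats a few more characters as whitespace than XPath does).

```python
def normalize_space(s):
    # Approximates XPath normalize-space(): trim and collapse whitespace runs to one space.
    return " ".join(s.split())


tokens = ["TOKEN2 }", "TOKEN3    }", "TOKEN4 }"]  # text of the <token> wrappers
rest = " combo text "  # first non-token text node (the <i> element follows it)

# $splitted[self::token]/text(): adjacent text nodes merge without a separator.
raw = "".join(tokens) + rest
# xsl:value-of with normalize-space() and separator=" ".
normalized = " ".join(normalize_space(t) for t in tokens) + rest

print(repr(raw))         # 'TOKEN2 }TOKEN3    }TOKEN4 } combo text '
print(repr(normalized))  # 'TOKEN2 } TOKEN3 } TOKEN4 } combo text '
```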
