Improve performance of an XSLT function in the for-each part

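Question

I have implemented an XSLT function that takes a node and its parent node as inputs, gathers the values of all those data items in the node, and returns "True" if all of them are the same or "False" if they differ.
The implementation is the following: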

<?xml version="1.0" encoding="UTF-8"?>
<!--
This file was generated by SOFTDEV team.

This function searches for the nodes with the given name under the given parent and outputs whether all are the same or not
-->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ns0="http://ecs.dgtaxud.ec">
    <xsl:template name="checkUniqueVales">
        <xsl:param name="nodeParentName" select="()"/>
        <xsl:param name="nodeName" select="()"/>
        <xsl:variable name="var1" as="xs:string*">
            <xsl:for-each select="//*[matches(local-name(), $nodeParentName)]//*[matches(local-name(), $nodeName)]">
                <xsl:sequence select="fn:string(.)"/>
            </xsl:for-each>
        </xsl:variable>
        <xsl:sequence select="xs:string((xs:string(fn:count(fn:distinct-values($var1))))='1')"/>
    </xsl:template>
</xsl:stylesheet>

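This template, checkUniqueValues, is called 4 times. It takes two parameters as input, "nodeParentName" and "nodeName", which represent the name of the parent node and the name of the target node, respectively. It then stores the values of all the target nodes that match the given "nodeName" and sit under the parent nodes that match the given "nodeParentName".

Let me give an example with an XML document: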

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Group>
    <Item>
        <ref>16</ref>
        <countryOfDis>Italy</countryOfDis>
        <countryOfDes>Spain</countryOfDes>
        <Support>
            <method>Transdoc</method>
        </Support>
    </Item>
    <Item>
        <ref>16</ref>
        <countryOfDis>Italy</countryOfDis>
        <countryOfDes>Spain</countryOfDes>
        <Support>
            <method>Transdoc</method>
        </Support>
    </Item>
</Group>

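This function should check whether ref, countryOfDis, countryOfDes, and method under Support have the same values for each Item. It takes Item as nodeParentName, and countryOfDis, countryOfDes, or method (under Support) as nodeName.

When the message is big, the for-each part takes a long time, and this is causing issues. One idea is to somehow break out of the for-each early, for example once two values have been compared and the result is known, but I don't think this is feasible in XSLT. Any ideas on how to implement this efficiently?

Thanks in advance.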


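Answer 1

Score: 3

Performance depends on many factors, an important one being the XSLT processor that you are using, which you haven't told us.

You also haven't told us how often the checkUniqueValues template is being called.

Looking at this: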

<xsl:for-each select="//*[matches(local-name(), $nodeParentName)]//*[matches(local-name(), $nodeName)]">

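you're asking the processor to do a lot of work, and I wonder how much of it is really necessary.

Do you really need to match the two names using regular expressions, rather than an exact string match? And what's in the regular expressions? Perhaps they can be simplified.

Do you really need to use "//" rather than "/"? You've basically got a quadratic operation here: for every descendant node, search all its descendant nodes, which is likely to be O(n^2) in the size of the tree.

An obvious improvement (which an optimizing engine might do for you, but it's best to assume not) is to write it as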

//*[matches(local-name(), $nodeName)][ancestor::*[matches(local-name(), $nodeParentName)]]

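because then you're saying "for each descendant node, search all its ancestors", and the number of ancestors is usually very much smaller than the number of descendants.

Next step: rather than searching all the ancestors using a regular expression, identify them in advance: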

<xsl:variable name="matching-ancestors" select="//*[matches(local-name(), $nodeParentName)]"/>

and then

<xsl:for-each select="//*[matches(local-name(), $nodeName)][ancestor::* intersect $matching-ancestors]"/>

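Another optimization would be to collect all the local names of elements in the document, apply distinct-values() to make the list unique, filter the list of names to retain only those that match the regular expression, and then search for elements with those specific names. This would greatly reduce the number of regular expression matches you are doing (basically to 2 matches per distinct element name in the document).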
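The last suggestion could be sketched roughly as follows. This is a minimal XSLT 2.0 sketch under stated assumptions, not the answerer's code: the variable names all-names, parent-names, child-names, and parents are made up for illustration, the transformation is assumed to start with the source document as the context item, and the template returns an xs:boolean rather than the string result of the original.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical helper: every distinct element local name in the input,
       computed once per transformation -->
  <xsl:variable name="all-names" as="xs:string*"
      select="distinct-values(//*/local-name())"/>

  <xsl:template name="checkUniqueValues">
    <xsl:param name="nodeParentName" as="xs:string"/>
    <xsl:param name="nodeName" as="xs:string"/>

    <!-- The regular expressions run once per distinct name, not once per element -->
    <xsl:variable name="parent-names" as="xs:string*"
        select="$all-names[matches(., $nodeParentName)]"/>
    <xsl:variable name="child-names" as="xs:string*"
        select="$all-names[matches(., $nodeName)]"/>

    <!-- Elements are then selected by plain name comparison -->
    <xsl:variable name="parents" as="element()*"
        select="//*[local-name() = $parent-names]"/>

    <!-- True when exactly one distinct string value is found -->
    <xsl:sequence select="count(distinct-values(
        //*[local-name() = $child-names][ancestor::* intersect $parents]/string()
      )) = 1"/>
  </xsl:template>
</xsl:stylesheet>
```

Whether this is actually faster depends on the processor and on the shape of the input, so it is worth measuring against the original before adopting it.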


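Answer 2

Score: 2

In XSLT 3 you have xsl:iterate with xsl:break, so if you really think you have a large data set where manual iteration and breaking helps to improve performance, here is an example: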

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  xmlns:mf="http://example.com/mf"
  expand-text="yes">
  
  <xsl:function name="mf:check-unique-values" as="xs:boolean">
    <xsl:param name="context-node" as="node()"/>
    <xsl:param name="parent-element-name" as="xs:string"/>
    <xsl:param name="element-name" as="xs:string"/>
    <xsl:variable name="elements" as="element()*">
      <xsl:evaluate xpath="'.//' || $parent-element-name || '/' || $element-name" context-item="$context-node"/>
    </xsl:variable>
    <xsl:iterate select="$elements">
      <xsl:param name="unique" as="xs:boolean" select="true()"/>
      <xsl:param name="value" as="item()?" select="()"/>
      <xsl:on-completion select="$unique"/>
      <xsl:choose>
        <xsl:when test="empty($value) or $value = data(.)">
          <xsl:next-iteration>
            <xsl:with-param name="value" select="data(.)"/>
          </xsl:next-iteration>
        </xsl:when>
        <xsl:otherwise>
          <xsl:break select="false()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:function>
  
  <xsl:template match="root">
    <unique-items>{mf:check-unique-values(., 'items', 'item')}</unique-items>
    <unique-items>{mf:check-unique-values(., 'values', 'value')}</unique-items>
  </xsl:template>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:output indent="yes"/>
  
</xsl:stylesheet>

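On the other hand, as long as you use normal XPath and XSLT with path expressions instead of matches comparisons, I would think that current XSLT/XPath 2 or 3 implementations are perhaps smart enough to optimize the evaluation of, e.g., count(distinct-values(//items/item)) = 1 into an approach that doesn't necessarily need to check all //items/item elements.

And your code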

<xsl:template name="checkUniqueVales">
    <xsl:param name="nodeParentName" select="()"/>
    <xsl:param name="nodeName" select="()"/>
    <xsl:variable name="var1" as="xs:string*">
        <xsl:for-each select="//*[matches(local-name(), $nodeParentName)]//*[matches(local-name(), $nodeName)]">
            <xsl:sequence select="fn:string(.)"/>
        </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="xs:string((xs:string(fn:count(fn:distinct-values($var1))))='1')"/>
</xsl:template>

might be better written as, e.g.,

<xsl:template name="checkUniqueValues">
    <xsl:param name="nodeParentName" select="()"/>
    <xsl:param name="nodeName" select="()"/>
    <xsl:sequence select="count(distinct-values(//*[matches(local-name(), $nodeParentName)]//*[matches(local-name(), $nodeName)]))=1"/>
</xsl:template>

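Whether that helps an XSLT 2 processor optimize the evaluation needs to be tested; I would think the main culprit is using, e.g., //*[matches(..)]//*[matches(..)] instead of normal element path expressions. Thus, instead of using a function or a named template where you pass in two element names, it might be considerably easier and hopefully faster to directly use an XPath expression such as count(distinct-values(//foo//bar)) = 1 or count(distinct-values(//items//item)) = 1 instead of doing <xsl:call-template name="checkUniqueValues">..</xsl:call-template>.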
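Applied to the sample document from the question, direct path expressions might look like the sketch below. The output element names checks and check are hypothetical, and the paths assume that Group is the document element with Item children exactly as in the sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/Group">
    <!-- One plain path expression per field, no regular expressions and no // -->
    <checks>
      <check name="ref">
        <xsl:value-of select="count(distinct-values(Item/ref)) = 1"/>
      </check>
      <check name="countryOfDis">
        <xsl:value-of select="count(distinct-values(Item/countryOfDis)) = 1"/>
      </check>
      <check name="countryOfDes">
        <xsl:value-of select="count(distinct-values(Item/countryOfDes)) = 1"/>
      </check>
      <check name="method">
        <xsl:value-of select="count(distinct-values(Item/Support/method)) = 1"/>
      </check>
    </checks>
  </xsl:template>
</xsl:stylesheet>
```

Because both Item elements in the sample carry identical values, each check should come out as true for that input.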

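After your latest edit it seems there is a known input format and you know the exact nodes and paths you want to check. It all sounds like grouping with a composite key to me, if you really want to check whether you have unique Items in terms of three child elements and one descendant element. In XSLT 3 you can easily use composite="yes" and a sequence of key values in group-by: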

<xsl:template match="Group">
  <xsl:copy>
    <xsl:for-each-group select="Item" composite="yes" group-by="ref, countryOfDis, countryOfDes, Support/method">
      <group key="{current-grouping-key()}" unique="{count(current-group()) = 1}"/>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

In XSLT 2 you can use a single key concatenating the four elements:

<xsl:template match="Group">
  <xsl:copy>
    <xsl:for-each-group select="Item" group-by="string-join((ref, countryOfDis, countryOfDes, Support/method), '|')">
      <group key="{current-grouping-key()}" unique="{count(current-group()) = 1}"/>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
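If the goal is a single true/false answer for the whole Group rather than a report per group, the same concatenated key can be fed to distinct-values(). This is only a sketch: the element name allItemsSame is hypothetical, and it assumes every Item has all four values, since a missing child would shift the remaining parts of the key.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/Group">
    <allItemsSame>
      <!-- Build one key string per Item; a single distinct key means all Items agree -->
      <xsl:value-of select="count(distinct-values(
          Item/string-join((ref, countryOfDis, countryOfDes, Support/method), '|')
        )) = 1"/>
    </allItemsSame>
  </xsl:template>
</xsl:stylesheet>
```

The '|' separator is taken from the grouping example above; it should be a character that cannot occur inside the values themselves.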



Posted by huangapple on July 13, 2023, 18:44:01
When reposting, please keep the link to this article: https://go.coder-hub.com/76678479.html