XSLT/XSL file: need to remove duplicates generically from the parent tag given that all the child key values are the same for XML

Question


I have been working on this problem for a long time. I need to remove duplicates from an XML file based on the key values of the child tags. The parent tag "A" will always be known and will stay the same. The nested tags can have different names, e.g. there could be "Name", "Location", "Name". If the data under two "Name" tags are duplicates of each other, one of the Name tags, along with its child nodes, must be removed. This should only happen if all the child tag keys and values are the same. It should not happen if only one, two or more child tags match while other tags under the parent have a different key, or the same key with a different value.

Example:

```xml
<A>
  <Name>
    <c>1</c>
    <d>g</d>
    <e>h</e>
  </Name>
  <Location>
    <c>2</c>
    <d>g</d>
    <e>h</e>
  </Location>
  <Name>
    <c>1</c>
    <d>g</d>
    <e>h</e>
  </Name>
</A>
```

Expected output:

```xml
<A>
  <Name>
    <c>1</c>
    <d>g</d>
    <e>h</e>
  </Name>
  <Location>
    <c>2</c>
    <d>g</d>
    <e>h</e>
  </Location>
</A>
```
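The rule described above can be sketched outside XSLT to pin down what counts as a duplicate. The Python snippet below (standard-library `xml.etree.ElementTree`) is only an illustration, not an XSLT solution; the helper names `child_signature` and `remove_duplicate_children` are hypothetical. It assumes that "duplicate" means the same element name plus the same child names and values in the same order, and it keeps the first occurrence:

```python
import xml.etree.ElementTree as ET


def child_signature(elem):
    """Hypothetical helper: the element's name plus its ordered (child name, child text) pairs."""
    return (elem.tag, tuple((c.tag, "".join(c.itertext())) for c in elem))


def remove_duplicate_children(parent):
    """Keep the first child of `parent` for each signature; remove later exact duplicates."""
    seen = set()
    for child in list(parent):  # iterate over a copy because we remove while looping
        signature = child_signature(child)
        if signature in seen:
            parent.remove(child)
        else:
            seen.add(signature)
    return parent


SAMPLE = """<A>
  <Name><c>1</c><d>g</d><e>h</e></Name>
  <Location><c>2</c><d>g</d><e>h</e></Location>
  <Name><c>1</c><d>g</d><e>h</e></Name>
</A>"""

root = remove_duplicate_children(ET.fromstring(SAMPLE))
print([child.tag for child in root])  # ['Name', 'Location']
```

A Name whose children differ in even one value gets a different signature and is therefore kept.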

I tried this:

```xml
<xsl:template match="@*|node()">
  <xsl:if test="not(node()) or not(preceding-sibling::node()[.=string(current())])">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
```

But what ended up happening was that the child tags with the same key values got removed as well, and I was getting something like this:

```xml
<A>
  <Name>
    <c>1</c>
    <d>g</d>
    <e>h</e>
  </Name>
  <Location>
    <c>2</c>
  </Location>
</A>
```

I'm looking for a generic way, as I don't want to specify the tag values or keys in the file.

Thanks in advance :)!

Answer 1

Score: 0


In XSLT 3, using a composite key with `for-each-group group-by` might suffice:

```xml
<xsl:template match="A">
  <xsl:copy>
    <!-- one group per distinct sequence of child values; "." is the first item of each group -->
    <xsl:for-each-group select="*" composite="yes" group-by="*">
      <xsl:apply-templates select="."/>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
```
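To see what this grouping key compares, here is a rough Python emulation (a sketch, not how an XSLT processor works internally). It assumes untyped input without a schema, so each child element atomizes to its string value; `composite_key` and `first_of_each_group` are hypothetical names. Note that the key consists of the child values only, so neither the grouped element's name nor its children's names take part:

```python
import xml.etree.ElementTree as ET


def composite_key(elem):
    # Rough equivalent of group-by="*" with composite="yes" on untyped input:
    # the string value of each child element, in document order.
    return tuple("".join(child.itertext()) for child in elem)


def first_of_each_group(parent):
    # for-each-group visits groups in order of first appearance, and the
    # context item "." is the first member of each group.
    groups = {}
    for child in parent:
        groups.setdefault(composite_key(child), child)
    return list(groups.values())


sample = ET.fromstring(
    "<A><Name><c>1</c><d>g</d><e>h</e></Name>"
    "<Location><c>2</c><d>g</d><e>h</e></Location>"
    "<Name><c>1</c><d>g</d><e>h</e></Name></A>"
)
print([e.tag for e in first_of_each_group(sample)])  # ['Name', 'Location']

# Element names are not part of the key, so these two end up in one group:
mixed = ET.fromstring("<A><Name><c>x</c></Name><Location><c>x</c></Location></A>")
print([e.tag for e in first_of_each_group(mixed)])  # ['Name']
```

If differently named elements with identical content must be kept apart in your data, the element name would also have to become part of the key.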

If the grandchildren can also appear in a different order (that is, they are not always sorted the same way), then you might need

```xml
<xsl:for-each-group select="*" composite="yes" group-by="sort(*, (), function($c) { name($c) })">
```

instead of the simple `group-by` given above.
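The effect of sorting the children by name before building the key can be illustrated with another rough Python emulation. It assumes un-namespaced elements and the default Unicode codepoint collation (so Python's ordinary string ordering matches); the function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def sorted_composite_key(elem):
    # Rough equivalent of sort(*, (), function($c) { name($c) }) followed by atomization:
    # children ordered by element name, then their string values.
    return tuple("".join(c.itertext()) for c in sorted(elem, key=lambda c: c.tag))


def document_order_key(elem):
    # The plain group-by="*" key, for comparison: values in document order.
    return tuple("".join(c.itertext()) for c in elem)


a = ET.fromstring("<Name><c>1</c><d>g</d><e>h</e></Name>")
b = ET.fromstring("<Name><e>h</e><c>1</c><d>g</d></Name>")

print(document_order_key(a) == document_order_key(b))      # False
print(sorted_composite_key(a) == sorted_composite_key(b))  # True
```

With the plain key, the two Name elements land in different groups because their children are in a different order; with the sorted key they are recognized as duplicates.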

Either way, as the base transformation, you need to set up the identity transformation by declaring `<xsl:mode on-no-match="shallow-copy"/>` as a child of `xsl:stylesheet` (or `xsl:transform`) in the XSLT.

But the problem is rather underspecified: it is not clear whether the names and order of the child elements, although unknown, are always the same, or whether there can be variations and, if so, how to handle them.

As an alternative, if A can have different elements as children but you need to eliminate duplicates only for a specific one, like B, and not for the other possible elements, then a key explicitly declared for B elements can help:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <!-- identity transformation: copy everything not matched by another template -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- composite="yes" makes the whole sequence of (name-sorted) child values a single key;
       without it, each value would be indexed separately -->
  <xsl:key name="B-group" match="A/B" composite="yes" use="sort(*, (), function($c) { name($c) })"/>

  <!-- drop every A/B that is not the first B (in document order) with its key -->
  <xsl:template match="A/B[not(. is key('B-group', sort(*, (), function($c) { name($c) }))[1])]"/>

</xsl:stylesheet>
```
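The intended behavior of this key-based stylesheet can be emulated in Python as a rough sketch, assuming the key is meant to compare the whole (name-sorted) tuple of child values of each B; `b_key` and `drop_duplicate_b` are hypothetical names, and this is not XSLT. Elements other than B are never removed, which mirrors the shallow-copy mode:

```python
import xml.etree.ElementTree as ET


def b_key(elem):
    # Whole tuple of child values, children ordered by name (as in the key's use expression).
    return tuple("".join(c.itertext()) for c in sorted(elem, key=lambda c: c.tag))


def drop_duplicate_b(a_elem):
    # Only B children of A are candidates for removal; everything else is left as is.
    # A B is kept only if it is the first B in document order with its key.
    seen = set()
    for child in list(a_elem):  # iterate over a copy because we remove while looping
        if child.tag != "B":
            continue
        key = b_key(child)
        if key in seen:
            a_elem.remove(child)
        else:
            seen.add(key)
    return a_elem


doc = ET.fromstring(
    "<A>"
    "<B><c>1</c><d>g</d></B>"
    "<C><c>1</c><d>g</d></C>"
    "<B><d>g</d><c>1</c></B>"
    "<C><c>1</c><d>g</d></C>"
    "<B><c>2</c><d>g</d></B>"
    "</A>"
)
print([child.tag for child in drop_duplicate_b(doc)])  # ['B', 'C', 'C', 'B']
```

The second B is dropped because its sorted key equals the first one's, both C elements survive, and the last B is kept: it shares only the value `g` with the first B, and the key compares the whole tuple.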

huangapple
  • Published on 2020-09-15 06:45:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63892711.html