Grouping by id and counting duplicates with XSLT

Question

I have the source XML below, and I'm able to group it by id and count the duplicates:

    <?xml version="1.0" encoding="utf-8"?>
    <cases>
        <case id="1" cont="">
            <serial>111</serial>        
        </case>
        <case id="1" cont="">
            <serial>111</serial>
        </case>
        <case id="2" cont="">
            <serial>222</serial>
        </case>
    </cases>

**XSLT 1.0**

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:key name="caseKey" match="case" use="@id"/>
    
    <xsl:template match="cases">
        <output>
            <xsl:apply-templates select="@*|case[generate-id()=generate-id(key('caseKey', @id)[1])]"/>
    	</output>
    </xsl:template>
    
    <xsl:template match="case">
    	<xsl:element name="id">
            <xsl:attribute name="val"><xsl:value-of select="@id"/></xsl:attribute>
            <xsl:element name="duplicates">
                <xsl:value-of select="count(key('caseKey', @id))-1"/>
            </xsl:element>
    	</xsl:element>      
    </xsl:template>
    </xsl:stylesheet>

**XSLT 2.0**

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:template match="/cases">
        <output>
            <xsl:for-each-group select="case" group-by="@id">
                <xsl:element name="id">
                    <xsl:attribute name="val"><xsl:value-of select="@id"/></xsl:attribute>
                    <xsl:element name="duplicates">
                        <xsl:value-of select="count(current-group())-1"/>
                    </xsl:element>
                </xsl:element> 
            </xsl:for-each-group>
        </output>
    </xsl:template>
    
    </xsl:stylesheet>

**Output:**

    <?xml version="1.0" encoding="UTF-8"?>
    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
       <id val="2">
          <duplicates>0</duplicates>
       </id>
    </output>

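Now, my challenge is that one case can continue in another case, and for that the `cont` attribute will have values like `1 | 2` and `2 | 2`, making that case unique. So far I haven't taken the `cont` attribute into consideration for the key, but now I think I have to: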

    <?xml version="1.0" encoding="utf-8"?>
    <cases>
        <case id="1" cont="1 | 2">
            <serial>111</serial>        
        </case>
        <case id="1" cont="2 | 2">
            <serial>111</serial>
        </case>
        <case id="2" cont="">
            <serial>222</serial>
        </case>
        <case id="3" cont="">
            <serial>333</serial>
        </case>
        <case id="3" cont="">
            <serial>333</serial>
        </case>
        <case id="1" cont="1 | 2">
            <serial>111</serial>        
        </case>
        <case id="1" cont="2 | 2">
            <serial>111</serial>
        </case>
        <case id="4" cont="1 | 2">
            <serial>444</serial>        
        </case>
        <case id="4" cont="2 | 2">
            <serial>444</serial>
        </case>
    </cases>
For the sample XML above, the expected output is:

    <?xml version="1.0" encoding="UTF-8"?>
    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
       <id val="2">
          <duplicates>0</duplicates>
       </id>
       <id val="3">
          <duplicates>1</duplicates>
       </id>
       <id val="4">
          <duplicates>0</duplicates>
       </id>
    </output>

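**Explanation:**
- A case is considered duplicated if the same `id` is present in multiple cases and `cont` is empty (ref: case id=3).
- A case is considered unique if the same `id` is present in multiple cases but `cont` is not empty and differs between them (e.g. `1 | 2`, `2 | 2`) (ref: case id=4).
- A case is considered duplicated if the same `id` together with the same `cont` values is present in multiple cases (ref: case id=1).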
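To apply these rules it can help to see each `cont` value split into a page number and a page total. The following is a minimal diagnostic sketch in XSLT 2.0, not part of the original post. It assumes `cont` is either empty or of the form `k | n`; the labels `page` and `of` are illustrative names only.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/cases">
    <xsl:for-each select="case">
      <!-- "1 | 2" becomes ('1', '2'); an empty cont yields an empty sequence -->
      <xsl:variable name="parts" select="tokenize(@cont, '\s*\|\s*')"/>
      <!-- Print '-' where the page number or page total is missing -->
      <xsl:value-of select="concat('id=', @id,
                                   ' page=', ($parts[1], '-')[1],
                                   ' of=', ($parts[2], '-')[1],
                                   '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input, each case is listed on its own line, for example `id=1 page=1 of=2` for the first case and `id=2 page=- of=-` for the case with an empty `cont`. Cases whose page equals the total, or that have no `cont` at all, are the ones that close a complete case.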

**Further explanation on duplicates:**

The following is a duplicate, because the same id appears twice and `cont` is blank:

    <case id="1" cont="">
        <serial>111</serial>        
    </case>
    <case id="1" cont="">
        <serial>111</serial>
    </case>

    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
    </output>

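The following, by itself, is not a duplicate, because the same id can span multiple pages/cases. To count as a duplicate, the same id together with the same `cont` values (e.g. `1 | 2`, `2 | 2`) has to appear again (ref: case id=4).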

    <case id="1" cont="1 | 2">
    	<serial>111</serial>        
    </case>
    <case id="1" cont="2 | 2">
    	<serial>111</serial>
    </case>

    <output>
       <id val="1">
          <duplicates>0</duplicates>
       </id>
    </output>

The above is considered unique.

<details>
<summary>English:</summary>

I have the source XML below, and I'm able to group it by id and count the duplicates:

    <?xml version="1.0" encoding="utf-8"?>
    <cases>
        <case id="1" cont="">
            <serial>111</serial>
        </case>
        <case id="1" cont="">
            <serial>111</serial>
        </case>
        <case id="2" cont="">
            <serial>222</serial>
        </case>
    </cases>

**XSLT 1.0**

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="caseKey" match="case" use="@id"/>

    <xsl:template match="cases">
        <output>
            <xsl:apply-templates select="@*|case[generate-id()=generate-id(key('caseKey', @id)[1])]"/>
        </output>
    </xsl:template>

    <xsl:template match="case">
        <xsl:element name="id">
            <xsl:attribute name="val"><xsl:value-of select="@id"/></xsl:attribute>
            <xsl:element name="duplicates">
                <xsl:value-of select="count(key('caseKey', @id))-1"/>
            </xsl:element>
        </xsl:element>
    </xsl:template>
    </xsl:stylesheet>

**XSLT 2.0**

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/cases">
        <output>
            <xsl:for-each-group select="case" group-by="@id">
                <xsl:element name="id">
                    <xsl:attribute name="val"><xsl:value-of select="@id"/></xsl:attribute>
                    <xsl:element name="duplicates">
                        <xsl:value-of select="count(current-group())-1"/>
                    </xsl:element>
                </xsl:element>
            </xsl:for-each-group>
        </output>
    </xsl:template>

    </xsl:stylesheet>

**Output:**

    <?xml version="1.0" encoding="UTF-8"?>
    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
       <id val="2">
          <duplicates>0</duplicates>
       </id>
    </output>

Now, my challenge is that one case can continue in another case, and for that the `cont` attribute will have values like `1 | 2` and `2 | 2`, making that case unique. So far I haven't taken the `cont` attribute into consideration for the key, but now I think I have to:

    <?xml version="1.0" encoding="utf-8"?>
    <cases>
        <case id="1" cont="1 | 2">
            <serial>111</serial>
        </case>
        <case id="1" cont="2 | 2">
            <serial>111</serial>
        </case>
        <case id="2" cont="">
            <serial>222</serial>
        </case>
        <case id="3" cont="">
            <serial>333</serial>
        </case>
        <case id="3" cont="">
            <serial>333</serial>
        </case>
        <case id="1" cont="1 | 2">
            <serial>111</serial>
        </case>
        <case id="1" cont="2 | 2">
            <serial>111</serial>
        </case>
        <case id="4" cont="1 | 2">
            <serial>444</serial>
        </case>
        <case id="4" cont="2 | 2">
            <serial>444</serial>
        </case>
    </cases>

For the sample XML above, the expected output is:

    <?xml version="1.0" encoding="UTF-8"?>
    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
       <id val="2">
          <duplicates>0</duplicates>
       </id>
       <id val="3">
          <duplicates>1</duplicates>
       </id>
       <id val="4">
          <duplicates>0</duplicates>
       </id>
    </output>

**Explanation:**
- A case will be considered duplicated if the same `id` is present in multiple cases but `cont` is empty (ref: case id=3)
- A case will be considered unique if the same `id` is present in multiple cases but `cont` is not empty (ex: 1 | 2, 2 | 2) (ref: case id=4)
- A case will be considered duplicated if the same `id` along with `cont` values are present in multiple cases (ref: case id=1)

**Further explanation on duplicates:**

The following is a duplicate, because the same id appears twice and `cont` is blank:

    <case id="1" cont="">
        <serial>111</serial>
    </case>
    <case id="1" cont="">
        <serial>111</serial>
    </case>

    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
    </output>

The following, by itself, is not a duplicate, because the same id can span multiple pages/cases; for it to count as a duplicate, the same id together with the same `cont` values has to appear again:

    <case id="1" cont="1 | 2">
        <serial>111</serial>
    </case>
    <case id="1" cont="2 | 2">
        <serial>111</serial>
    </case>

    <output>
       <id val="1">
          <duplicates>0</duplicates>
       </id>
    </output>

 
The above is considered unique. It can also become a duplicate if the same set appears again, as in the example below:

    <case id="1" cont="1 | 2">
        <serial>111</serial>
    </case>
    <case id="1" cont="2 | 2">
        <serial>111</serial>
    </case>
    <case id="1" cont="1 | 2">
        <serial>111</serial>
    </case>
    <case id="1" cont="2 | 2">
        <serial>111</serial>
    </case>

    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
    </output>

In this scenario, even though there are two `<case id="1" cont="1 | 2">` and two `<case id="1" cont="2 | 2">` entries, the final duplicate count is not two, because each pair is one and the same case split across two pages. See the example below:

    <!-- Case id=1 split across 3 pages: the 3 entries below count as a single case (the entire block) -->
    <case id="1" cont="1 | 3">
        <serial>111</serial>
    </case>
    <case id="1" cont="2 | 3">
        <serial>111</serial>
    </case>
    <case id="1" cont="3 | 3">
        <serial>111</serial>
    </case>

    <!-- Case id=1 repeated, same as above: this entire block is what counts as the duplicate -->
    <case id="1" cont="1 | 3">
        <serial>111</serial>
    </case>
    <case id="1" cont="2 | 3">
        <serial>111</serial>
    </case>
    <case id="1" cont="3 | 3">
        <serial>111</serial>
    </case>

    <output>
       <id val="1">
          <duplicates>1</duplicates>
       </id>
    </output>

How can I achieve this in either XSLT 1.0 or XSLT 2.0?

</details>


# Answer 1
**Score**: 1

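Assuming you have consistent input data (meaning `cont` is either `''` or part of a consistent sequence `1 | n`, `2 | n`, ..., `n | n`), I think it suffices to group the cases where `cont` is empty together with those where `cont` is `n | n`. With XSLT 3, that translates into, for example: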

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">

  <xsl:mode on-no-match="shallow-skip"/>

  <xsl:output method="xml" indent="yes" />

  <xsl:template match="cases">
    <output>
      <!-- Keep only cases that close a unit: empty cont, or the last page "n | n" -->
      <xsl:for-each-group 
          select="case[@cont = '' or count(distinct-values(tokenize(@cont, '\s*\|\s*'))) = 1]" 
          composite="yes" 
          group-by="if (@cont = '') then (@id, '') else (@id, tokenize(@cont, '\s*\|\s*')[2])">
        <!-- Element and attribute names follow the expected output in the question -->
        <id val="{@id}">
          <duplicates>{count(current-group()) - 1}</duplicates>
        </id>
      </xsl:for-each-group>
    </output>
  </xsl:template>
  
</xsl:stylesheet>
```

In XSLT 2:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="2.0">
   
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="cases">
    <output>
      <!-- Keep only cases that close a unit: empty cont, or the last page "n | n" -->
      <xsl:for-each-group 
          select="case[@cont = '' or count(distinct-values(tokenize(@cont, '\s*\|\s*'))) = 1]" 
          group-by="@id">
        <!-- No composite keys in XSLT 2, so group by id first and then by the page total -->
        <xsl:for-each-group select="current-group()" group-by="if (@cont = '') then '' else tokenize(@cont, '\s*\|\s*')[2]">
          <id val="{@id}">
            <duplicates>
              <xsl:value-of select="count(current-group()) - 1"/>
            </duplicates>
          </id>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </output>
  </xsl:template>
  
</xsl:stylesheet>
```
English:

Assuming you have consistent input data (meaning `cont` is either `''` or there are consistent sequences of `1 | n`, `2 | n`, ..., `n | n`), I would think it suffices to group the cases with `cont` being empty and the ones where `cont` is `n | n`; with XSLT 3 that translates into e.g.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        expand-text="yes"
        version="3.0">

      <xsl:mode on-no-match="shallow-skip"/>

      <xsl:output method="xml" indent="yes" />

      <xsl:template match="cases">
        <output>
          <xsl:for-each-group
              select="case[@cont = '' or count(distinct-values(tokenize(@cont, '\s*\|\s*'))) = 1]"
              composite="yes"
              group-by="if (@cont = '') then (@id, '') else (@id, tokenize(@cont, '\s*\|\s*')[2])">
            <!-- Element and attribute names follow the expected output in the question -->
            <id val="{@id}">
              <duplicates>{count(current-group()) - 1}</duplicates>
            </id>
          </xsl:for-each-group>
        </output>
      </xsl:template>

    </xsl:stylesheet>

In XSLT 2:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="2.0">

      <xsl:output method="xml" indent="yes" />

      <xsl:template match="cases">
        <output>
          <xsl:for-each-group
              select="case[@cont = '' or count(distinct-values(tokenize(@cont, '\s*\|\s*'))) = 1]"
              group-by="@id">
            <xsl:for-each-group select="current-group()" group-by="if (@cont = '') then '' else tokenize(@cont, '\s*\|\s*')[2]">
              <!-- Element and attribute names follow the expected output in the question -->
              <id val="{@id}">
                <duplicates>
                  <xsl:value-of select="count(current-group()) - 1"/>
                </duplicates>
              </id>
            </xsl:for-each-group>
          </xsl:for-each-group>
        </output>
      </xsl:template>

    </xsl:stylesheet>
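
The question also asks about XSLT 1.0. The same idea can probably be carried over with Muenchian grouping on a compound key. The following is a sketch under the same consistency assumption, not part of the original answer; the key name `lastPage` and the `|` separator inside the key value are illustrative choices.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index only cases that close a unit: empty cont, or last page "n | n".
       Key value: id, a separator, and the page total (empty when cont is empty). -->
  <xsl:key name="lastPage"
           match="case[@cont = '' or normalize-space(substring-before(@cont, '|')) = normalize-space(substring-after(@cont, '|'))]"
           use="concat(@id, '|', normalize-space(substring-after(@cont, '|')))"/>

  <xsl:template match="/cases">
    <output>
      <xsl:for-each select="case">
        <!-- All closing cases that share this case's id and page total -->
        <xsl:variable name="group"
                      select="key('lastPage', concat(@id, '|', normalize-space(substring-after(@cont, '|'))))"/>
        <!-- Emit one entry per group, at the first closing case of that group -->
        <xsl:if test="generate-id() = generate-id($group[1])">
          <id val="{@id}">
            <duplicates>
              <xsl:value-of select="count($group) - 1"/>
            </duplicates>
          </id>
        </xsl:if>
      </xsl:for-each>
    </output>
  </xsl:template>
</xsl:stylesheet>
```

Cases such as `1 | 2` are never indexed by the key, so they never equal the first node of their group and produce no output; only the closing page of each split case is counted. As with the versions above, an id whose closing page is missing from the input would be silently skipped.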

huangapple
  • Posted on July 10, 2023, 22:37:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76654821.html