grouping by id and counting duplicates with XSLT
Question
<details>
<summary>English:</summary>
I have the source XML below, which I'm able to group by id in order to count the duplicates:
<?xml version="1.0" encoding="utf-8"?>
<cases>
<case id="1" cont="">
<serial>111</serial>
</case>
<case id="1" cont="">
<serial>111</serial>
</case>
<case id="2" cont="">
<serial>222</serial>
</case>
</cases>
**XSLT 1.0**
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="caseKey" match="case" use="@id"/>
<xsl:template match="cases">
<output>
<xsl:apply-templates select="@*|case[generate-id()=generate-id(key('caseKey', @id)[1])]"/>
</output>
</xsl:template>
<xsl:template match="case">
<xsl:element name="id">
<xsl:attribute name="val"><xsl:value-of select="@id"></xsl:value-of></xsl:attribute>
<xsl:element name="duplicates">
<xsl:value-of select="count(key('caseKey', @id))-1"></xsl:value-of>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
**XSLT 2.0**
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/cases">
<output>
<xsl:for-each-group select="case" group-by="@id">
<xsl:element name="id">
<xsl:attribute name="val"><xsl:value-of select="@id"></xsl:value-of></xsl:attribute>
<xsl:element name="duplicates">
<xsl:value-of select="count(current-group())-1"></xsl:value-of>
</xsl:element>
</xsl:element>
</xsl:for-each-group>
</output>
</xsl:template>
</xsl:stylesheet>
**Output:**
<?xml version="1.0" encoding="UTF-8"?>
<output>
<id val="1">
<duplicates>1</duplicates>
</id>
<id val="2">
<duplicates>0</duplicates>
</id>
</output>
Now, my challenge is that one case can continue in another case. When that happens, the `cont` attribute has values like `1 | 2` and `2 | 2`, which make that case unique. So far I haven't taken the `cont` attribute into consideration for the key, but now I think I have to:
<?xml version="1.0" encoding="utf-8"?>
<cases>
<case id="1" cont="1 | 2">
<serial>111</serial>
</case>
<case id="1" cont="2 | 2">
<serial>111</serial>
</case>
<case id="2" cont="">
<serial>222</serial>
</case>
<case id="3" cont="">
<serial>333</serial>
</case>
<case id="3" cont="">
<serial>333</serial>
</case>
<case id="1" cont="1 | 2">
<serial>111</serial>
</case>
<case id="1" cont="2 | 2">
<serial>111</serial>
</case>
<case id="4" cont="1 | 2">
<serial>444</serial>
</case>
<case id="4" cont="2 | 2">
<serial>444</serial>
</case>
</cases>
For the sample XML above, the expected output is:
<?xml version="1.0" encoding="UTF-8"?>
<output>
<id val="1">
<duplicates>1</duplicates>
</id>
<id val="2">
<duplicates>0</duplicates>
</id>
<id val="3">
<duplicates>1</duplicates>
</id>
<id val="4">
<duplicates>0</duplicates>
</id>
</output>
**Explanation:**
- A case will be considered duplicated if the same `id` is present in multiple cases but `cont` is empty (ref: case id=3)
- A case will be considered unique if the same `id` is present in multiple cases but `cont` is not empty (ex: 1 | 2, 2 | 2) (ref: case id=4)
- A case will be considered duplicated if the same `id` along with `cont` values are present in multiple cases (ref: case id=1)
**Further explanation on duplicates:**
The example below is a duplicate because the same id appears twice and `cont` is blank:
<case id="1" cont="">
<serial>111</serial>
</case>
<case id="1" cont="">
<serial>111</serial>
</case>
<output>
<id val="1">
<duplicates>1</duplicates>
</id>
</output>
The example below is not a duplicate by itself, because one case can span multiple pages/cases. To count as a duplicate, the same id together with the same `cont` values has to appear again:
<case id="1" cont="1 | 2">
<serial>111</serial>
</case>
<case id="1" cont="2 | 2">
<serial>111</serial>
</case>
<output>
<id val="1">
<duplicates>0</duplicates>
</id>
</output>
The above is considered unique. It becomes a duplicate, however, if the same block appears again, as in the example below:
<case id="1" cont="1 | 2">
<serial>111</serial>
</case>
<case id="1" cont="2 | 2">
<serial>111</serial>
</case>
<case id="1" cont="1 | 2">
<serial>111</serial>
</case>
<case id="1" cont="2 | 2">
<serial>111</serial>
</case>
<output>
<id val="1">
<duplicates>1</duplicates>
</id>
</output>
In the scenario above, even though there are two `<case id="1" cont="1 | 2">` and two `<case id="1" cont="2 | 2">` elements, the final count of duplicates is not two, because it is the same case id split in two. See the example below:
(Case id=1 split across 3 pages: the 3 entries below count as a single case, the entire block.)
<case id="1" cont="1 | 3">
<serial>111</serial>
</case>
<case id="1" cont="2 | 3">
<serial>111</serial>
</case>
<case id="1" cont="3 | 3">
<serial>111</serial>
</case>
(Duplicated case id=1, same as above: this entire block is the one that counts as the duplicate.)
<case id="1" cont="1 | 3">
<serial>111</serial>
</case>
<case id="1" cont="2 | 3">
<serial>111</serial>
</case>
<case id="1" cont="3 | 3">
<serial>111</serial>
</case>
<output>
<id val="1">
<duplicates>1</duplicates>
</id>
</output>
How can I achieve this in either XSLT 1.0 or XSLT 2.0?
</details>
# Answer 1
**Score**: 1
Assuming you have consistent input data (meaning `cont` is either `''` or there are consistent sequences of `1 | n`, `2 | n`, .., `n | n`), I would think it suffices to group the `case`s with `cont` being empty and the ones where `cont` is `n | n`; with XSLT 3 that translates into e.g.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes"
version="3.0">
<xsl:mode on-no-match="shallow-skip"/>
<xsl:output method="xml" indent="yes" />
<xsl:template match="cases">
<output>
<xsl:for-each-group
select="case[@cont = '' or count(distinct-values(tokenize(@cont, '\s*\|\s*'))) = 1]"
composite="yes"
group-by="if (@cont = '') then (@id, '') else (@id, tokenize(@cont, '\s*\|\s*')[2])">
<id val="{@id}">
<duplicates>{count(current-group()) - 1}</duplicates>
</id>
</xsl:for-each-group>
</output>
</xsl:template>
</xsl:stylesheet>
In XSLT 2:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="2.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="cases">
<output>
<xsl:for-each-group
select="case[@cont = '' or count(distinct-values(tokenize(@cont, '\s*\|\s*'))) = 1]"
group-by="@id">
<xsl:for-each-group select="current-group()" group-by="if (@cont = '') then '' else tokenize(@cont, '\s*\|\s*')[2]">
<id val="{@id}">
<duplicates>
<xsl:value-of select="count(current-group()) - 1"/>
</duplicates>
</id>
</xsl:for-each-group>
</xsl:for-each-group>
</output>
</xsl:template>
</xsl:stylesheet>
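Since the question also asks about XSLT 1.0, the same idea can probably be carried over with Muenchian grouping. The following is only a sketch under the same consistency assumption as above. The key name `blockKey` is made up for this example, and because XSLT 1.0 has no `tokenize`, the two halves of `cont` are compared with `substring-before`/`substring-after` and `normalize-space`. Only the cases with an empty `cont` or a closing `n | n` entry are indexed, under a composite key built from `@id` and `n`.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical key: indexes only cases whose cont is empty or of the form "n | n",
       keyed on the id plus n (empty string when cont is empty). -->
  <xsl:key name="blockKey"
           match="case[@cont = '' or normalize-space(substring-before(@cont, '|')) = normalize-space(substring-after(@cont, '|'))]"
           use="concat(@id, '|', normalize-space(substring-after(@cont, '|')))"/>

  <xsl:template match="/cases">
    <output>
      <!-- Muenchian grouping: keep only the first indexed case of each key value.
           Cases such as "1 | 2" are not in the index, so they never match here. -->
      <xsl:for-each select="case[generate-id() = generate-id(key('blockKey', concat(@id, '|', normalize-space(substring-after(@cont, '|'))))[1])]">
        <id val="{@id}">
          <duplicates>
            <!-- Every further indexed case with the same key is one duplicate. -->
            <xsl:value-of select="count(key('blockKey', concat(@id, '|', normalize-space(substring-after(@cont, '|'))))) - 1"/>
          </duplicates>
        </id>
      </xsl:for-each>
    </output>
  </xsl:template>
</xsl:stylesheet>
```

For the larger sample in the question this should report 1, 0, 1 and 0 duplicates for ids 1 to 4. As with the XSLT 2 version, an id that occurs both with an empty `cont` and with `n | n` blocks would produce more than one `id` element.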