How does preceding-sibling work in XPath and Python? It seems to display the wrong output
Question
The behaviour comes from how XPath 1.0 evaluates the expression rather than from a bug in the Python code. The parent::* step collects both matching <Z> elements, and preceding-sibling::* then returns the union of their preceding siblings as one node-set. A node-set never contains the same node twice, so the Y1 and Y2 elements shared by both matches appear only once.
To get one list per match, evaluate preceding-sibling::* separately for each matching <Z> element:
from lxml import etree
tree = etree.parse('test.xml')
for z in tree.xpath("//X/Z[T/R1='ABC3']"):
    for i in z.xpath("preceding-sibling::*"):
        print(i.tag, " - ", i.text)
This lists the siblings of the first match, then those of the second, which is the expected output:
Y1 - ABC1
Y2 - ABC2
Y1 - ABC1
Y2 - ABC2
Z -
Y1 - ABC7
Y2 - ABC8
For the XML data
<X>
  <Y1>ABC1</Y1>
  <Y2>ABC2</Y2>
  <Z>
    <T>
      <R1>ABC3</R1>
      <R2>ABC4</R2>
    </T>
    <T>
      <R1>ABC5</R1>
      <R2>ABC6</R2>
    </T>
  </Z>
  <Y1>ABC7</Y1>
  <Y2>ABC8</Y2>
  <Z>
    <T>
      <R1>ABC3</R1>
      <R2>ABC9</R2>
    </T>
    <T>
      <R1>ABC5</R1>
      <R2>ABC9</R2>
    </T>
  </Z>
</X>
I wrote a sample Python file like the one below:
from lxml import etree
tree = etree.parse('test.xml')
for i in tree.xpath("//X/Z/T[R1='ABC3']/parent::*/preceding-sibling::*"):
    print(i.tag, " - ", i.text)
I expected output like
Y1 - ABC1
Y2 - ABC2
Y1 - ABC1
Y2 - ABC2
Z -
Y1 - ABC7
Y2 - ABC8
but received one like
Y1 - ABC1
Y2 - ABC2
Z -
Y1 - ABC7
Y2 - ABC8
It should print all preceding siblings. For the first match of R1='ABC3', it should print Y1 and Y2. For the second match of R1='ABC3', it should print the 5 preceding siblings. In total, 7 elements should be printed.
What is the error here?
Answer 1
Score: 1
XPath 1.0 has a concept of node-sets, where each / step eliminates duplicates based on node identity. A single XPath expression like the one you used will therefore never return a set that contains the same node twice; any duplicates are eliminated.
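The duplicate elimination described above can be observed directly. The following is a minimal sketch, assuming lxml is installed and using the question's XML inlined as a string instead of test.xml; the variable names parents and merged are only illustrative.

```python
from lxml import etree

# The XML from the question, inlined so the snippet is self-contained.
XML = b"""<X>
  <Y1>ABC1</Y1>
  <Y2>ABC2</Y2>
  <Z>
    <T><R1>ABC3</R1><R2>ABC4</R2></T>
    <T><R1>ABC5</R1><R2>ABC6</R2></T>
  </Z>
  <Y1>ABC7</Y1>
  <Y2>ABC8</Y2>
  <Z>
    <T><R1>ABC3</R1><R2>ABC9</R2></T>
    <T><R1>ABC5</R1><R2>ABC9</R2></T>
  </Z>
</X>"""

root = etree.fromstring(XML)

# The parent step yields the two Z elements that contain a matching T.
parents = root.xpath("//X/Z/T[R1='ABC3']/parent::*")

# In a single expression, the preceding siblings of both Z elements are
# merged into one node-set, so the first Y1 and Y2 are kept only once.
merged = root.xpath("//X/Z/T[R1='ABC3']/parent::*/preceding-sibling::*")

print(len(parents))  # 2 matching Z elements
print(len(merged))   # 5 distinct nodes, not 2 + 5 = 7
```

Two parents match, yet the combined result holds five nodes: exactly the output reported in the question.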
In XPath 2.0, the / step operator keeps the same duplicate-elimination semantics, but there is a more general concept of sequences. A for .. return expression
    for $p in //X/Z/T[R1='ABC3']/parent::* return $p/preceding-sibling::*
or, in XPath 3.0 and later, the simple map operator !
    //X/Z/T[R1='ABC3']/parent::*!preceding-sibling::*
lets you include duplicates; see https://xqueryfiddle.liberty-development.net/eiZQFoV.
In XPath 1.0 you would need to run several XPath evaluations in a loop of the host language (e.g. Python). In Python you could also use a list comprehension:
    element_list = [el for parent in tree.xpath("//X/Z/T[R1='ABC3']/parent::*") for el in parent.xpath("preceding-sibling::*")]
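As a rough, self-contained illustration of the per-parent evaluation, the sketch below runs the same list comprehension against the question's XML, inlined here as an assumption in place of test.xml. Stripping the text before printing is an addition for readability, since the text of a Z element is only whitespace.

```python
from lxml import etree

# The XML from the question, inlined so the snippet is self-contained.
XML = b"""<X>
  <Y1>ABC1</Y1>
  <Y2>ABC2</Y2>
  <Z>
    <T><R1>ABC3</R1><R2>ABC4</R2></T>
    <T><R1>ABC5</R1><R2>ABC6</R2></T>
  </Z>
  <Y1>ABC7</Y1>
  <Y2>ABC8</Y2>
  <Z>
    <T><R1>ABC3</R1><R2>ABC9</R2></T>
    <T><R1>ABC5</R1><R2>ABC9</R2></T>
  </Z>
</X>"""

root = etree.fromstring(XML)
parents = root.xpath("//X/Z/T[R1='ABC3']/parent::*")

# One inner XPath evaluation per matching parent: each call returns its own
# node-set, and Python simply concatenates them, so repeats are preserved.
element_list = [
    el
    for parent in parents
    for el in parent.xpath("preceding-sibling::*")
]

for el in element_list:
    print(el.tag, "-", (el.text or "").strip())

print(len(element_list))  # 7
```

Duplicates are only removed inside a single XPath evaluation; a Python list is free to hold the same element more than once.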
Answer 2
Score: 1
The question is tagged xslt, but you're not using XSLT. The expected output can be achieved using the following stylesheet:
XSLT 1.0
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>
  <xsl:template match="/X">
    <xsl:for-each select="Z[T/R1='ABC3']">
      <xsl:for-each select="preceding-sibling::*">
        <xsl:value-of select="name()" />
        <xsl:text> - </xsl:text>
        <xsl:value-of select="text()" />
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
As noted in the answer by Martin Honnen, it is necessary to process the preceding siblings of each matched node separately, in order to get two separate lists.
Note also that your expression:
    Z/T[R1='ABC3']/parent::*
is unnecessarily convoluted: clearly, the parent of the matched T must be Z, so you can write simply:
    Z[T/R1='ABC3']
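Since the question runs Python with lxml, one possible way to try both points from this answer is sketched below. It assumes lxml's etree.XSLT is used to apply the stylesheet and inlines the question's XML; the names transform, output, paths_long and paths_short are illustrative only.

```python
from lxml import etree

# The XML from the question, inlined so the snippet is self-contained.
XML = b"""<X>
  <Y1>ABC1</Y1>
  <Y2>ABC2</Y2>
  <Z>
    <T><R1>ABC3</R1><R2>ABC4</R2></T>
    <T><R1>ABC5</R1><R2>ABC6</R2></T>
  </Z>
  <Y1>ABC7</Y1>
  <Y2>ABC8</Y2>
  <Z>
    <T><R1>ABC3</R1><R2>ABC9</R2></T>
    <T><R1>ABC5</R1><R2>ABC9</R2></T>
  </Z>
</X>"""

# The stylesheet from this answer.
XSLT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>
  <xsl:template match="/X">
    <xsl:for-each select="Z[T/R1='ABC3']">
      <xsl:for-each select="preceding-sibling::*">
        <xsl:value-of select="name()" />
        <xsl:text> - </xsl:text>
        <xsl:value-of select="text()" />
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

root = etree.fromstring(XML)

# Compile the stylesheet and apply it to the document.
transform = etree.XSLT(etree.fromstring(XSLT))
output = str(transform(root))
print(output)

# Check that the simplified predicate selects the same Z elements as the
# longer parent::* form by comparing their structural paths.
tree = root.getroottree()
paths_long = sorted(tree.getpath(z) for z in root.xpath("//X/Z/T[R1='ABC3']/parent::*"))
paths_short = sorted(tree.getpath(z) for z in root.xpath("//X/Z[T/R1='ABC3']"))
print(paths_long == paths_short)
```

The line produced for the Z element carries only the whitespace of its first text node, so it may look blank after the dash.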