XmlNode SelectNodes returns only first occurrence when querying for text nodes
Question
I'm seeing strange behavior when querying for text nodes with the text() XPath function.
I have a simple XML document like this:
<label>Matrise <foo/><foo/>GM:1<foo/></label>
Calling labelNode.SelectNodes("text()") returns the two expected text nodes, "Matrise" and "GM:1".
However, if I remove the foo nodes, there are still two text nodes, but the query no longer returns both of them, only the first.
Why does it return only one text node in the second case when there are clearly two text nodes?
Code example:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml;

    var xml = "<label>Matrise <foo/><foo/>GM:1<foo/></label>";
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    var labelNode = doc.SelectSingleNode("label");

    Console.WriteLine("Before modifications:");
    PrintTextNodesInOriginal(labelNode);
    Console.WriteLine("\r\nAfter modifications:");
    PrintTextNodesInModified(labelNode);

    void PrintTextNodesInOriginal(XmlNode labelNode)
    {
        PrintTextNodes(labelNode);
    }

    void PrintTextNodesInModified(XmlNode labelNode)
    {
        // Remove foo nodes; snapshot the list first so removal is safe while iterating
        var foos = labelNode
            .SelectNodes("foo")
            .Cast<XmlNode>()
            .ToList();
        foreach (var foo in foos)
        {
            foo.ParentNode.RemoveChild(foo);
        }
        PrintTextNodes(labelNode);
    }

    // Text nodes as seen by the XPath query
    List<XmlNode> GetTextNodes(XmlNode labelNode)
    {
        return labelNode.SelectNodes("text()")
            .Cast<XmlNode>()
            .ToList();
    }

    void PrintTextNodes(XmlNode labelNode)
    {
        var textNodes = GetTextNodes(labelNode);
        // Text nodes as seen by walking the DOM children directly
        var textNodesActual = labelNode.ChildNodes
            .Cast<XmlNode>()
            .Where(node => node.NodeType == XmlNodeType.Text)
            .ToList();
        Console.WriteLine($"Nodes count using query: {textNodes.Count}");
        Console.WriteLine($"Nodes count actual: {textNodesActual.Count}");
        Console.WriteLine("Nodes' contents:");
        foreach (var textNode in textNodes)
        {
            Console.WriteLine(textNode.InnerText);
        }
    }
Expected results:
Before modifications:
Nodes count using query: 2
Nodes count actual: 2
Nodes' contents:
Matrise
GM:1
After modifications:
Nodes count using query: 2
Nodes count actual: 2
Nodes' contents:
Matrise
GM:1
Actual results:
Before modifications:
Nodes count using query: 2
Nodes count actual: 2
Nodes' contents:
Matrise
GM:1
After modifications:
Nodes count using query: 1
Nodes count actual: 2
Nodes' contents:
Matrise
Answer 1
Score: 1
The XPath data model does not allow the children of an element node to contain two adjacent text nodes. So while an underlying tree model such as DOM might allow this, a correct XPath implementation will treat them as if they were merged into a single node. Some XPath implementations handle this incorrectly, but it looks as if this one gets it right.
The reason for the rule in XPath is that it tries to ensure that if two XDM trees have the same XML serialization, then they are equivalent in terms of any XPath query.
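As a rough illustration of the point above, here is a minimal sketch, assuming the same XML as in the question. It compares the DOM view (walking ChildNodes) with the XPath view (SelectNodes("text()")) after the foo elements are removed, and then calls XmlNode.Normalize(), which merges adjacent text nodes so the two views agree. The variable names (label, domCount, xpathCount, merged) are illustrative and not taken from the original post.

```csharp
using System;
using System.Linq;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<label>Matrise <foo/><foo/>GM:1<foo/></label>");
var label = doc.DocumentElement!;

// Remove the <foo/> elements; the DOM is left with two adjacent text nodes.
foreach (var foo in label.SelectNodes("foo")!.Cast<XmlNode>().ToList())
{
    label.RemoveChild(foo);
}

// DOM view: the adjacent text nodes are still separate objects.
int domCount = label.ChildNodes.Cast<XmlNode>()
    .Count(n => n.NodeType == XmlNodeType.Text);

// XPath view: per the behavior reported in the question, the adjacent
// run is exposed as a single text node.
int xpathCount = label.SelectNodes("text()")!.Count;

// Normalize() merges adjacent text nodes in the DOM itself.
label.Normalize();
int domCountAfter = label.ChildNodes.Count;
string merged = label.FirstChild!.Value!;

Console.WriteLine($"DOM text nodes before Normalize: {domCount}");
Console.WriteLine($"XPath text nodes before Normalize: {xpathCount}");
Console.WriteLine($"DOM child nodes after Normalize: {domCountAfter}");
Console.WriteLine($"Merged text: \"{merged}\"");
```

If the individual text fragments matter, iterating ChildNodes and filtering on XmlNodeType.Text (as the question's textNodesActual already does) avoids the XPath merging; if only the combined text matters, normalizing first or reading InnerText may be simpler.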