XmlNode SelectNodes returns only first occurrence when querying for text nodes

Question

I'm seeing strange behavior when querying for text nodes with the text() XPath node test.

I have a simple XML document like this:

<label>Matrise <foo/><foo/>GM:1<foo/></label>

Querying with labelNode.SelectNodes("text()") returns the two expected text nodes - "Matrise " and "GM:1".

However, if I remove the foo nodes, the DOM still contains two text nodes, but the query no longer returns both of them - only the first.

Why does it return only one text node in the second case when there are clearly two text nodes?

Code example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

var xml = "<label>Matrise <foo/><foo/>GM:1<foo/></label>";
var doc = new XmlDocument();
doc.LoadXml(xml);
var labelNode = doc.SelectSingleNode("label");

Console.WriteLine("Before modifications:");
PrintTextNodesInOriginal(labelNode);

Console.WriteLine("\r\nAfter modifications:");
PrintTextNodesInModified(labelNode);


void PrintTextNodesInOriginal(XmlNode labelNode)
{
    PrintTextNodes(labelNode);
}

void PrintTextNodesInModified(XmlNode labelNode)
{
    // Remove foo nodes (copy to a list first so the collection is not modified while iterating)
    var foos = labelNode
        .SelectNodes("foo")
        .Cast<XmlNode>()
        .ToList();

    foreach (var foo in foos)
    {
        foo.ParentNode.RemoveChild(foo);
    }

    PrintTextNodes(labelNode);
}

// Text nodes as seen by the XPath query
List<XmlNode> GetTextNodes(XmlNode labelNode)
{
    return labelNode.SelectNodes("text()")
        .Cast<XmlNode>()
        .ToList();
}

void PrintTextNodes(XmlNode labelNode)
{
    var textNodes = GetTextNodes(labelNode);
    // Text nodes as seen by walking the DOM children directly
    var textNodesActual = labelNode.ChildNodes
        .Cast<XmlNode>()
        .Where(node => node.NodeType == XmlNodeType.Text)
        .ToList();

    Console.WriteLine($"Nodes count using query: {textNodes.Count}");
    Console.WriteLine($"Nodes count actual: {textNodesActual.Count}");
    Console.WriteLine("Nodes' contents:");

    foreach (var textNode in textNodes)
    {
        Console.WriteLine(textNode.InnerText);
    }
}

Expected results:

Before modifications:
Nodes count using query: 2
Nodes count actual: 2
Nodes' contents:
Matrise
GM:1

After modifications:
Nodes count using query: 2
Nodes count actual: 2
Nodes' contents:
Matrise
GM:1

Actual results:

Before modifications:
Nodes count using query: 2
Nodes count actual: 2
Nodes' contents:
Matrise
GM:1

After modifications:
Nodes count using query: 1
Nodes count actual: 2
Nodes' contents:
Matrise

Answer 1

Score: 1


The XPath data model does not allow the children of an element node to contain two adjacent text nodes. So while an underlying tree model such as DOM might allow this, a correct XPath implementation will treat them as if they were merged into a single node. Some XPath implementations handle this incorrectly, but it looks as if this one gets it right.
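
If the goal is to make the DOM and the XPath view agree, one option is to merge the adjacent text nodes in the DOM itself with XmlNode.Normalize(). The following is a minimal sketch, not part of the original answer; the helper names BuildLabelWithoutFoos and CountDomTextNodes are made up for illustration. If the separate pieces are needed instead, walking ChildNodes directly (as the question's code already does) avoids XPath altogether.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical helper: builds the question's <label> element and removes its
// <foo/> children, leaving two adjacent text nodes in the DOM.
static XmlNode BuildLabelWithoutFoos()
{
    var doc = new XmlDocument();
    doc.LoadXml("<label>Matrise <foo/><foo/>GM:1<foo/></label>");
    XmlNode label = doc.DocumentElement;
    foreach (var foo in label.SelectNodes("foo").Cast<XmlNode>().ToList())
    {
        label.RemoveChild(foo);
    }
    return label;
}

// Hypothetical helper: counts text nodes by walking the DOM, bypassing XPath.
static int CountDomTextNodes(XmlNode node) =>
    node.ChildNodes.OfType<XmlText>().Count();

var label = BuildLabelWithoutFoos();
Console.WriteLine(CountDomTextNodes(label));          // DOM view: 2 separate text nodes
Console.WriteLine(label.SelectNodes("text()").Count); // XPath view (the question reports 1)

// Merge adjacent text nodes so the DOM matches what XPath sees.
label.Normalize();
Console.WriteLine(CountDomTextNodes(label));          // 1
Console.WriteLine(label.FirstChild.Value);            // Matrise GM:1
```

After Normalize() the single remaining text node carries the concatenated content, so the count reported by the query and the count obtained from ChildNodes should no longer differ.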

The reason for the rule in XPath is that it tries to ensure that if two XDM trees have the same XML serialization, then they are equivalent in terms of any XPath query.
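
The serialization argument can be checked with a small experiment. This is an illustrative sketch under the question's setup, not code from the answer, and CountXPathTextNodes is a hypothetical helper name. It serializes the modified tree, parses the result into a fresh document, and compares what text() selects in each.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical helper: how many nodes text() selects under the root element.
static int CountXPathTextNodes(XmlDocument d) =>
    d.DocumentElement.SelectNodes("text()").Count;

// Recreate the question's modified tree: two adjacent text nodes under <label>.
var modified = new XmlDocument();
modified.LoadXml("<label>Matrise <foo/><foo/>GM:1<foo/></label>");
foreach (var foo in modified.DocumentElement.SelectNodes("foo").Cast<XmlNode>().ToList())
{
    modified.DocumentElement.RemoveChild(foo);
}

// Serialize the modified tree and parse the result into a fresh document.
string serialized = modified.OuterXml;
var reparsed = new XmlDocument();
reparsed.LoadXml(serialized);

Console.WriteLine(serialized);                                // <label>Matrise GM:1</label>
Console.WriteLine(reparsed.DocumentElement.ChildNodes.Count); // 1: the parser produces a single text node
Console.WriteLine(CountXPathTextNodes(modified) == CountXPathTextNodes(reparsed));
```

The serialized form carries no trace of the boundary between the two DOM text nodes, so a reparsed document can only ever contain one. Reporting one node for the modified tree as well keeps the two documents indistinguishable to XPath.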

Posted by huangapple on 2023-08-08 23:40:34
Source: https://go.coder-hub.com/76861138.html