

XmlNode SelectNodes returns only first occurrence when querying for text nodes


I am seeing strange behavior when querying for text nodes with the text() XPath node test.

I have a simple XML document like this:

    <label>Matrise <foo/><foo/>GM:1<foo/></label>

The query labelNode.SelectNodes("text()") returns the two expected text nodes, "Matrise " and "GM:1".

However, if I remove the foo nodes, the DOM still contains two text nodes, but the query no longer returns both of them, only the first.

Why does the query return only one text node after the removal when there are clearly still two text nodes?

Code example:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml;

    var xml = "<label>Matrise <foo/><foo/>GM:1<foo/></label>";
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    var labelNode = doc.SelectSingleNode("label");

    Console.WriteLine("Before modifications:");
    PrintTextNodesInOriginal(labelNode);
    Console.WriteLine("\r\nAfter modifications:");
    PrintTextNodesInModified(labelNode);

    void PrintTextNodesInOriginal(XmlNode labelNode)
    {
        PrintTextNodes(labelNode);
    }

    void PrintTextNodesInModified(XmlNode labelNode)
    {
        // Remove foo nodes (copy to a list first so removal does not disturb the enumeration)
        var foos = labelNode
            .SelectNodes("foo")
            .Cast<XmlNode>()
            .ToList();
        foreach (var foo in foos)
        {
            foo.ParentNode.RemoveChild(foo);
        }
        PrintTextNodes(labelNode);
    }

    // Text nodes as seen by the XPath query
    List<XmlNode> GetTextNodes(XmlNode labelNode)
    {
        return labelNode.SelectNodes("text()")
            .Cast<XmlNode>()
            .ToList();
    }

    void PrintTextNodes(XmlNode labelNode)
    {
        var textNodes = GetTextNodes(labelNode);
        // Text nodes as they actually exist in the DOM
        var textNodesActual = labelNode.ChildNodes
            .Cast<XmlNode>()
            .Where(node => node.NodeType == XmlNodeType.Text)
            .ToList();
        Console.WriteLine($"Nodes count using query: {textNodes.Count}");
        Console.WriteLine($"Nodes count actual: {textNodesActual.Count}");
        Console.WriteLine("Nodes' contents:");
        foreach (var textNode in textNodes)
        {
            Console.WriteLine(textNode.InnerText);
        }
    }

Expected results:

    Before modifications:
    Nodes count using query: 2
    Nodes count actual: 2
    Nodes' contents:
    Matrise
    GM:1
    After modifications:
    Nodes count using query: 2
    Nodes count actual: 2
    Nodes' contents:
    Matrise
    GM:1

Actual results:

    Before modifications:
    Nodes count using query: 2
    Nodes count actual: 2
    Nodes' contents:
    Matrise
    GM:1
    After modifications:
    Nodes count using query: 1
    Nodes count actual: 2
    Nodes' contents:
    Matrise


Score: 1




The XPath data model does not allow the children of an element node to contain two adjacent text nodes. So while an underlying tree model such as DOM might allow this, a correct XPath implementation will treat them as if they were merged into a single node. Some XPath implementations handle this incorrectly, but it looks as if this one gets it right.

The reason for the rule in XPath is that it tries to ensure that if two XDM trees have the same XML serialization, then they are equivalent in terms of any XPath query.
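
To make the difference between the two views concrete, here is a minimal sketch (not part of the original answer) that counts text nodes through the DOM and through XPath after the foo elements are removed. It then calls XmlNode.Normalize(), which merges adjacent text nodes in the DOM itself; using Normalize() here is a suggestion for bringing the DOM in line with the XPath view, on the assumption that merged text is acceptable for your use case. The variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<label>Matrise <foo/><foo/>GM:1<foo/></label>");
var label = doc.DocumentElement;

// Remove the <foo/> elements, leaving two adjacent text nodes in the DOM.
foreach (var foo in label.SelectNodes("foo").Cast<XmlNode>().ToList())
{
    label.RemoveChild(foo);
}

// DOM view: the two text nodes still exist as separate children.
int domCount = label.ChildNodes.Cast<XmlNode>()
    .Count(n => n.NodeType == XmlNodeType.Text);

// XPath view: the adjacent run is treated as one logical text node.
int xpathCount = label.SelectNodes("text()").Count;

Console.WriteLine($"Before Normalize: DOM={domCount}, XPath={xpathCount}");

// Normalize() merges adjacent text nodes in the DOM (assumed acceptable here).
label.Normalize();

int domCountAfter = label.ChildNodes.Cast<XmlNode>()
    .Count(n => n.NodeType == XmlNodeType.Text);
int xpathCountAfter = label.SelectNodes("text()").Count;
string mergedText = label.SelectSingleNode("text()").Value;

Console.WriteLine($"After Normalize: DOM={domCountAfter}, XPath={xpathCountAfter}");
Console.WriteLine($"Merged text: \"{mergedText}\"");
```

If you need the individual text nodes exactly as they sit in the DOM, enumerating ChildNodes and filtering on NodeType (as the question's textNodesActual does) avoids the XPath data model altogether.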

  • Posted on August 8, 2023, 23:40:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76861138.html



