pandas.read_xml() unexpected behaviour

Question

I am trying to understand why this code:

import pandas

xml = '''
<ROOT>
  <ELEM atr="anything">1</ELEM>
  <ELEM atr="anything">2</ELEM>
  <ELEM atr="anything">3</ELEM>
  <ELEM atr="anything">4</ELEM>
  <ELEM atr="anything">5</ELEM>
  <ELEM atr="anything">6</ELEM>
  <ELEM atr="anything">7</ELEM>
  <ELEM atr="anything">8</ELEM>
  <ELEM atr="anything">9</ELEM>
  <ELEM atr="anything">10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())

...works as expected and prints:

        atr  ELEM
0  anything     1
1  anything     2
2  anything     3
3  anything     4
4  anything     5
5  anything     6
6  anything     7
7  anything     8
8  anything     9
9  anything    10

Yet the following code:

import pandas

xml = '''
<ROOT>
  <ELEM>1</ELEM>
  <ELEM>2</ELEM>
  <ELEM>3</ELEM>
  <ELEM>4</ELEM>
  <ELEM>5</ELEM>
  <ELEM>6</ELEM>
  <ELEM>7</ELEM>
  <ELEM>8</ELEM>
  <ELEM>9</ELEM>
  <ELEM>10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())

results in this error:

ValueError: xpath does not return any nodes or attributes. Be sure to
specify in `xpath` the parent nodes of children and attributes to
parse. If document uses namespaces denoted with xmlns, be sure to
define namespaces and use them in xpath.

I have read the documentation here:
https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html

and I also checked my XPath here (the code above is just a minimal example; the actual XML I use is more complex):
https://freeonlineformatter.com/xpath-validator/

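In a nutshell, I need to read a list of XML child elements at a known XPath into a pandas DataFrame. The child elements have no attributes, but all of them have text values. I want a DataFrame with one column containing these values. What am I doing wrong?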

Answer 1

Score: 1

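If you check the documentation, pandas expects the XML to have rows with columns. In your first example, each <ELEM> is a row and the atr attribute is a column (the element's text is then added alongside it as the ELEM column, as your output shows). In your second example, there are no columns: the matched <ELEM> nodes have neither attributes nor child elements, only text, which is why read_xml raises the error. If you had <ELEM><VAL>1</VAL></ELEM>, it should work, because VAL would become the column.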
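To make the rule concrete, here is a minimal, stdlib-only sketch of the condition the error is about. It assumes (from my reading of pandas' `pandas/io/xml.py`, which may differ between versions) that `read_xml` raises this `ValueError` when the nodes matched by `xpath` carry neither attributes nor child elements. `has_parseable_rows` is a hypothetical name, not a pandas function, and it uses `xml.etree.ElementTree` instead of the lxml parser that `read_xml` uses by default.

```python
import xml.etree.ElementTree as ET


def has_parseable_rows(xml_text: str, path: str) -> bool:
    """Hypothetical helper approximating the node check behind the error.

    Matched nodes can become DataFrame rows only if they carry at least
    one attribute or at least one child element. Text-only leaf nodes,
    like <ELEM>1</ELEM>, give pandas nothing to turn into columns.
    """
    root = ET.fromstring(xml_text)
    # ElementTree needs a path relative to the root element, e.g. "./ELEM",
    # whereas read_xml's default lxml parser accepts "/ROOT/ELEM".
    nodes = root.findall(path)
    has_children = any(len(node) > 0 for node in nodes)
    has_attrs = any(bool(node.attrib) for node in nodes)
    return bool(nodes) and (has_children or has_attrs)


if __name__ == "__main__":
    samples = {
        "attribute": '<ROOT><ELEM atr="anything">1</ELEM></ROOT>',
        "text only": "<ROOT><ELEM>1</ELEM></ROOT>",
        "child VAL": "<ROOT><ELEM><VAL>1</VAL></ELEM></ROOT>",
    }
    for label, doc in samples.items():
        print(label, has_parseable_rows(doc, "./ELEM"))
    # attribute True
    # text only False
    # child VAL True
```

The three samples correspond to the question's first example, its second example, and the restructured form suggested above.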

https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
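If you cannot change the XML, one workaround is to skip `read_xml`'s row/column inference and collect the text yourself. This is a minimal sketch, assuming the values sit directly under the root as in the question's second example. `text_column` is a hypothetical helper, and the path `"./ELEM"` is written for ElementTree's limited XPath support rather than the full XPath that lxml accepts.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def text_column(xml_text: str, path: str, column: str) -> pd.DataFrame:
    """Hypothetical helper: one row per node matched by `path`, holding its text."""
    root = ET.fromstring(xml_text)
    values = [node.text for node in root.findall(path)]
    return pd.DataFrame({column: values})


# Same shape as the question's second example: ten text-only <ELEM> nodes.
xml = "<ROOT>" + "".join(f"<ELEM>{i}</ELEM>" for i in range(1, 11)) + "</ROOT>"

df = text_column(xml, "./ELEM", "ELEM")
# The text comes back as strings; convert explicitly, since every value here is an integer.
df["ELEM"] = pd.to_numeric(df["ELEM"])
print(df.to_string())
```

If lxml is installed, `lxml.etree.fromstring(xml).xpath("/ROOT/ELEM/text()")` should return the same strings while keeping the full XPath from the question; the resulting list can be passed to `pd.DataFrame` in the same way.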

huangapple
  • Posted on June 1, 2023, 06:06:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377578.html