June 1, 2023

pandas.read_xml() unexpected behaviour

Question

I am trying to understand why the code:

import pandas

xml = '''
<ROOT>
  <ELEM atr="anything">1</ELEM>
  <ELEM atr="anything">2</ELEM>
  <ELEM atr="anything">3</ELEM>
  <ELEM atr="anything">4</ELEM>
  <ELEM atr="anything">5</ELEM>
  <ELEM atr="anything">6</ELEM>
  <ELEM atr="anything">7</ELEM>
  <ELEM atr="anything">8</ELEM>
  <ELEM atr="anything">9</ELEM>
  <ELEM atr="anything">10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())

... works as expected and prints:

<pre>
        atr  ELEM
0  anything     1
1  anything     2
2  anything     3
3  anything     4
4  anything     5
5  anything     6
6  anything     7
7  anything     8
8  anything     9
9  anything    10
</pre>

Yet the following code:

import pandas

xml = '''
<ROOT>
  <ELEM>1</ELEM>
  <ELEM>2</ELEM>
  <ELEM>3</ELEM>
  <ELEM>4</ELEM>
  <ELEM>5</ELEM>
  <ELEM>6</ELEM>
  <ELEM>7</ELEM>
  <ELEM>8</ELEM>
  <ELEM>9</ELEM>
  <ELEM>10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())

results in the error:

<pre>ValueError: xpath does not return any nodes or attributes. Be sure to
specify in `xpath` the parent nodes of children and attributes to
parse. If document uses namespaces denoted with xmlns, be sure to
define namespaces and use them in xpath.</pre>

I have read the documentation here:
https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html

I also checked my xpath here (the code above is just a minimal example; the actual XML I use is more complex):
https://freeonlineformatter.com/xpath-validator/

In a nutshell, I need to read a list of XML child elements at a known xpath into a pandas DataFrame. The child elements have no attributes, but all have text values. I want a DataFrame with one column containing these values. What am I doing wrong?

Answer 1

Score: 1

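If you check the documentation, pandas expects the XML to have rows with columns: the nodes matched by xpath are the rows, and their child elements and attributes become the columns. In your first example, each <ELEM> is a row, and atr is a column. In your second example, the <ELEM> nodes have neither attributes nor child elements, so there are no columns. If you had <ELEM><VAL>1</VAL></ELEM>, it should work, because VAL would be the column.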

https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
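
If you cannot change the XML, one possible workaround is to skip read_xml() and collect the text values yourself. The sketch below is not part of the original answer: it assumes the standard-library xml.etree.ElementTree parser is acceptable, uses a shortened version of the question's document, and picks the column name ELEM only for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened version of the XML from the question (assumption: same shape).
xml = '''
<ROOT>
  <ELEM>1</ELEM>
  <ELEM>2</ELEM>
  <ELEM>3</ELEM>
</ROOT>
'''

# Parse with the standard library instead of pandas.read_xml().
root = ET.fromstring(xml)

# root is the <ROOT> element, so "ELEM" selects its direct <ELEM> children.
values = [elem.text for elem in root.findall("ELEM")]

# Build a one-column DataFrame; element text is a string, so convert it.
df = pd.DataFrame({"ELEM": values})
df["ELEM"] = pd.to_numeric(df["ELEM"])

print(df.to_string())
```

For a more complex document, the path passed to findall() would need to be adapted; ElementTree supports only a limited subset of XPath, relative to the element it is called on.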
