pandas.read_xml() unexpected behaviour
Question
I am trying to understand why the code:
```python
import pandas
xml = '''
<ROOT>
<ELEM atr="anything">1</ELEM>
<ELEM atr="anything">2</ELEM>
<ELEM atr="anything">3</ELEM>
<ELEM atr="anything">4</ELEM>
<ELEM atr="anything">5</ELEM>
<ELEM atr="anything">6</ELEM>
<ELEM atr="anything">7</ELEM>
<ELEM atr="anything">8</ELEM>
<ELEM atr="anything">9</ELEM>
<ELEM atr="anything">10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())
```
... works as expected and prints:
```
        atr  ELEM
0  anything     1
1  anything     2
2  anything     3
3  anything     4
4  anything     5
5  anything     6
6  anything     7
7  anything     8
8  anything     9
9  anything    10
```
Yet the following code:
```python
import pandas
xml = '''
<ROOT>
<ELEM>1</ELEM>
<ELEM>2</ELEM>
<ELEM>3</ELEM>
<ELEM>4</ELEM>
<ELEM>5</ELEM>
<ELEM>6</ELEM>
<ELEM>7</ELEM>
<ELEM>8</ELEM>
<ELEM>9</ELEM>
<ELEM>10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())
```
results in the error:
```
ValueError: xpath does not return any nodes or attributes. Be sure to
specify in `xpath` the parent nodes of children and attributes to
parse. If document uses namespaces denoted with xmlns, be sure to
define namespaces and use them in xpath.
```
I have read the documentation here:
https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
I have also checked my XPath here (the code above is just a minimal example; the actual XML I use is more complex):
https://freeonlineformatter.com/xpath-validator/
In a nutshell, I need to read a list of XML child elements at a known XPath into a pandas DataFrame. The child elements have no attributes, but all of them have text values. I want a DataFrame with one column containing these values. What am I doing wrong?
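A minimal stdlib sketch shows that the XPath does match the `<ELEM>` nodes and that each one carries its text. What the second document lacks is any attribute or child element. The helper `inspect_matches` and the shortened three-element documents are hypothetical and not part of pandas.

As far as I can tell from pandas' source (`pandas/io/xml.py`), `read_xml` raises this `ValueError` whenever the matched nodes have neither attributes nor children. This happens even though pandas would otherwise turn a node's own text into a column, which is where the `ELEM` column in the first output comes from. The `accepted` flag below only approximates that internal check.

ElementTree evaluates paths relative to the root element, so `/ROOT/ELEM` is written as `./ELEM`.

```python
import xml.etree.ElementTree as ET

# Second document from the question, shortened to three elements.
XML_NO_ATTRS = """
<ROOT>
<ELEM>1</ELEM>
<ELEM>2</ELEM>
<ELEM>3</ELEM>
</ROOT>
"""

# First document from the question, shortened the same way.
XML_WITH_ATTRS = """
<ROOT>
<ELEM atr="anything">1</ELEM>
<ELEM atr="anything">2</ELEM>
<ELEM atr="anything">3</ELEM>
</ROOT>
"""


def inspect_matches(xml_text: str, path: str) -> dict:
    """Summarize what `path` matches (hypothetical helper, not a pandas API)."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree paths are relative to the root element, hence "./ELEM".
    elems = root.findall(path)
    return {
        "matched": len(elems),
        "texts": [el.text for el in elems],
        "has_attributes": any(el.attrib for el in elems),
        "has_children": any(len(el) > 0 for el in elems),
    }


for label, doc in [("no attributes", XML_NO_ATTRS), ("with attributes", XML_WITH_ATTRS)]:
    info = inspect_matches(doc, "./ELEM")
    # Rough approximation of the pandas check: at least one attribute or child is needed.
    accepted = info["has_attributes"] or info["has_children"]
    print(label, info, "accepted:", accepted)
```

For the second document this reports three matches with texts `'1'` to `'3'` but no attributes or children, so `accepted` is `False`. For the first document, `has_attributes` is `True`.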
Answer 1
Score: 1
If you check the documentation, pandas expects the XML to have rows with columns. In your first example, each `<ELEM>` is a row, and `atr` is the column. In your second example, there are no columns. If you had `<ELEM><VAL>1</VAL></ELEM>`, it should work, because `VAL` would be the column.
https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
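The answer's suggestion can be checked with a short sketch, assuming pandas 2.x.

- It passes the XML through `io.StringIO`, because pandas 2.1 and later deprecate passing a literal XML string to `read_xml`.
- It uses `parser="etree"` so that lxml is not required. With etree the path is written relative to the root, as `./ELEM`.
- The second half is an alternative for when the XML cannot be restructured. It bypasses `read_xml` and builds the single-column DataFrame from ElementTree directly.

The document and variable names are illustrative.

```python
from io import StringIO
import xml.etree.ElementTree as ET

import pandas as pd

# The answer's suggestion: each <ELEM> row gets a <VAL> child, which becomes a column.
XML_WITH_CHILDREN = """
<ROOT>
<ELEM><VAL>1</VAL></ELEM>
<ELEM><VAL>2</VAL></ELEM>
<ELEM><VAL>3</VAL></ELEM>
</ROOT>
"""

# The question's original shape: text-only <ELEM> nodes.
XML_TEXT_ONLY = """
<ROOT>
<ELEM>1</ELEM>
<ELEM>2</ELEM>
<ELEM>3</ELEM>
</ROOT>
"""

# 1) read_xml accepts the rows once they have a child element.
#    parser="etree" avoids the lxml dependency; etree paths are relative to the root.
df_children = pd.read_xml(StringIO(XML_WITH_CHILDREN), xpath="./ELEM", parser="etree")

# 2) If the XML cannot be changed, collect each node's own text directly
#    and build the one-column frame by hand.
root = ET.fromstring(XML_TEXT_ONLY.strip())
df_text = pd.DataFrame({"ELEM": [el.text for el in root.findall("./ELEM")]})
# The text values are strings; convert them since the sample values are numeric.
df_text["ELEM"] = pd.to_numeric(df_text["ELEM"])

print(df_children.to_string())
print(df_text.to_string())
```

`df_children` ends up with a single `VAL` column and `df_text` with a single `ELEM` column, each holding 1, 2 and 3.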