How can some tags be left unparsed when parsing XML with xml.etree.ElementTree in Python?

Question

I need to parse an XML file in Python with xml.etree.ElementTree. An example of the XML:

<items>
    <item id="0001" type="donut">
        <name>Cake</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002">Chocolate</batter>
            <batter id="1003">Blueberry</batter>
        </batters>
        <topping id="5001">None</topping>
    </item>
    <item id="0002" type="biscuit">
        <name>Biscuit</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002">Chocolate</batter>
            <batter id="1004">Orange</batter>
            <batter id="1005">Banana</batter>
        </batters>
        <topping id="5006">Sprinkles</topping>
    </item>
</items>

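The contents of batters could be more complex, with nested structures. Because those contents could change, I don't want to parse every tag inside batters. I want to associate every item id with its batters: the batters should be represented as an XML-like string, while the item id should be parsed.

So for this example, I want to get the following result: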

{
    '0001': '<batter id="1001">Regular</batter><batter id="1002">Chocolate</batter><batter id="1003">Blueberry</batter>',
    '0002': '<batter id="1001">Regular</batter><batter id="1002">Chocolate</batter><batter id="1004">Orange</batter><batter id="1005">Banana</batter>'
}

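The data structure could be different; that doesn't matter. I couldn't find any examples of parsing XML partially, i.e. parsing some tags and leaving others unparsed. Is this possible with xml.etree.ElementTree?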

Answer 1

Score: 0

You can use this example to parse the XML file into a dictionary:

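import xml.etree.ElementTree as ET

tree = ET.parse("your_file.xml")
root = tree.getroot()

out = {}
for item in root.iter("item"):
    # The item id is read as a normal parsed attribute.
    item_id = item.get("id")
    s = []
    # Each <batter> is serialized back to an XML string instead of being
    # inspected; strip() removes the whitespace that follows the element.
    for batter in item.iter("batter"):
        s.append(ET.tostring(batter).decode("utf-8").strip())
    out[item_id] = "\n".join(s)

print(out)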

Prints (formatted here for readability):

{
    "0001": '<batter id="1001">Regular</batter>\n<batter id="1002">Chocolate</batter>\n<batter id="1003">Blueberry</batter>',
    "0002": '<batter id="1001">Regular</batter>\n<batter id="1002">Chocolate</batter>\n<batter id="1004">Orange</batter>\n<batter id="1005">Banana</batter>',
}
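
Since the question mentions that batters may contain nested structures whose tag names can change, here is a minimal sketch of a variant that does not depend on the tag name batter: it serializes every direct child of batters, whatever it is called. The helper name batters_by_item and the nested mix element in the sample data are assumptions made for illustration only. Note that ElementTree still parses the whole document; the strings are re-serialized from the tree, so they may differ from the source text in details such as whitespace, and any text placed directly inside batters (outside a child element) is not captured.

```python
import xml.etree.ElementTree as ET

# Sample data inlined so the sketch runs on its own; the nested <mix> element
# is a hypothetical addition to show that nesting is preserved.
XML = """\
<items>
    <item id="0001" type="donut">
        <name>Cake</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002"><mix>Chocolate</mix></batter>
        </batters>
        <topping id="5001">None</topping>
    </item>
    <item id="0002" type="biscuit">
        <name>Biscuit</name>
        <batters>
            <batter id="1004">Orange</batter>
        </batters>
        <topping id="5006">Sprinkles</topping>
    </item>
</items>
"""


def batters_by_item(root):
    """Map each item id to the serialized children of its <batters> element."""
    out = {}
    for item in root.iter("item"):
        batters = item.find("batters")
        # An item without <batters> maps to an empty string.
        children = list(batters) if batters is not None else []
        # encoding="unicode" makes tostring() return str instead of bytes;
        # strip() drops the whitespace tail that follows each element.
        out[item.get("id")] = "".join(
            ET.tostring(child, encoding="unicode").strip() for child in children
        )
    return out


result = batters_by_item(ET.fromstring(XML))
print(result)
```

Iterating over the direct children of batters, rather than calling item.iter("batter"), also avoids emitting a nested element twice if a batter ever contains another batter.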

huangapple
  • Posted on June 30, 2023 at 03:10:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76583996.html