How can I leave some tags unparsed when parsing with xml.etree.ElementTree in Python?
Question
I need to parse an XML file in Python with xml.etree.ElementTree.
The XML example is:
<items>
  <item id="0001" type="donut">
    <name>Cake</name>
    <batters>
      <batter id="1001">Regular</batter>
      <batter id="1002">Chocolate</batter>
      <batter id="1003">Blueberry</batter>
    </batters>
    <topping id="5001">None</topping>
  </item>
  <item id="0002" type="biscuit">
    <name>Biscuit</name>
    <batters>
      <batter id="1001">Regular</batter>
      <batter id="1002">Chocolate</batter>
      <batter id="1004">Orange</batter>
      <batter id="1005">Banana</batter>
    </batters>
    <topping id="5006">Sprinkles</topping>
  </item>
</items>
The contents of batters could be more complex, with nested structures. Because the contents of batters could change, I don't want to parse every tag inside batters.
I want to relate every item id to its batters. The batters should be presented as an XML-like string, and the item id should be parsed.
So in this example I want to get the result:
{
    '0001': '<batter id="1001">Regular</batter>
             <batter id="1002">Chocolate</batter>
             <batter id="1003">Blueberry</batter>',
    '0002': '<batter id="1001">Regular</batter>
             <batter id="1002">Chocolate</batter>
             <batter id="1004">Orange</batter>
             <batter id="1005">Banana</batter>'
}
The exact data structure doesn't matter; it could be different.
I couldn't find examples of how to parse XML partially, i.e. parse some tags and leave other tags unparsed. Is this possible with xml.etree.ElementTree?
Answer 1
Score: 0
You can use this example to parse the XML file into a dictionary:
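import xml.etree.ElementTree as ET

tree = ET.parse("your_file.xml")
root = tree.getroot()

# Maps each item id to its batter elements, serialized back to XML text.
out = {}
for item in root.iter("item"):
    item_id = item.get("id")
    s = []
    # iter() visits every descendant <batter>, not only direct children.
    for batter in item.iter("batter"):
        # tostring() returns bytes; strip() drops the trailing whitespace (tail).
        s.append(ET.tostring(batter).decode("utf-8").strip())
    out[item_id] = "\n".join(s)

print(out)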
Prints:
{
"0001": '<batter id="1001">Regular</batter>\n<batter id="1002">Chocolate</batter>\n<batter id="1003">Blueberry</batter>',
"0002": '<batter id="1001">Regular</batter>\n<batter id="1002">Chocolate</batter>\n<batter id="1004">Orange</batter>\n<batter id="1005">Banana</batter>',
}
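If batters may contain nested structures or tags other than batter, as the question mentions, a variation is to serialize every direct child of batters as a whole subtree instead of searching for one tag name. This avoids iter() flattening or repeating nested elements. The following is only a sketch under assumptions: the helper name batters_by_item and the nested mix element are made up for illustration and do not appear in the original data.

```python
import xml.etree.ElementTree as ET

# Shortened sample from the question; the <mix> wrapper is a hypothetical
# nested element added only to show that subtrees are kept intact.
XML = """
<items>
  <item id="0001" type="donut">
    <name>Cake</name>
    <batters>
      <batter id="1001">Regular</batter>
      <mix id="2001"><batter id="1002">Chocolate</batter></mix>
    </batters>
    <topping id="5001">None</topping>
  </item>
  <item id="0002" type="biscuit">
    <name>Biscuit</name>
    <batters>
      <batter id="1004">Orange</batter>
    </batters>
  </item>
</items>
"""


def batters_by_item(root):
    """Map each item id to the serialized children of its <batters> element."""
    out = {}
    for item in root.iter("item"):
        batters = item.find("batters")  # direct child only; None if absent
        if batters is None:
            out[item.get("id")] = ""
            continue
        # Serialize each direct child as a complete subtree, whatever its tag.
        parts = [
            ET.tostring(child, encoding="unicode").strip() for child in batters
        ]
        out[item.get("id")] = "\n".join(parts)
    return out


result = batters_by_item(ET.fromstring(XML))
print(result)
```

Note that ElementTree still parses the whole document here; the content of batters is simply turned back into text afterwards rather than being inspected tag by tag.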