June 30, 2023, 03:10:47

How can I leave some tags unparsed when parsing with xml.etree.ElementTree in Python?

Question

I need to parse an XML file in Python with xml.etree.ElementTree. An example of the XML:

<items>
    <item id="0001" type="donut">
        <name>Cake</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002">Chocolate</batter>
            <batter id="1003">Blueberry</batter>
        </batters>
        <topping id="5001">None</topping>
    </item>
    <item id="0002" type="biscuit">
        <name>Biscuit</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002">Chocolate</batter>
            <batter id="1004">Orange</batter>
            <batter id="1005">Banana</batter>
        </batters>
        <topping id="5006">Sprinkles</topping>
    </item>
</items>

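The contents of batters could be more complex, with nested structures. Because the contents of batters may change, I don't want to parse every tag inside batters. I want to associate each item's id with its batters. The batters should be represented as an XML-like string, while the item id should be parsed.

So for this example I want to get the following result: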

{
    '0001': '<batter id="1001">Regular</batter><batter id="1002">Chocolate</batter><batter id="1003">Blueberry</batter>',
    '0002': '<batter id="1001">Regular</batter><batter id="1002">Chocolate</batter><batter id="1004">Orange</batter><batter id="1005">Banana</batter>'
}

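The data structure could be different; that doesn't matter. I couldn't find an example of parsing XML partially, i.e. parsing some tags and leaving others unparsed. Is this possible with xml.etree.ElementTree?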

Answer 1

Score: 0

You can use this example to parse the XML file into a dictionary:

import xml.etree.ElementTree as ET

tree = ET.parse("your_file.xml")
root = tree.getroot()

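out = {}
for item in root.iter("item"):
    item_id = item.get("id")  # the id is the only value read from the item itself
    # Serialize each <batter> back to text instead of inspecting its contents.
    s = []
    for batter in item.iter("batter"):
        # tostring() returns bytes and includes trailing whitespace, hence decode + strip.
        s.append(ET.tostring(batter).decode("utf-8").strip())
    out[item_id] = "\n".join(s)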

print(out)

Prints:

{
    "0001": '<batter id="1001">Regular</batter>\n<batter id="1002">Chocolate</batter>\n<batter id="1003">Blueberry</batter>',
    "0002": '<batter id="1001">Regular</batter>\n<batter id="1002">Chocolate</batter>\n<batter id="1004">Orange</batter>\n<batter id="1005">Banana</batter>',
}
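
Note that ElementTree always parses the whole document; what you can avoid is inspecting the contents of `batters` yourself. The loop above looks only for `batter` tags, so it would miss other tags inside `batters` and serialize a nested `batter` a second time. A minimal sketch of a variant that serializes every direct child of `batters` back to a string, whatever its tag, might look like this. The helper name `batters_by_item` and the inlined `XML` sample are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch runs on its own.
XML = """<items>
    <item id="0001" type="donut">
        <name>Cake</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002">Chocolate</batter>
            <batter id="1003">Blueberry</batter>
        </batters>
        <topping id="5001">None</topping>
    </item>
    <item id="0002" type="biscuit">
        <name>Biscuit</name>
        <batters>
            <batter id="1001">Regular</batter>
            <batter id="1002">Chocolate</batter>
            <batter id="1004">Orange</batter>
            <batter id="1005">Banana</batter>
        </batters>
        <topping id="5006">Sprinkles</topping>
    </item>
</items>"""


def batters_by_item(root):
    """Map each item id to the serialized children of its <batters> element."""
    out = {}
    for item in root.iter("item"):
        batters = item.find("batters")
        if batters is None:
            # Hypothetical fallback: an item without <batters> maps to "".
            out[item.get("id")] = ""
            continue
        # Serialize each direct child as-is; nested elements come along
        # without the code needing to know their tag names.
        parts = [
            ET.tostring(child, encoding="unicode").strip()
            for child in batters
        ]
        out[item.get("id")] = "".join(parts)
    return out


result = batters_by_item(ET.fromstring(XML))
print(result)
```

Joining with an empty string gives the single-line form asked for in the question; joining with "\n" instead reproduces the layout of the output above.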
