How to get the XML tag name after parsing, as specified in the XML, without namespace conversion
Question
I need to parse XML into another structure.
Example:
a = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>
"""
I am using ElementTree to parse the tree:
from xml.etree import ElementTree
root = ElementTree.fromstring(a)
When I evaluate
root[0][1].tag
I get
{http://characters.example.com}character
but I need the tag as it appears in the original file:
fictional:character
How do I achieve this?
Answer 1
Score: 1
With XPath, name() returns an element's name including its namespace prefix, and local-name() returns the name without the prefix. Python's third-party package lxml can run XPath 1.0:
import lxml.etree as lx
a = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>
"""
root = lx.fromstring(a)
# The document element is <actors>, so the absolute path must start there
for el in root.xpath("/actors/actor/*"):
    print(el.xpath("name()"))
# name
# fictional:character
# fictional:character
# fictional:character
Answer 2
Score: 0
With the ElementTree library, there is no simple way to do it.
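One workaround that stays within the standard library can be sketched as follows. It is only an illustration, not something proposed in this thread: the helper names parse_with_prefixes and prefixed_tag are hypothetical. The idea is to collect the prefix declarations through the "start-ns" events of ElementTree.iterparse and map each expanded tag back to its prefixed form. It assumes every namespace URI is bound to a single prefix in the document; if a URI is declared under several prefixes, the last declaration wins.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
</actor>
</actors>
"""


def parse_with_prefixes(xml_text):
    """Parse xml_text and return (root, uri_to_prefix). Hypothetical helper."""
    uri_to_prefix = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # For "start-ns" the payload is a (prefix, uri) tuple
            prefix, uri = payload
            uri_to_prefix[uri] = prefix
        elif root is None:
            # The first "start" event carries the document element
            root = payload
    return root, uri_to_prefix


def prefixed_tag(elem, uri_to_prefix):
    """Turn '{uri}local' back into 'prefix:local'. Hypothetical helper."""
    tag = elem.tag
    if not tag.startswith("{"):
        return tag  # element is in no namespace
    uri, local = tag[1:].split("}", 1)
    prefix = uri_to_prefix.get(uri)
    # An empty prefix means the default namespace, so only the local name is returned
    return f"{prefix}:{local}" if prefix else local


root, prefixes = parse_with_prefixes(SAMPLE)
print(prefixed_tag(root[0][1], prefixes))
```

The tree itself still stores the expanded {uri}local tags, so find() and findall() keep working as usual; the prefixed form is only computed on demand.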
Answer 3
Score: 0
You can use re.sub():
import xml.etree.ElementTree as ET
import re
from io import StringIO
a = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>
"""
f = StringIO(a)
tree = ET.parse(f)
root = tree.getroot()
ns = {"fictional": "http://characters.example.com"}
for elem in root.findall(".//fictional:character", ns):
    # re.escape keeps the braces and dots in the URI from being read as regex syntax
    print(re.sub(re.escape("{http://characters.example.com}"), "fictional:", elem.tag), elem.text)
Output:
fictional:character Sir Robin
fictional:character Gunther
fictional:character Commander Clement
Answer 4
Score: 0
I found that the expat parser is what performs the namespace conversion.
It is created by the parser that ElementTree uses by default,
xml.etree.ElementTree.XMLParser
which creates it in its initialization method with the command
parser = expat.ParserCreate(encoding, "}")
You can override the parser's standard behavior by changing this line to
parser = expat.ParserCreate(encoding, None)
In this case, namespace processing is disabled.
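Editing the standard library source is intrusive, and CPython may replace the pure-Python XMLParser with a C-accelerated one, in which case the change would have no effect. A minimal sketch of the same idea without touching the library is to drive expat directly with no namespace separator and feed its events into ElementTree.TreeBuilder. The function name parse_keep_prefixes is hypothetical and only used for illustration.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

SAMPLE = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
</actor>
</actors>
"""


def parse_keep_prefixes(xml_text):
    """Build an ElementTree element tree with namespace processing disabled."""
    builder = ET.TreeBuilder()
    # No namespace_separator argument: expat reports names exactly as written
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda tag, attrs: builder.start(tag, attrs)
    parser.EndElementHandler = lambda tag: builder.end(tag)
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return builder.close()


root = parse_keep_prefixes(SAMPLE)
print(root[0][1].tag)
print(root.attrib)
```

With namespace processing off, the xmlns:fictional declaration shows up as an ordinary attribute on the root element, and namespace-aware lookups such as findall(".//fictional:character", ns) no longer apply; the literal tag string "fictional:character" has to be matched instead.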