How to get the XML tag name after parsing exactly as it is written in the XML, without namespace conversion


Question

I need to parse XML into another structure. Example:

from xml.etree import ElementTree

a = """
<actors xmlns:fictional="http://characters.example.com">
  <actor>
    <name>Eric Idle</name>
    <fictional:character>Sir Robin</fictional:character>
    <fictional:character>Gunther</fictional:character>
    <fictional:character>Commander Clement</fictional:character>
  </actor>
</actors>
"""

I am using ElementTree to parse the tree:

root = ElementTree.fromstring(a)

When I evaluate

root[0][1].tag

I get

{http://characters.example.com}character

but I need the tag as it appears in the original file:

fictional:character

How do I achieve this?

Answer 1

Score: 1

With XPath, name() returns an element's name including its namespace prefix, and local-name() returns the name without the prefix. The third-party Python package lxml can run XPath 1.0:

import lxml.etree as lx

a = """
<actors xmlns:fictional="http://characters.example.com">
  <actor>
    <name>Eric Idle</name>
    <fictional:character>Sir Robin</fictional:character>
    <fictional:character>Gunther</fictional:character>
    <fictional:character>Commander Clement</fictional:character>
  </actor>
</actors>
"""

root = lx.fromstring(a)

# Relative path from the root <actors> element to every child of <actor>
for el in root.xpath("actor/*"):
    print(el.xpath("name()"))

# name
# fictional:character
# fictional:character
# fictional:character

Answer 2

Score: 0

With the ElementTree library, there is no simple way to do this.
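
For readers who want to stay within the standard library, one possible workaround is to record the prefix declarations while parsing and translate the tags afterwards. The sketch below relies on the "start-ns" events of ElementTree.iterparse; the helper names parse_keeping_prefixes and original_name are made up for this illustration. It assumes each namespace URI is bound to a single prefix and that prefixes are not redeclared deeper in the document, so it is a rough approximation rather than a faithful record of the source text.

```python
import io
import xml.etree.ElementTree as ET

a = """
<actors xmlns:fictional="http://characters.example.com">
  <actor>
    <name>Eric Idle</name>
    <fictional:character>Sir Robin</fictional:character>
    <fictional:character>Gunther</fictional:character>
    <fictional:character>Commander Clement</fictional:character>
  </actor>
</actors>
"""


def parse_keeping_prefixes(xml_text):
    """Parse xml_text and return (root, uri_to_prefix).

    Hypothetical helper: the mapping is collected from "start-ns" events,
    which iterparse emits for every xmlns declaration it encounters.
    """
    uri_to_prefix = {}
    root = None
    events = ET.iterparse(io.StringIO(xml_text), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            # Keep the first prefix seen for a URI (simplifying assumption).
            uri_to_prefix.setdefault(uri, prefix)
        elif root is None:
            # The first "start" event belongs to the root element.
            root = payload
    return root, uri_to_prefix


def original_name(elem, uri_to_prefix):
    """Turn '{uri}local' back into 'prefix:local' where the URI is known."""
    tag = elem.tag
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    prefix = uri_to_prefix.get(uri, "")
    return f"{prefix}:{local}" if prefix else local


root, prefixes = parse_keeping_prefixes(a)
names = [original_name(el, prefixes) for el in root.iter()]
print(names)
```

The element tags themselves stay in the usual {uri}local form, so findall() with a namespace mapping keeps working; only the displayed name is converted.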

Answer 3

Score: 0

You can use re.sub():

import re
import xml.etree.ElementTree as ET
from io import StringIO

a = """
<actors xmlns:fictional="http://characters.example.com">
  <actor>
    <name>Eric Idle</name>
    <fictional:character>Sir Robin</fictional:character>
    <fictional:character>Gunther</fictional:character>
    <fictional:character>Commander Clement</fictional:character>
  </actor>
</actors>
"""
f = StringIO(a)

tree = ET.parse(f)
root = tree.getroot()

ns = {"fictional": "http://characters.example.com"}

# re.escape keeps the braces and dots of the URI literal in the pattern
pattern = re.escape("{http://characters.example.com}")

for elem in root.findall(".//fictional:character", ns):
    print(re.sub(pattern, "fictional:", elem.tag), elem.text)

Output:

fictional:character Sir Robin
fictional:character Gunther
fictional:character Commander Clement
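
If more than one prefix is involved, hard-coding a single substitution gets awkward. As a rough sketch, the same ns dictionary passed to findall() could drive a small conversion helper that needs no regular expression. The function name to_prefixed is hypothetical, and the shortened sample document is only there to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def to_prefixed(tag, ns):
    """Convert '{uri}local' to 'prefix:local' using a prefix -> URI mapping.

    Hypothetical helper: tags without a namespace, or with a namespace
    missing from ns, are returned unchanged.
    """
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    for prefix, known_uri in ns.items():
        if known_uri == uri:
            return f"{prefix}:{local}"
    return tag


ns = {"fictional": "http://characters.example.com"}
root = ET.fromstring(
    '<actors xmlns:fictional="http://characters.example.com">'
    "<actor><name>Eric Idle</name>"
    "<fictional:character>Sir Robin</fictional:character>"
    "</actor></actors>"
)

tags = [to_prefixed(el.tag, ns) for el in root.iter()]
print(tags)
```

Note that this reports the prefixes chosen in ns, which match the source document only if the mapping was written with the same prefixes.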

Answer 4

Score: 0

I found out that the namespace conversion is performed by the expat parser. ElementTree uses

xml.etree.ElementTree.XMLParser

by default, and that class creates the expat parser in its initialization method with the call

parser = expat.ParserCreate(encoding, "}")

You can override the standard behavior of the parser if you change this line to

parser = expat.ParserCreate(encoding, None)

In this case, namespace processing is disabled, so tag names are kept as written in the document.
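
Editing the standard library source is rarely practical, so here is a minimal sketch of the same idea that leaves ElementTree.py untouched: create an expat parser without a namespace separator and wire its handlers to a TreeBuilder by hand. The function name parse_without_namespaces is an assumption made for this example, and the sketch ignores comments, processing instructions, and doctype handling, which XMLParser normally takes care of.

```python
from xml.etree.ElementTree import TreeBuilder
from xml.parsers import expat

a = """
<actors xmlns:fictional="http://characters.example.com">
  <actor>
    <name>Eric Idle</name>
    <fictional:character>Sir Robin</fictional:character>
    <fictional:character>Gunther</fictional:character>
    <fictional:character>Commander Clement</fictional:character>
  </actor>
</actors>
"""


def parse_without_namespaces(xml_text):
    """Build an Element tree with expat namespace processing switched off.

    Hypothetical helper: passing None as namespace_separator makes expat
    report tag names as written, e.g. 'fictional:character'.
    """
    builder = TreeBuilder()
    parser = expat.ParserCreate(None, None)  # encoding, namespace_separator
    parser.StartElementHandler = builder.start   # called with (tag, attrs dict)
    parser.EndElementHandler = builder.end       # called with (tag)
    parser.CharacterDataHandler = builder.data   # called with (text)
    parser.Parse(xml_text, True)                 # True marks the final chunk
    return builder.close()                       # returns the root element


root = parse_without_namespaces(a)
print(root[0][1].tag)
print(root.attrib)
```

With namespace processing off, the xmlns:fictional declaration shows up as an ordinary attribute on the root element, and namespace-aware searches such as findall() with a prefix mapping no longer match these tags.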

huangapple
  • Published on July 13, 2023 at 19:20:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76678797.html