How to parse XML that includes XML schema info using Python

Question

import xml.etree.ElementTree as ET
mytree=ET.parse("/Users/user/student.xml")
myroot=mytree.getroot()
tag=myroot.tag
print(tag)
#attr=myroot.attrib
#print(attr)

for p in myroot.findall('.//studentData'):
    acctDt=p.find('acctDt').text
<?xml version="1.0" encoding="UTF-8"?>
<studentData xmlns="http://www.myschool.com/schmea/studentData" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation="http://www.myschool.com/schmea/studentData Studentdata.xsd">
   <stuRec>
      <as>
         <sourceSys>BBC</sourceSys>
         <acctDt>2023-04-04</acctDt>
      </as>
      <stats>
         <ss>
            <prov>AB</prov>
            <cono>1</cono>
         </ss>
      </stats>
   </stuRec>
   <stuRec>
      <as>
         <sourceSys>RCD</sourceSys>
         <acctDt>2023-05-14</acctDt>
      </as>
      <stats>
         <ss>
            <prov>ON</prov>
            <cono>2</cono>
         </ss>
      </stats>
   </stuRec>
</studentData>

My XML file (student.xml) looks like the XML above.
When I run the Python code, I can print the root tag and its attributes, but the loop finds nothing. What I want is to get acctDt and prov. Here is the output:

user@star ~ % python -u "/Users/user/student.py"
{http://www.myschool.com/schmea/studentData}studentData
{'{http://www.w3.org/2000/10/XMLSchema-instance}schemaLocation': 'http://www.myschool.com/schmea/studentData Studentdata.xsd'}
user@star ~ % 
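A quick way to see why the loop prints nothing is to look at the tag names ElementTree actually stores. The sketch below is a minimal illustration, assuming a trimmed-down inline copy of student.xml in place of the real file: the default namespace is folded into every tag as {uri}localname, so the plain name studentData never matches, and './/' only searches below the root in any case.

```python
import xml.etree.ElementTree as ET

# Trimmed-down inline copy of student.xml (assumed to mirror the real file)
XML = """<studentData xmlns="http://www.myschool.com/schmea/studentData">
  <stuRec>
    <as><sourceSys>BBC</sourceSys><acctDt>2023-04-04</acctDt></as>
    <stats><ss><prov>AB</prov><cono>1</cono></ss></stats>
  </stuRec>
</studentData>"""

root = ET.fromstring(XML)

# The default namespace becomes part of every tag name ("Clark notation")
print(root.tag)  # {http://www.myschool.com/schmea/studentData}studentData

# Plain names match nothing: the stored tags carry the namespace,
# and './/' searches the root's descendants, never the root itself
print(root.findall('.//studentData'))  # []
print(root.findall('.//acctDt'))       # []

# The fully qualified name does match
qualified = './/{http://www.myschool.com/schmea/studentData}acctDt'
print([e.text for e in root.findall(qualified)])  # ['2023-04-04']
```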

Answer 1

Score: 2

You need to adjust your loop because your XML declares a default namespace, so every tag name is qualified with it. Do something like this:

ns = {'': 'http://www.myschool.com/schmea/studentData'}
for node in myroot.findall('.//acctDt', ns):
    print(node.text)

See also the "Parsing XML with Namespaces" section of the Python xml.etree.ElementTree documentation.
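The empty-prefix mapping above relies on Python 3.8 or later. Below is a minimal sketch of two other spellings, assuming the same structure as student.xml (trimmed to the two fields the question asks for): an explicit prefix, where "s" is an arbitrary name chosen here and not something the file defines, and the {*} wildcard (also Python 3.8+), which matches a local name in any namespace. Both collect acctDt and prov per record.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.myschool.com/schmea/studentData"

# Two records, trimmed from the question's student.xml
XML = f"""<studentData xmlns="{NS_URI}">
  <stuRec><as><acctDt>2023-04-04</acctDt></as><stats><ss><prov>AB</prov></ss></stats></stuRec>
  <stuRec><as><acctDt>2023-05-14</acctDt></as><stats><ss><prov>ON</prov></ss></stats></stuRec>
</studentData>"""

root = ET.fromstring(XML)

# Option 1: bind the URI to an explicit prefix ("s" is arbitrary) and use it in each path step
ns = {"s": NS_URI}
with_prefix = [
    (rec.find("s:as/s:acctDt", ns).text, rec.find("s:stats/s:ss/s:prov", ns).text)
    for rec in root.findall("s:stuRec", ns)
]

# Option 2 (Python 3.8+): {*} matches the local name in any namespace
with_wildcard = [
    (rec.find("{*}as/{*}acctDt").text, rec.find(".//{*}prov").text)
    for rec in root.findall("{*}stuRec")
]

print(with_prefix)    # [('2023-04-04', 'AB'), ('2023-05-14', 'ON')]
print(with_wildcard)  # [('2023-04-04', 'AB'), ('2023-05-14', 'ON')]
```

The explicit prefix is the more portable choice; the wildcard is shorter but would also match same-named elements from an unrelated namespace, if the document ever mixed them.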

Answer 2

Score: 0

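I hope this works for you. It uses lxml: it first lists every distinct tag in the document (each in its namespaced form), then iterates over the fully qualified acctDt and prov tags: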

from lxml import etree

tree = etree.parse('./xml_schema_info.xml')
root = tree.getroot()

# Collect every distinct tag; each one carries the {namespace} prefix
ele_sets = set()
for ele in root.xpath('.//*'):
    ele_sets.add(ele.tag)
print(f'elements: \n{ele_sets}\nTotal: {len(ele_sets)}')

# Tags must be fully qualified ({namespace}localname) to match
acctDt = '{http://www.myschool.com/schmea/studentData}acctDt'
for ele in root.iter(acctDt):
    print(f'acctDt: {ele.text}')
prov = '{http://www.myschool.com/schmea/studentData}prov'
for ele in root.iter(prov):
    print(f'prov: {ele.text}')
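Iterating with a fully qualified tag is not specific to lxml: the standard-library ElementTree also provides Element.iter(tag). Below is a minimal sketch, assuming the document has a single default namespace on its root (as student.xml does), that reads the {uri} part off the root tag instead of hard-coding it. The local() helper is hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

XML = """<studentData xmlns="http://www.myschool.com/schmea/studentData">
  <stuRec><as><acctDt>2023-04-04</acctDt></as><stats><ss><prov>AB</prov></ss></stats></stuRec>
  <stuRec><as><acctDt>2023-05-14</acctDt></as><stats><ss><prov>ON</prov></ss></stats></stuRec>
</studentData>"""

root = ET.fromstring(XML)

# root.tag looks like "{uri}studentData"; keep the "{uri}" part as a prefix.
# Assumes the root element is namespaced (its tag contains "}").
ns_prefix = root.tag[: root.tag.index("}") + 1]

def local(name):
    """Hypothetical helper: qualify a local name with the document's namespace."""
    return ns_prefix + name

acct_dates = [e.text for e in root.iter(local("acctDt"))]
provinces = [e.text for e in root.iter(local("prov"))]

print(ns_prefix)   # {http://www.myschool.com/schmea/studentData}
print(acct_dates)  # ['2023-04-04', '2023-05-14']
print(provinces)   # ['AB', 'ON']
```

If the root element has no namespace, root.tag contains no "}" and index() raises ValueError, so this shortcut only fits documents shaped like this one.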

Answer 3

Score: 0

For your extended question:

import xml.etree.ElementTree as ET
from io import StringIO

xml_str = """<?xml version="1.0" encoding="UTF-8"?>
<studentData xmlns="http://www.myschool.com/schmea/studentData" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation="http://www.myschool.com/schmea/studentData Studentdata.xsd">
   <stuRec>
      <as>
         <sourceSys>BBC</sourceSys>
         <acctDt>2023-04-04</acctDt>
      </as>
      <stats>
         <ss>
            <prov>AB</prov>
            <cono>1</cono>
         </ss>
      </stats>
   </stuRec>
   <stuRec>
      <as>
         <sourceSys>RCD</sourceSys>
         <acctDt>2023-05-14</acctDt>
      </as>
      <stats>
         <ss>
            <prov>ON</prov>
            <cono>2</cono>
         </ss>
      </stats>
   </stuRec>
</studentData>"""

f = StringIO(xml_str)

tree = ET.parse(f)
root = tree.getroot()

# The empty prefix maps to the default namespace (Python 3.8+)
ns = {'': 'http://www.myschool.com/schmea/studentData'}

# Each <stuRec> is one record; look up its fields inside it
for stuRec in root.findall('.//stuRec', ns):
    sourceSys = stuRec.find('.//sourceSys', ns).text
    acctDt = stuRec.find('.//acctDt', ns).text
    prov = stuRec.find('.//prov', ns).text
    cono = stuRec.find('.//cono', ns).text

    print(f"{sourceSys:<3},{acctDt:>15},{prov:>6},{cono:>5}")

Output:

BBC,     2023-04-04,    AB,    1
RCD,     2023-05-14,    ON,    2
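The loop above raises AttributeError if any record lacks one of the four elements, because find() then returns None. Below is a minimal sketch of a more forgiving variant, assuming incomplete records are possible: findtext() returns a default instead of failing, and the rows go through the standard csv module (into an in-memory buffer here; the second record deliberately omits cono to show the fallback).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Second record deliberately has no <cono>, to show the fallback value
XML = """<studentData xmlns="http://www.myschool.com/schmea/studentData">
  <stuRec>
    <as><sourceSys>BBC</sourceSys><acctDt>2023-04-04</acctDt></as>
    <stats><ss><prov>AB</prov><cono>1</cono></ss></stats>
  </stuRec>
  <stuRec>
    <as><sourceSys>RCD</sourceSys><acctDt>2023-05-14</acctDt></as>
    <stats><ss><prov>ON</prov></ss></stats>
  </stuRec>
</studentData>"""

ns = {"": "http://www.myschool.com/schmea/studentData"}
FIELDS = ["sourceSys", "acctDt", "prov", "cono"]

root = ET.fromstring(XML)

# findtext() returns the default ("") instead of raising when an element is missing
rows = [
    {field: rec.findtext(f".//{field}", default="", namespaces=ns) for field in FIELDS}
    for rec in root.findall(".//stuRec", ns)
]

# Write the records as CSV (to an in-memory buffer for this sketch)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue(), end="")
# sourceSys,acctDt,prov,cono
# BBC,2023-04-04,AB,1
# RCD,2023-05-14,ON,
```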

huangapple
  • This article was published on June 26, 2023 at 12:40:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76553565.html