Parsing duplicate properties in XML

Question

I've got to work with an XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <typekitSyncState>
      <state>2f1b61f7296e340f27be2925b5485719f04efcb8</state>
      <fonts type="array">
        <font>
          <url>https://someURI</url>
          <id>25367</id>
          <properties>
            <fullName>Heisei Kaku Gothic Std W3</fullName>
            <familyName>Heisei Kaku Gothic Std</familyName>
            <variationName>W3</variationName>
            <familyURL>https://typekit.com/fonts/heisei-kaku-gothic-std</familyURL>
            <familyWebId>mpll</familyWebId>
            <fvd>n3</fvd>
            <isVariable>false</isVariable>
            <i18n>
              <locales type="array">
                <locale>
                  <ianaTag>ja</ianaTag>
                  <fullName>平成角ゴシック Std W3</fullName>
                  <familyName>平成角ゴシック Std</familyName>
                </locale>
              </locales>
            </i18n>
          </properties>
        </font>
      </fonts>
    </typekitSyncState>

I'm interested in extracting certain properties, mainly the font names:

    for element in root.iter():
      if element.tag == "fullName":
        print("%s - %s" % (element.tag, element.text))


    fullName - Heisei Kaku Gothic Std W3
    fullName - 平成角ゴシック Std W3

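So far so good, but notice that it picks up the top-level fullName property as well as the *localized* fullName property, which confuses matters. I want to be able to access the English and Japanese names separately.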

I'm new to XML and finding its idiosyncrasies somewhat confusing. I tried using Beautiful Soup and then ElementTree:

    
    for child in root:
      # state, fonts (plural)
      for child2 in child:
        # font (singular)
        for child3 in child2:
          # properties
          for child4 in child3:
            # The good stuff:
            # fullName, familyName, variationName, ...
            if isinstance(child4.tag, str):
              print(child4.tag, child4.text)

              if child4.tag == "fullName":
                print(child4.attrib)
            for child5 in child4:
              print("5  ", child5.tag, child5.attrib)

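`if child4.tag == "fullName"` wasn't working. I then went down a rabbit hole trying to find examples of XML parsing that matched my flavour of XML. Mine doesn't seem to have namespaces.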

What I'd like is something like this (pseudocode):


    for font in fonts:
      full = font.fullname
      fam  = font.familyname
      vari = font.variationname
      if (font.locale) exists:
        full_local = font.locale.fullname
      else: full_local = ""
      print(full, fam, vari, full_local)

    >> Heisei Kaku Gothic Std W3, Heisei Kaku Gothic Std, W3, 平成角ゴシック Std W3

Answer 1

Score: 1

Try:

    from bs4 import BeautifulSoup

    xml_doc = '''
    <?xml version="1.0" encoding="UTF-8"?>
    <typekitSyncState>
      <state>2f1b61f7296e340f27be2925b5485719f04efcb8</state>
      <fonts type="array">
        <font>
          <url>https://someURI</url>
          <id>25367</id>
          <properties>
            <fullName>Heisei Kaku Gothic Std W3</fullName>
            <familyName>Heisei Kaku Gothic Std</familyName>
            <variationName>W3</variationName>
            <familyURL>https://typekit.com/fonts/heisei-kaku-gothic-std</familyURL>
            <familyWebId>mpll</familyWebId>
            <fvd>n3</fvd>
            <isVariable>false</isVariable>
            <i18n>
              <locales type="array">
                <locale>
                  <ianaTag>ja</ianaTag>
                  <fullName>平成角ゴシック Std W3</fullName>
                  <familyName>平成角ゴシック Std</familyName>
                </locale>
              </locales>
            </i18n>
          </properties>
        </font>
      </fonts>
    </typekitSyncState>
    '''

    soup = BeautifulSoup(xml_doc, 'xml')

    full_name = soup.select_one('font > properties > fullName').text
    family_name = soup.select_one('font > properties > familyName').text
    locale_names = {l.find('ianaTag').text: l.find('fullName').text for l in soup.select('font locales locale')}

    print(family_name)
    print(full_name)
    print(locale_names)

Prints:

    Heisei Kaku Gothic Std
    Heisei Kaku Gothic Std W3
    {'ja': '平成角ゴシック Std W3'}
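
For comparison, the same separation can probably be done with the standard library's `xml.etree.ElementTree`, which the question mentions trying. The sketch below is not part of the original answer. It assumes the layout shown in the question (each `font` has a `properties` child with optional `i18n/locales/locale` entries), uses a shortened copy of the sample XML, and the helper name `font_names` is made up for illustration. The idea is that `findtext` with a bare tag name only looks at direct children, so asking `properties` for `fullName` does not reach the localized name nested inside `locale`, whereas `root.iter()` walks the whole tree and returns both.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's sample. The XML declaration is left out
# so that the string may begin with a newline.
XML_DOC = """
<typekitSyncState>
  <fonts type="array">
    <font>
      <id>25367</id>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
        <familyName>Heisei Kaku Gothic Std</familyName>
        <variationName>W3</variationName>
        <i18n>
          <locales type="array">
            <locale>
              <ianaTag>ja</ianaTag>
              <fullName>平成角ゴシック Std W3</fullName>
              <familyName>平成角ゴシック Std</familyName>
            </locale>
          </locales>
        </i18n>
      </properties>
    </font>
  </fonts>
</typekitSyncState>
"""


def font_names(root):
    """Yield one dict per <font>, keeping top-level and localized names apart.

    Hypothetical helper, assuming the structure shown in the question.
    """
    for props in root.iterfind("./fonts/font/properties"):
        # A bare tag name matches direct children only, so these lookups
        # cannot pick up the <fullName> nested inside <locale>.
        names = {
            "full": props.findtext("fullName", default=""),
            "family": props.findtext("familyName", default=""),
            "variation": props.findtext("variationName", default=""),
            "localized": {},
        }
        # Localized full names keyed by IANA tag; stays empty if a font
        # has no <locale> entries.
        for locale in props.iterfind("./i18n/locales/locale"):
            tag = locale.findtext("ianaTag", default="")
            names["localized"][tag] = locale.findtext("fullName", default="")
        yield names


root = ET.fromstring(XML_DOC)
for names in font_names(root):
    print(names["full"], names["family"], names["variation"],
          names["localized"].get("ja", ""), sep=", ")
```

For the sample data this prints `Heisei Kaku Gothic Std W3, Heisei Kaku Gothic Std, W3, 平成角ゴシック Std W3`, which matches the output the question asks for. A font without any `locale` entry would simply get an empty string in the last position.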

Posted by huangapple on 2023-06-11 23:09:01
Original link: https://go.coder-hub.com/76451097.html