Parsing duplicate properties in XML
Question
I've got to work with an .XML file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<typekitSyncState>
  <state>2f1b61f7296e340f27be2925b5485719f04efcb8</state>
  <fonts type="array">
    <font>
      <url>https://someURI</url>
      <id>25367</id>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
        <familyName>Heisei Kaku Gothic Std</familyName>
        <variationName>W3</variationName>
        <familyURL>https://typekit.com/fonts/heisei-kaku-gothic-std</familyURL>
        <familyWebId>mpll</familyWebId>
        <fvd>n3</fvd>
        <isVariable>false</isVariable>
        <i18n>
          <locales type="array">
            <locale>
              <ianaTag>ja</ianaTag>
              <fullName>平成角ゴシック Std W3</fullName>
              <familyName>平成角ゴシック Std</familyName>
            </locale>
          </locales>
        </i18n>
      </properties>
    </font>
  </fonts>
</typekitSyncState>
```
I'm interested in extracting certain properties, mainly the font names.
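```python
import xml.etree.ElementTree as ET

root = ET.parse("fonts.xml").getroot()  # "fonts.xml" is a placeholder path

for element in root.iter():
    if element.tag == "fullName":
        print("%s - %s" % (element.tag, element.text))
```

```
fullName - Heisei Kaku Gothic Std W3
fullName - 平成角ゴシック Std W3
```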
So far so good, but I realised that it has picked up the `fullName` property as well as the localised `fullName` property, which confuses matters. I want to be able to access the English and Japanese names separately.
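The two names can be told apart by where they sit in the tree rather than by tag name alone. A minimal sketch, assuming the standard library's `xml.etree.ElementTree` and a trimmed copy of the XML above: `.//fullName` searches all descendants (which is effectively what the `root.iter()` loop does), while a path of direct children reaches only one of the two.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (only the relevant elements).
XML = """<typekitSyncState>
  <fonts type="array">
    <font>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
        <i18n>
          <locales type="array">
            <locale>
              <ianaTag>ja</ianaTag>
              <fullName>平成角ゴシック Std W3</fullName>
            </locale>
          </locales>
        </i18n>
      </properties>
    </font>
  </fonts>
</typekitSyncState>"""

root = ET.fromstring(XML)

# ".//fullName" matches every descendant called fullName, so both names come back.
every_full_name = [e.text for e in root.findall(".//fullName")]

# A path of direct children only reaches the top-level <fullName> of <properties>...
english = root.findtext("fonts/font/properties/fullName")

# ...while anchoring on <locale> reaches only the localised one.
japanese = root.findtext(".//locale/fullName")

print(every_full_name)
print(english)
print(japanese)
```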
I'm new to XML and am finding its idiosyncrasies somewhat confusing. I tried using Beautiful Soup and then ElementTree.
```python
for child in root:
    # state, fonts (plural)
    for child2 in child:
        # font (singular)
        for child3 in child2:
            # properties
            for child4 in child3:
                # The good stuff
                # fullName, familyName, variationName,
                if isinstance(child4.tag, str):
                    print(child4.tag, child4.text)
                if child4.tag == "fullName":
                    print(child4.attrib)
                    for child5 in child4:
                        print("5 ", child5.tag, child5.attrib)
```
`if child4.tag == "fullName"` wasn't working. I then went down a rabbit hole trying to find examples of XML parsing that matched my flavour of XML. Mine doesn't seem to have namespaces.
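A likely explanation, offered as a sketch rather than a definite diagnosis: with no namespaces declared, the tag really is the plain string `fullName`, so that branch does run, but it prints `child4.attrib`, which is empty because the font name is the element's text, not an attribute. A minimal check with the standard library's `xml.etree.ElementTree` on a trimmed copy of the XML above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question.
XML = """<typekitSyncState>
  <fonts type="array">
    <font>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
      </properties>
    </font>
  </fonts>
</typekitSyncState>"""

root = ET.fromstring(XML)
full_name = root.find("fonts/font/properties/fullName")

# No namespace is declared, so the tag is the plain string "fullName"
# (a namespaced tag would look like "{http://...}fullName").
print(full_name.tag)      # fullName

# The name is the element's text content, not an attribute...
print(full_name.attrib)   # {}
print(full_name.text)     # Heisei Kaku Gothic Std W3

# ...whereas type="array" on <fonts> is a genuine attribute.
print(root.find("fonts").attrib)  # {'type': 'array'}
```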
What I'd like is something like this:

```
for font in fonts:
    full = font.fullname
    fam = font.familyname
    vari = font.variationname
    if font.locale exists:
        full_local = font.locale.fullname
    else:
        full_local = ""
    print(full, fam, vari, full_local)
```

```
>> Heisei Kaku Gothic Std W3, Heisei Kaku Gothic Std, W3, 平成角ゴシック Std W3
```
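That wished-for loop maps fairly directly onto ElementTree's `iterfind`, `find` and `findtext`. A minimal sketch, assuming the standard library parser, a trimmed copy of the XML above, and that the first `<locale>` is the one wanted; the names `rows` and `props` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question; further <font> elements
# inside <fonts> would be picked up by the loop as well.
XML = """<typekitSyncState>
  <fonts type="array">
    <font>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
        <familyName>Heisei Kaku Gothic Std</familyName>
        <variationName>W3</variationName>
        <i18n>
          <locales type="array">
            <locale>
              <ianaTag>ja</ianaTag>
              <fullName>平成角ゴシック Std W3</fullName>
            </locale>
          </locales>
        </i18n>
      </properties>
    </font>
  </fonts>
</typekitSyncState>"""

root = ET.fromstring(XML)

rows = []
for font in root.iterfind("fonts/font"):
    props = font.find("properties")
    # Direct children of <properties> only, so the localised names are not mixed in.
    full = props.findtext("fullName", default="")
    fam = props.findtext("familyName", default="")
    vari = props.findtext("variationName", default="")
    # First <locale>, if the font has localised names at all.
    locale = props.find("i18n/locales/locale")
    full_local = locale.findtext("fullName", default="") if locale is not None else ""
    rows.append((full, fam, vari, full_local))
    print(", ".join((full, fam, vari, full_local)))
```

`find` returns `None` when nothing matches, hence the `is not None` test: the truth value of an `Element` reflects whether it has children, not whether it was found.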
Answer 1
Score: 1
Try:
```python
from bs4 import BeautifulSoup

# The XML declaration has to be the very first thing in the document,
# so the string starts with it (no leading newline).
xml_doc = '''<?xml version="1.0" encoding="UTF-8"?>
<typekitSyncState>
  <state>2f1b61f7296e340f27be2925b5485719f04efcb8</state>
  <fonts type="array">
    <font>
      <url>https://someURI</url>
      <id>25367</id>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
        <familyName>Heisei Kaku Gothic Std</familyName>
        <variationName>W3</variationName>
        <familyURL>https://typekit.com/fonts/heisei-kaku-gothic-std</familyURL>
        <familyWebId>mpll</familyWebId>
        <fvd>n3</fvd>
        <isVariable>false</isVariable>
        <i18n>
          <locales type="array">
            <locale>
              <ianaTag>ja</ianaTag>
              <fullName>平成角ゴシック Std W3</fullName>
              <familyName>平成角ゴシック Std</familyName>
            </locale>
          </locales>
        </i18n>
      </properties>
    </font>
  </fonts>
</typekitSyncState>'''

# The 'xml' feature uses lxml's XML parser, which keeps camelCase tag names like fullName.
soup = BeautifulSoup(xml_doc, 'xml')

# '>' is the CSS child combinator: only the <fullName> that is a direct
# child of <properties> matches, not the one inside <locale>.
full_name = soup.select_one('font > properties > fullName').text
family_name = soup.select_one('font > properties > familyName').text

# Map each locale's ianaTag to its localised full name.
locale_names = {loc.find('ianaTag').text: loc.find('fullName').text for loc in soup.select('font locales locale')}

print(family_name)
print(full_name)
print(locale_names)
```
Prints:

```
Heisei Kaku Gothic Std
Heisei Kaku Gothic Std W3
{'ja': '平成角ゴシック Std W3'}
```
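The `select_one` calls return only the first match, so with more than one `<font>` only the first font's English names would be read, while `locale_names` would gather the locales of every font into a single dict. A sketch of the same lookups using only the standard library, grouped per font; the `fonts` list-of-dicts layout is an illustrative choice, not part of the answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question.
XML = """<typekitSyncState>
  <fonts type="array">
    <font>
      <properties>
        <fullName>Heisei Kaku Gothic Std W3</fullName>
        <familyName>Heisei Kaku Gothic Std</familyName>
        <i18n>
          <locales type="array">
            <locale>
              <ianaTag>ja</ianaTag>
              <fullName>平成角ゴシック Std W3</fullName>
            </locale>
          </locales>
        </i18n>
      </properties>
    </font>
  </fonts>
</typekitSyncState>"""

root = ET.fromstring(XML)

fonts = []
for props in root.iterfind("fonts/font/properties"):
    fonts.append({
        # Direct children of <properties>, like 'font > properties > fullName'.
        "full_name": props.findtext("fullName"),
        "family_name": props.findtext("familyName"),
        # ianaTag -> localised full name, scoped to this font only.
        "locale_names": {
            loc.findtext("ianaTag"): loc.findtext("fullName")
            for loc in props.iterfind("i18n/locales/locale")
        },
    })

for font in fonts:
    print(font["family_name"])
    print(font["full_name"])
    print(font["locale_names"])
```

For this input it prints the same three lines as the Beautiful Soup version, without needing `bs4` or `lxml` installed.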