Python - web scrape data from XML

Question

I have a small issue scraping XML from the web. It's a weather data page: I want to enter a city name manually in my Python code and get that city's data (temperature, humidity, pressure) from the XML file.

This is the first time I'm dealing with XML and I'm not sure how to approach it. Could anybody help me, please?

When I run the code below, everything gets printed (a much longer data sheet than the one I copied here).

What I want to get is:

City name: City name
Temperature: 13.0
Pressure: 1013.9
Humidity: 86

url = "linkx2xml.xml"
page = requests.get(url)
root = etree.parse(page.content).getroot()
instruments = root.find('CityIme')
instrument = instruments.find_all("City 1 Name")
for grandchild in instrument:
    temperature, pressure, humid = grandchild.find('Temp'), grandchild.find('Press'), grandchild.find('humid')
    print(temperature.text, pressure.text, humid.text)

Link to xml: https://vrijeme.hr/hrvatska_n.xml

Thanks!

Answer 1

Score: 1

Thank you, bloodscript, for putting me on the right track!
From there, I was able to come up with the solution below. I'm pretty sure this could be written much better, but for a beginner it does what it needs to do.
Only this line is something to think about:

if root[i][0].text == grad:

I believe it could be done better somehow.

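from urllib.request import urlopen
# assumed import; lxml.etree exposes the same calls used here
import xml.etree.ElementTree as etree


def tempxml(grad):
    """Return (city, temperature, pressure, humidity) for the city named grad."""
    parser = etree.XMLParser()

    # download and parse the XML feed
    with urlopen('https://vrijeme.hr/hrvatska_n.xml') as f:
        tree = etree.parse(f, parser)
        root = tree.getroot()

    i = 0
    for child in root:
        # the first sub-element of each entry is compared with the city name
        if root[i][0].text == grad:
            city = root[i][0].text
            data = child.find("Podatci")
            temperature = data.find('Temp')
            pressure = data.find('Tlak')
            humid = data.find('Vlaga')
            return city, temperature.text, pressure.text, humid.text
        i += 1
    # falls through and returns None when no city matches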
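
One possible way to tidy up the `root[i][0].text == grad` comparison is to look up the name element by its tag rather than by position. The sketch below assumes each `Grad` element contains a `GradIme` child and a `Podatci` child, as the tag names in this thread suggest. The helper name `find_city` and the inline sample XML are hypothetical and only there so the example runs without a download.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample mirroring the tag names used in this thread;
# the real feed may contain more elements.
SAMPLE_XML = """
<Hrvatska>
  <Grad>
    <GradIme>Zagreb</GradIme>
    <Podatci><Temp> 13.0</Temp><Vlaga>86</Vlaga><Tlak>1013.9</Tlak></Podatci>
  </Grad>
  <Grad>
    <GradIme>Split</GradIme>
    <Podatci><Temp> 18.2</Temp><Vlaga>60</Vlaga><Tlak>1012.1</Tlak></Podatci>
  </Grad>
</Hrvatska>
"""


def find_city(root, grad):
    """Return (city, temperature, pressure, humidity), or None if not found."""
    for child in root.iter("Grad"):
        # findtext looks the tag up by name, so no positional index is needed
        name = (child.findtext("GradIme") or "").strip()
        if name != grad:
            continue
        data = child.find("Podatci")
        if data is None:
            return None
        return (
            name,
            data.findtext("Temp", default="").strip(),
            data.findtext("Tlak", default="").strip(),
            data.findtext("Vlaga", default="").strip(),
        )
    return None


root = etree.fromstring(SAMPLE_XML)
result = find_city(root, "Zagreb")
if result is not None:
    city, temperature, pressure, humidity = result
    print(f"City name: {city}")
    print(f"Temperature: {temperature}")
    print(f"Pressure: {pressure}")
    print(f"Humidity: {humidity}")
```

Stripping the text is a defensive choice in case the feed pads values with spaces; drop it if exact values are needed.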

Answer 2

Score: 0

You are using a Python library called ElementTree, I suppose. You could also use other implementations compatible with the same API, such as lxml. (cElementTree, which older Python versions shipped in the standard library, was removed in Python 3.9.)

First, build an Element instance root from the XML, e.g. with the XML function or by parsing a file; then iterate over the children and search for whichever child tags you would like to retrieve:

import requests
import xml.etree.ElementTree as etree

# get the XML file (feed URL taken from the question)
url = "https://vrijeme.hr/hrvatska_n.xml"
page = requests.get(url)
# page.content is bytes, so use fromstring; parse() expects a file or filename
root = etree.fromstring(page.content)
# iterate over every element of type 'Grad'
for child in root.iter('Grad'):
    # save the city name
    cityname = child.find('GradIme')
    # get the child element which contains the weather data
    data = child.find("Podatci")
    # save the weather data located under the Podatci element
    temperature = data.find('Temp')
    pressure = data.find('Tlak')
    humid = data.find('Vlaga')
    print(cityname.text + ":\t" + temperature.text + "\t" + pressure.text + "\t" + humid.text)
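
Note that `etree.parse` expects a filename or a file object, while `requests` returns the response body as bytes in `page.content`. As a minimal sketch, here are two ways to build the root element from bytes. The inline payload is a hypothetical stand-in for a real download.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical payload standing in for the bytes in page.content
xml_bytes = b"<Hrvatska><Grad><GradIme>Zagreb</GradIme></Grad></Hrvatska>"

# Option 1: parse the bytes directly; fromstring returns the root Element
root_a = etree.fromstring(xml_bytes)

# Option 2: wrap the bytes in a file-like object so parse() can read them,
# then ask the resulting ElementTree for its root
root_b = etree.parse(io.BytesIO(xml_bytes)).getroot()

print(root_a.tag, root_b.tag)
print(root_a.findtext("Grad/GradIme"))
```

Either option should give an equivalent root; `fromstring` is simply shorter when the whole document is already in memory.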

Here is the documentation:
https://docs.python.org/3/library/xml.etree.elementtree.html
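
The same documentation describes a limited XPath syntax that ElementTree supports, which might be used to select one city without an explicit loop. The following is only a sketch: the sample XML is hypothetical and assumes the `Grad` / `GradIme` / `Podatci` layout used in this thread.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data following the tag names used in this thread
SAMPLE = (
    "<Hrvatska>"
    "<Grad><GradIme>Zagreb</GradIme><Podatci><Temp>13.0</Temp></Podatci></Grad>"
    "<Grad><GradIme>Split</GradIme><Podatci><Temp>18.2</Temp></Podatci></Grad>"
    "</Hrvatska>"
)

root = etree.fromstring(SAMPLE)

# [GradIme='Split'] keeps only Grad elements whose GradIme child has
# exactly that text content
grad = root.find("Grad[GradIme='Split']")
if grad is not None:
    print(grad.findtext("GradIme"), grad.findtext("Podatci/Temp"))

# find() returns None when nothing matches
missing = root.find("Grad[GradIme='Nowhere']")
```

The predicate compares the full text exactly, so surrounding whitespace in the feed or an apostrophe in the city name would need extra handling; in those cases a plain loop is likely the safer choice.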

huangapple
  • Posted on May 11, 2023, 19:49:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76227325.html