May 11, 2023, 19:49:17

Python - web scrape data from xml

Question

I have a small issue scraping XML from the web. It's a meteorological data page: I want to enter a city name manually in my Python code and get that city's data (temperature, humidity, pressure) from the XML file.

This is the first time I'm dealing with XML and I'm not sure how to approach it. Could anybody help me, please?

When I run the code below, everything gets printed (a much longer data sheet than the one I copied here).

What I want to get is:

City name: City name
Temperature: 13.0
Pressure: 1013.9
Humidity: 86

url = "linkx2xml.xml"
page = requests.get(url)
root = etree.parse(page.content).getroot()
instruments = root.find('CityIme')
instrument = instruments.find_all("City 1 Name")
for grandchild in instrument:
    temperature, pressure, humid = grandchild.find('Temp'), grandchild.find('Press'), grandchild.find('humid')
    print(temperature.text, pressure.text, humid.text)

Link to xml: https://vrijeme.hr/hrvatska_n.xml

Thanks!

Answer 1

Score: 1

Thank you, bloodscript, for putting me on the right track!
From this, I was able to put together the solution below. I'm pretty sure this could be solved with much better code, but in the eyes of a beginner it does what needs to be done.
Only this line is something to think about:

if root[i][0].text == grad:

I believe it could be written better somehow.

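from urllib.request import urlopen
# assumed import; lxml's etree offers the same calls used here
import xml.etree.ElementTree as etree

def tempxml(grad):
    """Return (city, temperature, pressure, humidity) for the city `grad`, or None if it is not found."""
    parser = etree.XMLParser()
    with urlopen('https://vrijeme.hr/hrvatska_n.xml') as f:
        tree = etree.parse(f, parser)
        root = tree.getroot()
    i = 0
    for child in root:
        # root[i] is the same element as child; [0] assumes the city name is its first sub-element
        if root[i][0].text == grad:
            city = root[i][0].text
            data = child.find("Podatci")
            temperature = data.find('Temp')
            pressure = data.find('Tlak')
            humid = data.find('Vlaga')
            return city, temperature.text, pressure.text, humid.text
        i += 1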
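
One possible way to tidy the positional lookup `root[i][0]` is to search by tag name instead, so the index counter is no longer needed. The following is only a sketch: the function name `find_city`, the sample XML, and its root tag and values are made up for illustration; only the element names `Grad`, `GradIme`, `Podatci`, `Temp`, `Tlak` and `Vlaga` come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the element names used in this thread;
# the root tag and the values are made up for illustration.
SAMPLE_XML = """<Hrvatska>
  <Grad>
    <GradIme>Zagreb</GradIme>
    <Podatci>
      <Temp>13.0</Temp>
      <Vlaga>86</Vlaga>
      <Tlak>1013.9</Tlak>
    </Podatci>
  </Grad>
  <Grad>
    <GradIme>Split</GradIme>
    <Podatci>
      <Temp>18.2</Temp>
      <Vlaga>60</Vlaga>
      <Tlak>1015.1</Tlak>
    </Podatci>
  </Grad>
</Hrvatska>"""


def find_city(root, grad):
    """Return (city, temperature, pressure, humidity) for grad, or None."""
    for child in root.iter("Grad"):
        # Look the name up by tag instead of by position.
        name = (child.findtext("GradIme") or "").strip()
        if name == grad:
            data = child.find("Podatci")
            if data is None:
                return None
            return (
                name,
                data.findtext("Temp", default="").strip(),
                data.findtext("Tlak", default="").strip(),
                data.findtext("Vlaga", default="").strip(),
            )
    return None


root = ET.fromstring(SAMPLE_XML)
result = find_city(root, "Zagreb")
print(result)
```

Looking elements up by tag keeps the function working if the order of sub-elements changes, and returning None for an unknown city makes the "not found" case explicit.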

Answer 2

Score: 0

You are using the Python library ElementTree, I suppose. You could also use other compatible implementations of the same API, such as lxml. (cElementTree, which used to ship with the standard library, was deprecated in Python 3.3 and removed in 3.9; xml.etree.ElementTree now uses the fast C implementation automatically when it is available.)

First build an Element instance, root, from the XML, e.g. with the XML function or by parsing a file. Then iterate over its children and search for whichever child tags you would like to retrieve:

import requests
import xml.etree.ElementTree as etree

# fetch the XML file (link given in the question) and parse it
url = "https://vrijeme.hr/hrvatska_n.xml"
page = requests.get(url)
# page.content is bytes, so use fromstring(); parse() expects a filename or file object
root = etree.fromstring(page.content)
# iterate over every 'Grad' element under the root
for child in root.iter('Grad'):
    # save the city name
    cityname = child.find('GradIme')
    # get the child element that contains the weather data
    data = child.find("Podatci")
    # read the weather data located under the Podatci element
    temperature = data.find('Temp')
    pressure = data.find('Tlak')
    humid = data.find('Vlaga')
    print(cityname.text + ":\t" + temperature.text + "\t" + pressure.text + "\t" + humid.text)
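
To illustrate the two ways of building the root element mentioned above, and to print the labelled layout asked for in the question, here is a small sketch. It runs on a hard-coded byte string standing in for the downloaded content; the root tag, the values, and the helper name `format_report` are assumptions made for the example, not part of the real feed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for page.content (bytes); root tag and values are made up.
xml_bytes = (
    b"<Hrvatska><Grad><GradIme>Zagreb</GradIme>"
    b"<Podatci><Temp>13.0</Temp><Vlaga>86</Vlaga><Tlak>1013.9</Tlak></Podatci>"
    b"</Grad></Hrvatska>"
)

# Option 1: fromstring() takes the XML text or bytes and returns the root Element.
root_a = ET.fromstring(xml_bytes)

# Option 2: parse() takes a filename or file object, so wrap the bytes first.
root_b = ET.parse(io.BytesIO(xml_bytes)).getroot()


def format_report(grad_element):
    """Format one Grad element as the four labelled lines asked for in the question."""
    data = grad_element.find("Podatci")
    return "\n".join([
        "City name: " + grad_element.findtext("GradIme", default="").strip(),
        "Temperature: " + data.findtext("Temp", default="").strip(),
        "Pressure: " + data.findtext("Tlak", default="").strip(),
        "Humidity: " + data.findtext("Vlaga", default="").strip(),
    ])


report = format_report(root_a.find("Grad"))
print(report)
```

Both options yield the same root element; fromstring() is the shorter route when the whole document is already in memory, as it is after requests.get().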

Here is the documentation:
https://docs.python.org/3/library/xml.etree.elementtree.html
