January 9, 2023 01:48:31

HTML parser find tag info

Question

I have a project that uses HTMLParser(). I have never worked with this parser, so I read the documentation and found two useful methods I can override to extract information from the site: handle_starttag and handle_data. But I don't understand how to find the information for the tags I need and pass it to handle_data to print it.

I need to get the price from all span tags on the page

<span itemprop="price" content="590">590 dollars</span>

How do I get this?

Answer 1

Score: 1

If every <span> price tag has the itemprop attribute set to "price" and the dollar amount is in the content attribute, then you can do it all in handle_starttag like this:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict makes lookups easier
        attrsDict = dict(attrs)
        # .get() avoids a KeyError on <span> tags that have no itemprop attribute
        if tag == 'span' and attrsDict.get('itemprop') == 'price':
            price = attrsDict.get('content')
            print(price)
            # do something else with `price` here

# Example test cases
parser = MyHTMLParser()
parser.feed("""
<span itemprop="price" content="590">590 dollars</span>
<span itemprop="price" content="430">430 dollars</span>
<span itemprop="price" content="684">684 dollars</span>
""")
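
If the prices need to be used afterwards rather than just printed, one possible variation is to collect them on the parser instance as numbers. This is only a sketch: the class name PriceAttrParser is made up for illustration, and it assumes the content attribute holds a plain decimal number, which the question does not guarantee. Spans whose content attribute is missing or not numeric are skipped.

```python
from decimal import Decimal, InvalidOperation
from html.parser import HTMLParser


class PriceAttrParser(HTMLParser):  # hypothetical name, not from the answer
    def __init__(self):
        super().__init__()
        self.prices = []  # Decimal values read from the content attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "span" or attrs.get("itemprop") != "price":
            return
        try:
            # Assumption: content is a plain number such as "590" or "430.50"
            self.prices.append(Decimal(attrs.get("content") or ""))
        except InvalidOperation:
            pass  # missing or non-numeric content attribute: skip this span


parser = PriceAttrParser()
parser.feed('''
<span itemprop="price" content="590">590 dollars</span>
<span itemprop="price">no content attribute</span>
<span itemprop="price" content="430.50">430.50 dollars</span>
''')
print(parser.prices)
print(sum(parser.prices))
```

Decimal is used here instead of float so that amounts like 430.50 keep their exact value; int would work just as well if the prices are always whole numbers.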

Answer 2

Score: 1

This example initializes a custom HTMLParser and gets the text between the <span> tags (using handle_data):

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._price_tag = None  # set while the parser is inside a price <span>
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ('itemprop', 'price') in attrs:
            self._price_tag = tag

    def handle_endtag(self, tag):
        if tag == self._price_tag:
            self._price_tag = None

    def handle_data(self, data):
        # only keep text that appears inside a price <span>
        if self._price_tag:
            self.prices.append(data)

parser = MyHTMLParser()
parser.feed("""
<html>
    <span itemprop="price" content="570">570 dollars</span>
    <span itemprop="price" content="590">590 dollars</span>
</html>
""")
print(parser.prices)

Prints:

['570 dollars', '590 dollars']
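
The question also asks how to pass information from the start tag along to handle_data. One way to sketch that is to store the attribute on the parser instance when the tag opens and pair it with the text once the tag closes. The class name PriceParser and the nested <b> markup below are assumptions for illustration, not something taken from the asker's page. The text is buffered until the closing </span>, because handle_data is called once per text run and nested tags would otherwise split one price into several entries.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):  # hypothetical name for this sketch
    """Collects (content attribute, visible text) pairs for price spans."""

    def __init__(self):
        super().__init__()
        self.prices = []     # finished (amount, text) pairs
        self._amount = None  # content attribute of the span being read
        self._chunks = None  # text pieces inside that span, or None if outside
        self._depth = 0      # how many <span> tags are currently open

    def handle_starttag(self, tag, attrs):
        if self._chunks is not None:
            if tag == "span":
                self._depth += 1  # a span nested inside the price span
            return
        attrs = dict(attrs)
        if tag == "span" and attrs.get("itemprop") == "price":
            self._amount = attrs.get("content")
            self._chunks = []
            self._depth = 1

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if self._chunks is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:  # the price span itself has closed
                text = "".join(self._chunks).strip()
                self.prices.append((self._amount, text))
                self._amount = None
                self._chunks = None


parser = PriceParser()
parser.feed('<p><span itemprop="price" content="590"><b>590</b> dollars</span></p>'
            '<span class="note">not a price</span>')
parser.close()
print(parser.prices)
```

Prints:

[('590', '590 dollars')]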
