HTML parser find tag info
Question
I have a project that uses HTMLParser(). I have never worked with this parser, so I read the documentation and found two useful methods I can override to extract information from a site: handle_starttag and handle_data. But I don't understand how to find the tag information I need and pass it to handle_data to print it.
I need to get the price from all span tags on the page:
    <span itemprop="price" content="590">590 dollars</span>
How do I get this?
Answer 1
Score: 1
If every <span> price tag has an itemprop attribute of "price" and the dollar amount is in its content attribute, then you can do it all in handle_starttag like this:

    from html.parser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) pairs; a dict makes lookups easier
            attrsDict = dict(attrs)
            # .get() avoids a KeyError on <span> tags that have no itemprop attribute
            if tag == 'span' and attrsDict.get('itemprop') == 'price':
                price = attrsDict.get('content')
                print(price)
                # do something else with `price` here

    # Example test cases
    parser = MyHTMLParser()
    parser.feed("""
    <span itemprop="price" content="590">590 dollars</span>
    <span itemprop="price" content="430">430 dollars</span>
    <span itemprop="price" content="684">684 dollars</span>
    """)
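If you would rather collect the values than print them, the same attribute check can append to a list. This is only a sketch under the same assumption as above (the amount lives in the content attribute); the names PriceAttrParser, prices, and extract_prices are made up for illustration and are not part of html.parser.

```python
from html.parser import HTMLParser


class PriceAttrParser(HTMLParser):
    """Collects the `content` attribute of every <span itemprop="price">."""

    def __init__(self):
        super().__init__()
        self.prices = []  # hypothetical attribute name, filled while parsing

    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        if tag == "span" and attrs_dict.get("itemprop") == "price":
            content = attrs_dict.get("content")
            if content is not None:  # skip price spans without a content attribute
                self.prices.append(content)


def extract_prices(html):
    """Hypothetical helper: parse `html` and return the collected values."""
    parser = PriceAttrParser()
    parser.feed(html)
    parser.close()
    return parser.prices


sample = """
<span itemprop="price" content="590">590 dollars</span>
<span class="label">not a price</span>
<span itemprop="price" content="430">430 dollars</span>
"""
print(extract_prices(sample))  # ['590', '430']
```

The values come back as strings, exactly as they appear in the markup, so convert them with int() or decimal.Decimal() yourself if you need numbers.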
Answer 2
Score: 1
This example initializes a custom HTMLParser and collects the text between the <span> tags (using handle_data):

    from html.parser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self._price_tag = None  # set while we are inside a price <span>
            self.prices = []

        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) tuples
            if tag == "span" and ('itemprop', 'price') in attrs:
                self._price_tag = tag

        def handle_endtag(self, tag):
            if tag == self._price_tag:
                self._price_tag = None

        def handle_data(self, data):
            # only keep text that appears inside a price <span>
            if self._price_tag:
                self.prices.append(data)

    parser = MyHTMLParser()
    parser.feed("""
    <html>
    <span itemprop="price" content="570">570 dollars</span>
    <span itemprop="price" content="590">590 dollars</span>
    </html>
    """)
    print(parser.prices)

Prints:

    ['570 dollars', '590 dollars']
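One caveat with the flag approach: it stops collecting at the first closing </span>, so if a price span contains another span, the text after the inner one is lost, and the text can also arrive in several handle_data calls. A possible variation, assuming that kind of markup can occur on your page, is to count nesting depth and join the pieces when the outer span closes. PriceTextParser, _depth, and _chunks are names invented for this sketch.

```python
from html.parser import HTMLParser


class PriceTextParser(HTMLParser):
    """Collects the full text of each <span itemprop="price">, nested spans included."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._depth = 0    # greater than 0 while inside a price <span>
        self._chunks = []  # text pieces seen inside the current price <span>

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a span nested inside a price span
        elif dict(attrs).get("itemprop") == "price":
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # the outer price span just closed
                self.prices.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


parser = PriceTextParser()
parser.feed('<span itemprop="price" content="570"><span>570</span> dollars</span>')
parser.feed('<span itemprop="price" content="590">590 dollars</span>')
parser.close()
print(parser.prices)  # ['570 dollars', '590 dollars']
```

For flat markup like the sample in the question, the simpler flag version above gives the same result.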