Trying to run a for loop on an HTML element while using bs4, but it does not iterate

Question

I am trying to get the country names, market names, and URLs from the website below using the BeautifulSoup library. I tried to get the country names with a for loop, but it only gives me the first one.

I expected it to iterate through all the countries, but it does not.

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.freshplaza.com/europe/content/retailers/').text
soup = BeautifulSoup(html_text, 'lxml')
retailer_links = soup.find_all('div', {'id': 'retailers'})
for el in retailer_links:
    print(el.h2.text)

for el in retailer_links:
    if el.h2.text == 'Afghanistan':
        print(True)
    else:
        print(False)

for el in retailer_links:
    if el.h2.text == 'Hong Kong':
        print(True)
    else:
        print(False)

Answer 1

Score: 0

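find_all('div', {'id': 'retailers'}) returns a list containing only the one DIV with that id, so the loop body runs a single time. Inside the loop, el.h2 is shorthand for el.find('h2'): it returns only the first <h2> element inside the DIV, not all of the <h2> elements.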
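
To make that behaviour concrete, here is a minimal sketch that runs on a small inline HTML snippet rather than the live page. The markup is a hypothetical stand-in (the real page's structure is not shown in the question), and it uses the built-in 'html.parser' so it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one container div, several headings.
html = """
<div id="retailers">
  <h2>Afghanistan</h2>
  <h2>Hong Kong</h2>
  <h2>Spain</h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Only one div has this id, so find_all() returns a one-element list
# and a for loop over it runs exactly once.
containers = soup.find_all("div", {"id": "retailers"})
loop_runs = len(containers)

# Tag.h2 is shorthand for Tag.find("h2"): it yields the first <h2> only.
first_heading = containers[0].h2.text

# Calling find_all("h2") on the div returns every heading inside it.
all_headings = [h2.get_text(strip=True) for h2 in containers[0].find_all("h2")]

print(loop_runs, first_heading, all_headings)
```

So the loop in the question is iterating correctly; it simply has one item to iterate over, and that item exposes only its first heading through el.h2.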

You can use soup.select() to get all the h2 elements inside the retailers div:

retailer_links = soup.select("div#retailers h2")
for el in retailer_links:
    text = el.get_text()
    if text == 'Afghanistan':
        print('Afghanistan found')
    elif text == 'Hong Kong':
        print('Hong Kong found')
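
The question also asks for market names and URLs. The sketch below shows one possible way to group links under each country heading. It assumes, purely for illustration, that each <h2> is followed by sibling <a> elements until the next <h2>; the real page may be laid out differently, in which case the sibling walk would need adjusting. The sample markup, the example.com URLs, and the name markets_by_country are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each country <h2> is followed by sibling <a> links.
html = """
<div id="retailers">
  <h2>Afghanistan</h2>
  <a href="https://example.com/market-a">Market A</a>
  <h2>Hong Kong</h2>
  <a href="https://example.com/market-b">Market B</a>
  <a href="https://example.com/market-c">Market C</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

markets_by_country = {}
for h2 in soup.select("div#retailers h2"):
    country = h2.get_text(strip=True)
    markets = []
    # Walk the tags that follow this heading until the next heading starts.
    for sibling in h2.find_next_siblings():
        if sibling.name == "h2":
            break
        if sibling.name == "a" and sibling.get("href"):
            markets.append((sibling.get_text(strip=True), sibling["href"]))
    markets_by_country[country] = markets

for country, markets in markets_by_country.items():
    print(country, markets)
```

A dictionary keyed by country keeps each heading together with its own links, which is easier to work with than three separate lists that have to be matched up afterwards.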

huangapple
  • Posted on February 6, 2023, 05:08:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75355518.html