Python BeautifulSoup Span Scraping
Question
I am trying to scrape a value from a span identified by its ID, but getting it is not as simple as calling find and taking the span's text.
Below is the HTML from the webpage.
HTML
I am trying to print "B0C4YKLXPQ".
This gets me the
Below are all attempts that failed.
- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).span["data-asin"]
- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find_all(["data-asin"])
Answer 1
Score: 1
The following code has a good chance of working, unless Amazon has blocked your IP for some reason, such as too many scraping attempts:
import requests
from bs4 import BeautifulSoup as bs

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.amazon.com/dp/B002G9UDYG'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
# data-asin is an attribute of the span, so read it with .get() instead of taking the text
item = soup.select_one('span[id="fitRecommendationsSection"]').get('data-asin')
print(item)
Result in terminal:
B0C4YKLXPQ
BeautifulSoup documentation can be found [here](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
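As for why the attempts in the question return nothing: a string passed to find_all is matched against tag names, so find_all("data-asin") looks for a `<data-asin>` element rather than an attribute. Here is a minimal offline sketch of the difference. The SAMPLE_HTML markup is an assumption (the page's HTML is not shown above, so it only imitates the structure implied by the selectors), and extract_asin is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page HTML is not available here, so this
# only imitates the structure implied by the selectors in the question.
SAMPLE_HTML = """
<div id="twisterContainer">
  <span id="fitRecommendationsSection" data-asin="B0C4YKLXPQ"></span>
</div>
"""


def extract_asin(html):
    """Return the span's data-asin value, or None if the span is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("span#fitRecommendationsSection")
    # select_one returns None when nothing matches, e.g. on a block page
    return span.get("data-asin") if span is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", {"id": "twisterContainer"})

# Interpreted as a tag name: there is no <data-asin> element, so no matches.
by_tag_name = container.find_all("data-asin")

# Interpreted as an attribute filter: any tag that carries data-asin.
by_attribute = container.find_all(attrs={"data-asin": True})

print(by_tag_name)                    # []
print(by_attribute[0]["data-asin"])   # B0C4YKLXPQ
print(extract_asin(SAMPLE_HTML))      # B0C4YKLXPQ
print(extract_asin("<html></html>"))  # None
```

Checking for None before calling .get() is worth doing on the live page too, since a blocked request may return a page that does not contain the span at all.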