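Question

I am trying to scrape a field inside a span that has an ID, but the value is not something I can get simply by calling find and taking the span's text.

Below is the HTML from the webpage:
[HTML not preserved]

I want to print "B0C4YKLXPQ".

Here are all the attempts that failed: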

page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")
page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).span["data-asin"]
page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).find_all("data-asin")
page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")
page_soup.find("div", {"id": "twisterContainer"}).find_all(["data-asin"])
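Why the attempts above come back empty can be seen on a small stand-in for the page. The real HTML was not preserved, so the markup below is a hypothetical reconstruction based only on the IDs used in the question and the selector used in the answer. The key points: `find_all()` treats its first argument as a tag name (so `"data-asin"` looks for `<data-asin>` elements), and `data-asin` is an attribute of the `fitRecommendationsSection` span itself, not of a span nested inside it.

```python
from bs4 import BeautifulSoup

# Hypothetical, minimal stand-in for the page's markup (an assumption, not the real HTML).
html = """
<div id="twisterContainer">
  <span id="fitRecommendationsSection" data-asin="B0C4YKLXPQ">
    <span>Fit recommendations</span>
  </span>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
container = page_soup.find("div", {"id": "twisterContainer"})

# find_all() matches tag *names*, so this searches for <data-asin> tags: none exist.
print(container.find_all("data-asin"))  # []

section = container.find("span", {"id": "fitRecommendationsSection"})

# .span moves to the first nested <span>; in this hypothetical markup it has no data-asin.
print(section.span.get("data-asin"))  # None

# The attribute lives on the section span itself, so index that tag directly.
print(section["data-asin"])  # B0C4YKLXPQ
```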

Answer 1

Score: 1

The following code has a good chance of working, unless your IP has been blacklisted by Amazon for some reason, such as too many scraping attempts:

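import requests
from bs4 import BeautifulSoup as bs

# Send a browser-like User-Agent instead of the default python-requests one.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}

url = 'https://www.amazon.com/dp/B002G9UDYG'

r = requests.get(url, headers=headers, timeout=30)
r.raise_for_status()  # raise on HTTP error status codes (4xx/5xx)
soup = bs(r.text, 'html.parser')

# data-asin is an attribute of the span, not its text, so read it with .get().
span = soup.select_one('span[id="fitRecommendationsSection"]')
item = span.get('data-asin') if span is not None else None
print(item)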

Result in terminal:

B0C4YKLXPQ

The BeautifulSoup documentation can be found [here](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
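The answer's selector can also be written as `span#fitRecommendationsSection`, and when you do not know which tag carries the value, you can search by attribute presence instead of by tag. The sketch below runs offline against hypothetical markup (the second element and its `EXAMPLE0001` value are made up for illustration) and shows both approaches, with a guard for the case where the span is missing, for example when a blocked request returns a different page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; EXAMPLE0001 is an invented placeholder value.
html = """
<div id="twisterContainer">
  <span id="fitRecommendationsSection" data-asin="B0C4YKLXPQ"></span>
  <li data-asin="EXAMPLE0001">variant</li>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Equivalent CSS form of span[id="fitRecommendationsSection"]; guard against a missing tag.
tag = soup.select_one("span#fitRecommendationsSection")
asin = tag.get("data-asin") if tag is not None else None
print(asin)  # B0C4YKLXPQ

# attrs={"data-asin": True} matches every tag that has the attribute, whatever its value.
all_asins = [t["data-asin"] for t in soup.find_all(attrs={"data-asin": True})]
print(all_asins)  # ['B0C4YKLXPQ', 'EXAMPLE0001']

# The CSS attribute selector [data-asin] finds the same tags, in document order.
css_asins = [t["data-asin"] for t in soup.select("[data-asin]")]
print(css_asins)  # ['B0C4YKLXPQ', 'EXAMPLE0001']
```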
