Question

I am new to web scraping. How can I get the product ID from the HTTP header (screenshot attached)?

Source: https://www.pickaboo.com/product-detail/samsung-galaxy-a03-3gb-32gb/

I am using requests to get the information, but with no luck so far.

import requests

url = 'https://www.pickaboo.com/product-detail/samsung-galaxy-a04-3gb-32gb/'
response = requests.get(url)

I got all the information except the review section.

Screenshot


Answer 1

Score: 1

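The product ID is stored in the main page itself, inside the <script> element with id="__NEXT_DATA__". To get it, you can use the following example: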

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.pickaboo.com/product-detail/samsung-galaxy-a03-3gb-32gb/'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')
data = soup.select_one('#__NEXT_DATA__')
data = json.loads(data.text)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print(data['props']['pageProps']['product']['id'])

Prints:

Note: your screenshot does not show an HTTP header; it shows a URL parameter.
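The answer relies on two ideas: the ID is inside the JSON embedded in the <script id="__NEXT_DATA__"> element, and the value in the screenshot is a URL query parameter rather than a header. Both can be sketched with only the Python standard library, which also lets you test the parsing step offline without BeautifulSoup or network access. This is a minimal sketch under stated assumptions, not the answer's exact method: the helper names (NextDataParser, extract_next_data, query_param), the sample HTML, the URL https://example.com/reviews and the parameter name product_id are all hypothetical. Check the real page source and the request URL in your browser's developer tools for the actual names.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class NextDataParser(HTMLParser):
    """Collect the raw text of <script id="__NEXT_DATA__">, if present."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so keep them all
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> dict:
    """Parse the JSON stored in the __NEXT_DATA__ script element."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads("".join(parser.chunks))


def query_param(url: str, name: str) -> str | None:
    """Return the first value of query parameter `name`, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


# Hypothetical sample page mimicking the key path used in the answer
sample_html = (
    '<html><body>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"product": {"id": 12345}}}}'
    '</script></body></html>'
)
print(extract_next_data(sample_html)["props"]["pageProps"]["product"]["id"])  # 12345

# Hypothetical URL and parameter name, for illustration only
print(query_param("https://example.com/reviews?product_id=12345&page=1", "product_id"))  # 12345
```

With real data you would pass requests.get(url).text to extract_next_data; keeping the parser separate from the download makes it easy to test against a saved copy of the page.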

