How to get HTTP header information for web scraping

Question

I am new to web scraping.
How can I get the product ID from the HTTP header (screenshot attached)?

source: https://www.pickaboo.com/product-detail/samsung-galaxy-a03-3gb-32gb/

I am using requests to get the information but still no luck.

import requests

url = 'https://www.pickaboo.com/product-detail/samsung-galaxy-a04-3gb-32gb/'
response = requests.get(url)

I got all the information except the review section.

Screenshot

Answer 1

Score: 1

The product ID is stored in the main page, inside a <script> element. To get it, you can use the following example:

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.pickaboo.com/product-detail/samsung-galaxy-a03-3gb-32gb/'

# Download the page and locate the <script id="__NEXT_DATA__"> element
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
data = soup.select_one('#__NEXT_DATA__')

# The script body is JSON; parse it into a Python dict
data = json.loads(data.text)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print(data['props']['pageProps']['product']['id'])

Prints:

85397
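
The key path `props`, `pageProps`, `product`, `id` depends on how the site currently lays out its page data, so a more defensive variant may be useful. The following is only a sketch and not part of the original answer: it uses the standard library instead of BeautifulSoup, the helper names `NextDataParser`, `extract_next_data`, and `dig` are hypothetical, and the sample HTML is made up to mimic the shape of the real page.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON payload, or None if the script tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    return json.loads(raw) if raw.strip() else None


def dig(data, *keys):
    """Walk nested dicts; return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Made-up HTML that mimics the shape used in the answer above.
SAMPLE_HTML = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"product": {"id": 85397}}}}'
    "</script></body></html>"
)

data = extract_next_data(SAMPLE_HTML)
print(dig(data, "props", "pageProps", "product", "id"))
print(dig(data, "props", "pageProps", "reviews"))
```

To use this against the live page, pass `requests.get(url).text` to `extract_next_data` in place of `SAMPLE_HTML`. Returning `None` for a missing key avoids a `KeyError` if the site renames or removes a field.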

Note: Your screenshot shows a URL parameter, not an HTTP header.
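
To make the distinction in the note concrete: a URL parameter is part of the request URL itself, while HTTP headers are sent alongside the request or response. Below is a minimal sketch using only the standard library. The URL and the `product_id` parameter name are hypothetical, since the actual request from the screenshot is not reproduced here, and `get_query_param` is an illustrative helper.

```python
from urllib.parse import parse_qs, urlsplit


def get_query_param(url, name):
    """Return the first value of a URL query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None


# Hypothetical request URL of the kind shown in a browser's network tab.
example_url = "https://example.com/api/reviews?product_id=85397&page=1"

print(get_query_param(example_url, "product_id"))
print(get_query_param(example_url, "missing"))
```

With requests, the response headers are available separately as `response.headers`, and query parameters can be sent through the `params` argument of `requests.get`.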

Posted by huangapple on 2023-05-21 03:40:01
Link to this post: https://go.coder-hub.com/76297047.html