How to get the HTTP header information for web scraping
Question
I am new to web scraping.
How can I get the product ID from the HTTP header (screenshot attached)?
Source: https://www.pickaboo.com/product-detail/samsung-galaxy-a03-3gb-32gb/
I am using requests to get the information, but still no luck.
import requests
url = 'https://www.pickaboo.com/product-detail/samsung-galaxy-a04-3gb-32gb/'
response = requests.get(url)
I got all the information except the review section.
Answer 1
Score: 1
The product ID is stored in the main page, inside the <script> element with the id __NEXT_DATA__. To get it, you can use the following example:
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.pickaboo.com/product-detail/samsung-galaxy-a03-3gb-32gb/'

# Download the page and parse the HTML.
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# The page data is embedded as JSON in the <script id="__NEXT_DATA__"> element.
script = soup.select_one('#__NEXT_DATA__')
data = json.loads(script.text)

# Uncomment this to print all data:
# print(json.dumps(data, indent=4))
print(data['props']['pageProps']['product']['id'])
Prints:
85397
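If the page layout changes, the lookup above fails with an exception. A more defensive variant could look like the sketch below. It is only an illustration: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages, the names `NextDataParser` and `extract_product_id` are made up here, and the sample HTML is a hand-written stand-in that merely mimics the structure assumed in the answer.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_product_id(html):
    """Return the product ID found in the page HTML, or None if it is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Walk the same key path as the answer, stopping if any level is absent.
    node = data
    for key in ("props", "pageProps", "product", "id"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Made-up sample page; the real page contains far more data.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"product": {"id": 85397}}}}'
    "</script></body></html>"
)
print(extract_product_id(sample_html))
```

With a live page, the same function could be called as `extract_product_id(requests.get(url).text)`; a `None` result then signals that the script element or the expected keys were not found.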
Note: Your screenshot does not show an HTTP header; it shows a URL parameter.
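To make the difference concrete, here is a small sketch using only the standard library. The URL, the `product_id` parameter name, and the `Accept` header are all hypothetical, chosen for illustration; they are not taken from the site or from the screenshot. No network request is sent.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request

# Hypothetical request: URL, parameter name, and header value are made up.
request = Request(
    "https://example.com/api/reviews?product_id=85397&page=1",
    headers={"Accept": "application/json"},
)

# URL parameters are part of the URL itself (everything after the "?").
query_params = parse_qs(urlsplit(request.full_url).query)
print(query_params["product_id"][0])  # 85397

# Headers travel alongside the URL as separate name/value pairs.
print(request.get_header("Accept"))  # application/json
```

In the browser's developer tools, these two kinds of data usually appear in separate places for a selected request, so it is worth checking which one you are looking at before searching the response headers for a value.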