Unable to Scrape Website using beautifulsoup

Question

I want to scrape product names and prices from the website:
https://www.carrefouruae.com/mafuae/en/c/F21600000

```python
import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.carrefouruae.com/mafuae/en/c/F21600000')
soup = BeautifulSoup(html.content, "html5lib")  # the html5lib parser must be installed separately
soup.find_all('ul', attrs={'class': 'css-1wgjvs'})
```

It returns an empty list: the page source it fetches doesn't seem to contain the product names. What is the reason, and how can I get the product details from the site?
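One way to narrow down the reason is to check whether the target class occurs in the HTML that `requests` actually downloads. The sketch below is illustrative only. `classes_in_html` is a hypothetical helper, not part of the original post, and it is built on the standard-library `html.parser`. If `css-1wgjvs` is absent from that HTML, BeautifulSoup has nothing to match. A likely cause is that the list is rendered later by JavaScript in the browser, or that the server sent the script a different page. Auto-generated class names like this one can also change when the site is redeployed.

```python
from html.parser import HTMLParser


class _ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in a start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in_html(html_text):
    """Return the set of class names present in static (unrendered) HTML."""
    parser = _ClassCollector()
    parser.feed(html_text)
    parser.close()
    return parser.classes


# Usage (needs network access and the requests package):
#   import requests
#   html = requests.get("https://www.carrefouruae.com/mafuae/en/c/F21600000").text
#   print("css-1wgjvs" in classes_in_html(html))
```

If this prints `False`, no choice of parser or selector will find the list in that response. The data has to come from somewhere else, such as the request the page itself makes for it.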

Answer 1

Score: 2

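The product list does not appear to be in the HTML that `requests` downloads. The page seems to load it from a JSON API (note the `appId: Reactweb` header below). You can query that API directly and save the response: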

```python
import json

import requests

# Browser-like User-Agent plus the app/store identifiers sent with the API call
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0',
    'appId': 'Reactweb',
    'storeId': 'mafuae',
}


def main():
    with requests.Session() as req:
        req.headers.update(headers)
        # Category listing query: first page (currentPage=0), 60 items per page,
        # prices in AED, English names, sorted by relevance
        params = {
            "areaCode": "Dubai Festival City - Dubai",
            "currentPage": "0",
            "depth": "3",
            "displayCurr": "AED",
            "filter": "",
            "lang": "en",
            "latitude": "25.2171003",
            "longitude": "55.3613635",
            "maxPrice": "",
            "minPrice": "",
            "needVariantsData": "true",
            "nextOffset": "",
            "pageSize": "60",
            "requireSponsProducts": "true",
            "responseWithCatTree": "true",
            "sortBy": "relevance"
        }
        # F21600000 is the same category ID that appears in the page URL
        r = req.get('https://www.carrefouruae.com/api/v8/categories/F21600000', params=params)
        r.raise_for_status()  # stop on 4xx/5xx instead of trying to save an error page
        # Save the JSON response so its structure can be inspected
        with open('data.json', 'w', encoding='utf-8') as f:
            json.dump(r.json(), f, indent=4)


if __name__ == "__main__":
    main()
```
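The answer saves the raw response to `data.json` but does not show how to extract names and prices from it. The post does not document the layout of this API's JSON, so the sketch below avoids hard-coding a path. It uses a hypothetical helper, `find_records`, that walks the whole document and yields every object containing all of the requested keys. The key names `"name"` and `"price"` are assumptions, not confirmed fields of this API. Open `data.json` and substitute the field names you actually find there.

```python
import json


def find_records(node, required_keys):
    """Recursively yield every dict in a parsed JSON tree that has all required_keys."""
    if isinstance(node, dict):
        if all(key in node for key in required_keys):
            yield node
        for value in node.values():
            yield from find_records(value, required_keys)
    elif isinstance(node, list):
        for item in node:
            yield from find_records(item, required_keys)


def load_products(path="data.json", keys=("name", "price")):
    """Return a tuple of the `keys` values for each matching object in the saved response.

    ASSUMPTION (hypothetical): product objects carry "name" and "price" keys.
    Adjust `keys` after inspecting the real file.
    """
    # utf-8-sig reads the file correctly whether or not it starts with a byte-order mark
    with open(path, encoding="utf-8-sig") as f:
        data = json.load(f)
    return [tuple(record[k] for k in keys) for record in find_records(data, keys)]
```

For example, `for name, price in load_products(): print(name, price)`. A price may itself be a nested object, and a product that embeds variants with the same keys will yield those variants too. Filter the results if needed. The request above fetches a single page (`currentPage` `"0"`, `pageSize` `"60"`). Further products are presumably returned by incrementing `currentPage`.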

huangapple
  • Published on 2023-05-25 21:17:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76332741.html