Getting error 403 when attempting to web scrape using Python and Jupyter Notebook
Question
I am trying to web scrape some football stats, but I am not sure why I am getting Error 403. I have searched online, and I think it has something to do with restricted access. Is there any way to get past this?
Thank you!
Here is what I have coded so far.
I have tried using .session() or .text(), but neither seemed to work.
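For reference, this kind of 403 typically comes from a bare requests.get call that goes out with the library's default User-Agent; a minimal sketch of that failing pattern (the URL below is only a placeholder, not the actual stats site):

import requests

# Placeholder URL standing in for the football-stats page
test_url = 'https://www.example.com/football-stats'

# No headers are passed, so the request is sent with the default
# 'python-requests/x.y.z' User-Agent, which bot-filtering sites often reject
r = requests.get(test_url)
print(r.status_code)  # prints 403 when the server blocks the request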
Answer 1
Score: 0
You should set your User-Agent request header to something common. Without it, it is very obvious that your request is not coming from a human, and many websites will block requests that lack a common User-Agent.
The code below sets the User-Agent so that your request looks like it is coming from a Chrome browser on Windows:
import requests

# Browser-like User-Agent; test_url is the URL of the page you are scraping
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'}
r = requests.get(test_url, headers=headers)
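Once the request succeeds, r.text holds the page HTML and can be handed to whatever parser you prefer. As a follow-up sketch (placeholder URL, assuming the stats sit in ordinary HTML table elements and that a parser backend such as lxml is installed), pandas.read_html can pull the tables straight into DataFrames:

from io import StringIO

import pandas as pd
import requests

# Placeholder URL — substitute the stats page you are actually scraping
test_url = 'https://www.example.com/football-stats'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'}

r = requests.get(test_url, headers=headers)
r.raise_for_status()  # raises an HTTPError if the server still returns 403

# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html(StringIO(r.text))
print(len(tables), 'tables found')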