Getting a 403 error when attempting to web scrape using Python and Jupyter Notebook

Question

I am trying to scrape some football stats, but I keep getting a 403 error and I am unsure why. From what I have found online, it seems to have something to do with restricted access. Is there any way to get past this?

Thank you!

Here is what I have coded so far

I have tried using .session() and .text(), but neither seemed to work.

Answer 1

Score: 0

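You should set your User-Agent request header to a common browser value. If you do not set it, requests identifies itself with a default User-Agent of the form python-requests/<version>, which makes it obvious that the request is not coming from a browser, and many websites block requests that lack a common User-Agent.

The code below sets the User-Agent so that your request looks like it comes from Chrome on Windows: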

import requests

# Send a Chrome-on-Windows User-Agent instead of the python-requests default.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'}

r = requests.get(test_url, headers=headers)
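
Since the question mentions trying .session() and .text(), here is a minimal sketch of how the same header could be attached to a requests.Session so that every request made through it carries the header. The helper names make_session and fetch_html, the 10-second timeout, and the example.com URL are assumptions made for illustration and do not come from the original post. Note that .text is an attribute of the response, not a method. The last lines build a request without sending it, only to show which User-Agent would go out.

```python
import requests

# Assumption: the browser string is the one from the answer above;
# any current desktop browser User-Agent should behave similarly.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends `user_agent` with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """GET `url` and return the body; raise requests.HTTPError on 4xx/5xx."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # a 403 surfaces here as an exception
    return response.text  # attribute access, not response.text()


# The User-Agent that requests sends when none is set explicitly.
print(requests.utils.default_user_agent())

session = make_session()
# Prepare (but do not send) a request to inspect the outgoing header.
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers["User-Agent"])
```

With this in place, the page from the question could be fetched with html = fetch_html(session, test_url). If the site still answers with 403, it is probably filtering on something other than the User-Agent, and changing this header alone will not help.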

huangapple
  • Posted on 2023-05-21 15:48:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76298822.html