Scraping the top 250 movies from the IMDb website

Question

I want to scrape the top 250 movies from the IMDb website with Beautiful Soup in Python, but my code prints nothing.

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
    response = requests.get(url)
    html_content = response.content
    soup = BeautifulSoup(html_content, "html.parser")
    movies = soup.find_all(
        "li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
    for x in movies:
        print(x.h3.text)


Answer 1

Score: 2

Your problem is related to the StackOverflow question "Python requests. 403 Forbidden". IMDb appears to reject requests that lack a browser-like User-Agent header with a 403 response, so the returned page contains none of the list items and find_all returns an empty list.

I tried the approach suggested in that StackOverflow thread, and it worked. This is the code that works for me:

    import requests
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent so the request is not answered with 403 Forbidden.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
    response = requests.get(url, headers=headers)
    html_content = response.content
    soup = BeautifulSoup(html_content, "html.parser")
    # Each movie is an <li> with these classes; its title is in the <h3> inside.
    movies = soup.find_all(
        "li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
    for x in movies:
        print(x.h3.text)
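
As a hedged variation on the code above, the sketch below separates fetching from parsing and makes a failed request visible instead of silently printing nothing. The function names `fetch_html` and `extract_titles` are hypothetical, not part of any library. Matching only on the single class `ipc-metadata-list-summary-item` is an assumption: names such as `sc-bca49391-0` and `eypSaE` look auto-generated and may change, but IMDb's markup may differ from what is assumed here.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent taken from the answer above; any current browser string should do.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Download a page, raising requests.HTTPError on 4xx/5xx (for example 403)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_titles(html):
    """Return the <h3> text of every matching list item in the given HTML.

    Assumption: each movie is an <li> carrying the class
    "ipc-metadata-list-summary-item", with its title in an <h3>.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = soup.select("li.ipc-metadata-list-summary-item")
    # Skip items without an <h3> rather than failing on a missing tag.
    return [item.h3.get_text(strip=True) for item in items if item.h3 is not None]
```

With network access, `extract_titles(fetch_html("https://www.imdb.com/chart/top/?ref_=nv_mv_250"))` should return the titles, provided the page still serves the list in its initial HTML. If the site rejects the request, `fetch_html` raises an error that names the status code, which makes a 403 easier to spot than an empty result.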

huangapple
  • Posted on June 30, 2023, 04:28:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76584417.html