Scraping the top 250 movies from the IMDb website

Question

I want to scrape the top 250 movies from the IMDb website using Beautiful Soup in Python, but my code prints nothing.

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
response = requests.get(url)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
movies = soup.find_all("li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
for x in movies:
    print(x.h3.text)

Answer 1

Score: 2

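Your problem is related to the StackOverflow question "Python requests. 403 Forbidden".

I tried the approach suggested in that StackOverflow thread, and it worked. Here is the code that works for me: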

import requests
from bs4 import BeautifulSoup


# Send a browser-like User-Agent header, as suggested in the linked thread,
# so that the request is not answered with 403 Forbidden.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
response = requests.get(url, headers=headers)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
# Each matching <li> is one chart entry; print the text of its <h3>.
movies = soup.find_all(
    "li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
for x in movies:
    print(x.h3.text)
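
The original script printed nothing because it parsed whatever body came back without looking at the HTTP status. Below is a minimal sketch of checking the status before parsing, so that a 403 shows up as an error instead of an empty result. The names `fetch_html`, `FakeResponse`, and `fake_get` are hypothetical helpers introduced for illustration. The stub only imitates a server that rejects requests without a `User-Agent` header; it is not a description of how IMDb actually behaves.

```python
def fetch_html(url, headers=None, get=None):
    """Return the response body, or raise if the status is not 200 OK."""
    if get is None:
        # Imported lazily so the helper can be tried out with a stub instead.
        import requests
        get = requests.get
    response = get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        raise RuntimeError(f"GET {url} returned HTTP {response.status_code}")
    return response.text


class FakeResponse:
    """Hypothetical stand-in exposing only the attributes fetch_html reads."""

    def __init__(self, status_code, text=""):
        self.status_code = status_code
        self.text = text


def fake_get(url, headers=None, timeout=None):
    # Assumption for the demo: reject any request lacking a User-Agent header.
    if headers and "User-Agent" in headers:
        return FakeResponse(200, "<html>ok</html>")
    return FakeResponse(403)


if __name__ == "__main__":
    try:
        fetch_html("https://example.invalid/", get=fake_get)
    except RuntimeError as error:
        print(error)
    print(fetch_html("https://example.invalid/",
                     headers={"User-Agent": "Mozilla/5.0"}, get=fake_get))
```

With the real site, calling `fetch_html(url, headers=headers)` without the `get` argument falls back to `requests.get`. The `requests` library also provides `response.raise_for_status()`, which raises on 4xx and 5xx responses and could replace the manual check.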

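A related fragility: the `class_` string in the code contains names such as `sc-bca49391-0` and `eypSaE`, which look auto-generated and may change when the site is redeployed. When `class_` is given a string with several space-separated classes, Beautiful Soup compares it against the exact value of the `class` attribute, so any change to those names makes the search return nothing. The sketch below matches on a single class instead. It assumes the page still uses `ipc-metadata-list-summary-item` (taken from the question's code), and `SAMPLE_HTML` and `extract_titles` are hypothetical names used only to make the example runnable offline.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real chart page.
SAMPLE_HTML = """
<ul>
  <li class="ipc-metadata-list-summary-item sc-aaaa-0 xyz cli-parent"><h3>1. First Movie</h3></li>
  <li class="ipc-metadata-list-summary-item sc-bbbb-1 abc cli-parent"><h3>2. Second Movie</h3></li>
  <li class="unrelated-item"><h3>Not a chart entry</h3></li>
</ul>
"""


def extract_titles(html):
    """Return the <h3> text of every <li> carrying the chart-item class."""
    soup = BeautifulSoup(html, "html.parser")
    # A single class name matches any element that has it among its classes.
    items = soup.find_all("li", class_="ipc-metadata-list-summary-item")
    # Skip entries without an <h3> instead of failing on None.
    return [li.h3.get_text(strip=True) for li in items if li.h3 is not None]


titles = extract_titles(SAMPLE_HTML)
for title in titles:
    print(title)
```

Filtering out items where `li.h3` is `None` avoids an `AttributeError` if the markup of some entry differs from the rest.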
huangapple
  • Published on June 30, 2023, 04:28:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76584417.html