Scraping the top 250 movies from the IMDb website
Question
I want to scrape the top 250 movies from the IMDb website with Beautiful Soup in Python, but my code prints nothing.
import requests
from bs4 import BeautifulSoup
url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
response = requests.get(url)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
movies = soup.find_all("li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
for x in movies:
    print(x.h3.text)
Answer 1
Score: 2
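Your problem is related to the Stack Overflow thread "Python requests. 403 Forbidden": the server rejects the request, apparently because it does not carry a browser-like User-Agent header.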
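As a rough illustration of why the header matters, the sketch below builds the request twice with `requests.Session.prepare_request`, without ever sending it, and prints the User-Agent each version would carry. That the site filters on this header is an assumption drawn from the thread mentioned above; the snippet does not verify it.

```python
import requests

url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"

# Prepare the request without sending it, so no network access is needed.
session = requests.Session()
default_request = session.prepare_request(requests.Request("GET", url))

# With no custom headers, requests identifies itself as python-requests/<version>.
print(default_request.headers["User-Agent"])

# Same request, but with the browser-like User-Agent used in this answer.
browser_headers = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/50.0.2661.102 Safari/537.36"
    )
}
custom_request = session.prepare_request(
    requests.Request("GET", url, headers=browser_headers)
)
print(custom_request.headers["User-Agent"])
```

When the request is actually sent, `response.status_code` is a quick way to tell a rejected request apart from a selector that matched nothing.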
So I tried the approach suggested in the Stack Overflow thread, and it worked. This is the code that works for me:
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; the default one used by requests appears to trigger the 403.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
response = requests.get(url, headers=headers)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
# Each chart entry is an <li>; its <h3> holds the movie title.
movies = soup.find_all(
    "li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
for x in movies:
    print(x.h3.text)
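A slightly more defensive variant of the code above might look like the following sketch. It makes two assumptions that are not from the original answer. First, `ipc-metadata-list-summary-item` is taken to be the stable part of the class list, since `sc-bca49391-0` and `eypSaE` look auto-generated and may change. Second, the helper names `fetch_html` and `extract_titles` and the sample markup are hypothetical, chosen only for illustration. Passing a single class name to `class_` matches any element that has that class among others, whereas the full multi-class string only matches that exact attribute value.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/50.0.2661.102 Safari/537.36"
    )
}


def fetch_html(url: str) -> str:
    """Download the page; raises requests.HTTPError on 403 or any other error status."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def extract_titles(html: str) -> list[str]:
    """Return the <h3> text of every <li> that has the assumed-stable class."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find_all("li", class_="ipc-metadata-list-summary-item")
    return [item.h3.get_text(strip=True) for item in items if item.h3 is not None]


# Hypothetical sample markup, used here so the parsing step runs without network access.
SAMPLE_HTML = """
<ul>
  <li class="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent"><h3>1. Example Movie A</h3></li>
  <li class="ipc-metadata-list-summary-item sc-other-hash cli-parent"><h3>2. Example Movie B</h3></li>
  <li class="unrelated"><h3>Not a chart entry</h3></li>
</ul>
"""

print(extract_titles(SAMPLE_HTML))  # ['1. Example Movie A', '2. Example Movie B']
```

Against the live site, the call would be `extract_titles(fetch_html(URL))`. Because of `raise_for_status()`, a 403 then surfaces as an exception instead of an empty result.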