Question

I want to scrape the top 250 movies from the IMDb website using Beautiful Soup in Python, but my code prints nothing.

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
response = requests.get(url)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
movies = soup.find_all("li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
for x in movies:
    print(x.h3.text)

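Answer 1

Score: 2

Your problem is related to the StackOverflow thread "Python requests. 403 Forbidden". The request is most likely being rejected with a 403 status, so the HTML you parse does not contain the movie list and find_all returns an empty result.

I tried the approach suggested in that thread, which is to send a browser-like User-Agent header, and it worked. This is the code that works for me: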

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
response = requests.get(url, headers=headers)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
movies = soup.find_all(
    "li", class_="ipc-metadata-list-summary-item sc-bca49391-0 eypSaE cli-parent")
for x in movies:
    print(x.h3.text)
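
If the loop still prints nothing, it can help to make a rejected request fail loudly and to match on fewer class names. The sketch below is only one way to do that, under some assumptions: the helper names fetch_html and extract_titles are made up for illustration, the timeout value is arbitrary, and it assumes that ipc-metadata-list-summary-item is the stable part of the class attribute while the sc-... and eypSaE parts are generated and may change. When class_ is given a string containing several class names, Beautiful Soup compares it against the exact value of the class attribute, so any change in the generated names makes find_all return nothing; a CSS selector on a single class avoids that.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
# Same browser-like User-Agent as in the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/50.0.2661.102 Safari/537.36"
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Download a page; raise requests.HTTPError on a 4xx/5xx status (e.g. 403)."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_titles(html):
    """Return the <h3> text of each list item that carries the assumed stable class."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: this single class name is stable; the generated ones are ignored.
    items = soup.select("li.ipc-metadata-list-summary-item")
    return [item.h3.get_text(strip=True) for item in items if item.h3 is not None]
```

Calling extract_titles(fetch_html(URL)) and printing each entry should reproduce the output of the loop above, while a blocked request raises an exception instead of silently producing an empty list.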
