Question

I'm going through the thenewboston Python tutorial for a web crawler and trying to follow its steps, but I have not been able to get the result I want. I want to get all the quotes from https://quotes.toscrape.com/page/1/, but my code keeps printing "None".

import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://quotes.toscrape.com/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('div', {'class': 'quote'}):
            href = link.get('quote')
            print(href)
        page += 1

trade_spider(1)

I have tried a lot of things, but I can't find a YouTube tutorial that covers this.

Answer 1

Score: 0

The bug is in this line:

href = link.get('quote')

link is of type Tag. Calling the get method on a Tag returns the value of the attribute with that name, or None if the tag has no such attribute. If you print your link variable, you can see that it is a div whose only attributes are things like class; it has no attribute named quote, so get('quote') returns None every time. The quote text is not an attribute at all: it sits inside a child span with the class text. Find that span and read its text instead:

span = link.find('span', {'class': 'text'})
quote = span.text
print(quote)
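To see the difference between reading an attribute and reading a child tag's text, here is a minimal sketch that runs without a network connection. The SAMPLE_HTML string and the extract_quotes helper are hypothetical names introduced for illustration, and the markup is a simplified stand-in modeled on the div/span structure described above, not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: each quote is a div.quote containing a span.text.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">A first quote.</span>
  <small class="author">Some Author</small>
</div>
<div class="quote">
  <span class="text">A second quote.</span>
  <small class="author">Another Author</small>
</div>
"""


def extract_quotes(html):
    """Return the text of the span.text inside every div.quote (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    for block in soup.find_all("div", {"class": "quote"}):
        span = block.find("span", {"class": "text"})
        if span is not None:  # skip any block that has no quote span
            quotes.append(span.get_text(strip=True))
    return quotes


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first = soup.find("div", {"class": "quote"})

# get() looks up an HTML attribute; the div has no attribute named "quote".
print(first.get("quote"))   # None
# "class" is an attribute the div does have.
print(first.get("class"))   # ['quote']

print(extract_quotes(SAMPLE_HTML))
```

In the original loop, the same helper could presumably be fed requests.get(url).text for each page instead of the sample string.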
