Why do I keep getting 'None' as a response while web scraping in PyCharm?

Question

I'm going through thenewboston's Python tutorial on building a web crawler and trying to follow his steps, but I can't get the result I want. I want to get all the quotes from https://quotes.toscrape.com/page/1/, but my code keeps printing "None".

import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://quotes.toscrape.com/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('div', {'class': 'quote'}):
            href = link.get('quote')
            print(href)
        page += 1

trade_spider(1)

I've tried a lot of things, but I can't find a YouTube tutorial that covers this.


Answer 1

Score: 0

Your bug lies with the line

href = link.get('quote')

link is a Tag object. Calling get on a Tag returns the value of the named HTML attribute, or None if the tag has no such attribute (see the Beautiful Soup documentation). If you print link, you can see that it is a div with no quote attribute, so link.get('quote') returns None every time. The quote text lives in a child span with the class text, so find that span and read its text instead:

span = link.find('span', {'class': 'text'})
quote = span.text
print(quote)
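
To see the difference between an attribute lookup and a child-tag lookup without hitting the network, here is a minimal sketch. The SAMPLE_HTML string and the extract_quotes helper are hypothetical names made up for illustration; the markup only imitates the div/span layout described above and is not copied from the site. It also uses find_all, the Beautiful Soup 4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption), shaped like the quote blocks
# described in the answer: a div.quote that contains a span.text.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"A sample quote."</span>
  <small class="author">Sample Author</small>
</div>
<div class="quote">
  <span class="text">"Another sample quote."</span>
  <small class="author">Another Author</small>
</div>
"""


def extract_quotes(html):
    """Return the text of each span.text found inside a div.quote."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    for block in soup.find_all("div", {"class": "quote"}):
        span = block.find("span", {"class": "text"})
        if span is not None:  # skip blocks that have no text span
            quotes.append(span.get_text(strip=True))
    return quotes


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_block = soup.find("div", {"class": "quote"})

# get() looks up an HTML attribute on the tag itself, not a child tag.
print(first_block.get("quote"))  # the div has no "quote" attribute
print(first_block.get("class"))  # the div does have a "class" attribute

# find() searches the tag's children, which is where the quote text lives.
print(extract_quotes(SAMPLE_HTML))
```

To run this against the live page, the same helper should work if you pass it requests.get(url).text in place of SAMPLE_HTML, assuming the site still uses this div/span layout.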

huangapple
  • Posted on June 5, 2023, 04:13:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76402234.html