Why do I keep getting 'None' as a response while web scraping in PyCharm?

Question

I'm going through thenewboston's Python tutorial for a web crawler and trying to follow his steps, but I have not been able to get what I want. I want to get all the quotes from https://quotes.toscrape.com/page/1/, but my code keeps printing None.

    import requests
    from bs4 import BeautifulSoup


    def trade_spider(max_pages):
        page = 1
        while page <= max_pages:
            url = 'http://quotes.toscrape.com/page/' + str(page)
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text, "html.parser")
            for link in soup.findAll('div', {'class': 'quote'}):
                href = link.get('quote')
                print(href)
            page += 1


    trade_spider(1)

I have tried a lot of things, but I can't find a YouTube tutorial that covers this.

Answer 1

Score: 0

The bug is in this line:

    href = link.get('quote')

link is a Tag. According to the documentation, calling get on a Tag returns the value of the attribute with the given name, or None when the tag has no such attribute. If you print link, you can see that it is a div with no attribute named quote, so get('quote') returns None on every iteration. The quote text lives in a span child of that div, so find the span and read its text instead:

    span = link.find('span', {'class': 'text'})
    quote = span.text
    print(quote)
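
To see the difference without hitting the network, here is a minimal sketch that runs the same lookup against a small inline HTML sample. SAMPLE_HTML and the helper name extract_quotes are made up for illustration, and the sample only assumes the structure described above (a div with class quote wrapping a span with class text); the real page may carry more markup. The trade_spider function is one possible rewrite of the loop from the question, not the tutorial's version.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the structure described in the answer:
# a div.quote wrapping a span.text. The real page may contain more markup.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"A sample quote."</span>
  <small class="author">Sample Author</small>
</div>
<div class="quote">
  <span class="text">"Another sample quote."</span>
  <small class="author">Another Author</small>
</div>
"""


def extract_quotes(html):
    """Return the text of each span.text found inside a div.quote."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    for block in soup.find_all("div", class_="quote"):
        span = block.find("span", class_="text")
        if span is not None:  # skip quote blocks that lack the span
            quotes.append(span.get_text(strip=True))
    return quotes


def trade_spider(max_pages):
    """Print the quotes from pages 1..max_pages (needs network access)."""
    import requests  # imported here so the offline demo runs without it

    for page in range(1, max_pages + 1):
        url = "http://quotes.toscrape.com/page/" + str(page)
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on HTTP errors
        for quote in extract_quotes(response.text):
            print(quote)


if __name__ == "__main__":
    first = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div", class_="quote")
    print(first.get("quote"))  # the div has no 'quote' attribute
    print(first.get("class"))  # an attribute the div does have
    print(extract_quotes(SAMPLE_HTML))
```

Running the file prints None for get("quote"), then the value of the class attribute, then the list of extracted quotes. Calling trade_spider(1) applies the same extraction to the live page.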

Posted by huangapple on June 5, 2023, 04:13:08
When reposting, please keep the link to this article: https://go.coder-hub.com/76402234.html