Why am I getting this error? AttributeError: 'NoneType' object has no attribute 'attrs'
Question
I am trying to scrape all of the href attributes from the page requested in the code below.
The problem is that I get the first link, and then the script fails with the error in the title. Can anyone show me how to fix this, please? I am still learning Python.
Many thanks
from requests_html import HTMLSession
s = HTMLSession()
url = 'https://lakesshoweringspaces.com/catalogue-product-filter/page/1'
r = s.get(url)
products = r.html.find('article.contentwrapper section')
for item in products:
    print(item.find('a', first=True).attrs['href'])
Answer 1
Score: 0
To expand on Brian's comment, your code assumes that item.find('a', first=True) will successfully find an element. If it doesn't, that call returns None, and then you're asking for None.attrs['href'] (and None doesn't have an attrs attribute, hence the error message).
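The failure can be reproduced without touching the network. The sketch below uses a hypothetical stand-in class, FakeElement, whose find method only mimics the behaviour described above (returning None when nothing matches). It is an illustration under that assumption, not part of requests_html.

```python
class FakeElement:
    """Hypothetical stand-in for a parsed element; not part of requests_html."""

    def __init__(self, attrs=None, children=None):
        self.attrs = attrs or {}
        self.children = children or []

    def find(self, selector, first=False):
        # Simplified on purpose: `selector` is ignored and every child matches.
        if first:
            return self.children[0] if self.children else None
        return list(self.children)


def describe_first_link(item):
    """Return the href of the first match, or the error text if the lookup fails."""
    try:
        return item.find("a", first=True).attrs["href"]
    except AttributeError as exc:
        return f"AttributeError: {exc}"


# One element that contains a link and one that does not.
with_link = FakeElement(children=[FakeElement(attrs={"href": "/example/"})])
without_link = FakeElement()

print(describe_first_link(with_link))
print(describe_first_link(without_link))
```

The second call goes through the same code path as the failing loop iteration in the question: find returns None, and the attribute lookup on None raises AttributeError.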
If we rewrite your code to actually check the return value of the find method:
from requests_html import HTMLSession
s = HTMLSession()
url = "https://lakesshoweringspaces.com/catalogue-product-filter/page/1"
r = s.get(url)
products = r.html.find("article.contentwrapper section")
for item in products:
    res = item.find("a", first=True)
    if res:
        print(res.attrs["href"])
    else:
        print("no match")
Then we find that it fails to find any <a> elements in every other loop iteration:
https://.../catalogue_product/alassio/?
no match
https://.../catalogue_product/amare/?
no match
https://.../catalogue_product/ambient/?
no match
https://.../catalogue_product/andora/?
no match
https://.../catalogue_product/antigua/?
no match
https://.../catalogue_product/aruba/?
no match
https://.../catalogue_product/avanza/?
no match
https://.../catalogue_product/barbados/?
no match
https://.../catalogue_product/bergen-bi-fold-door/?
no match
https://.../catalogue_product/framed-bi-fold-door/?
no match
https://.../catalogue_product/semi-frameless-bi-fold-door/?
no match
https://.../catalogue_product/cannes-10mm/?
no match
And that's because your expression article.contentwrapper section is matching both sections with class collection-wrapper-item and sections with class compare_favorites_section, the latter of which contain no <a> elements.
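The alternating pattern can be illustrated offline. The sketch below is only an approximation: it uses the standard library's xml.etree.ElementTree instead of requests_html, and the markup in SNIPPET is a hypothetical, heavily simplified version of the page that keeps only the two class names mentioned above. Note that the XPath predicate used here compares the whole class attribute, whereas a CSS class selector matches a single class among several.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is more complex.
SNIPPET = """
<article class="contentwrapper">
  <section class="collection-wrapper-item"><a href="/product-one/">One</a></section>
  <section class="compare_favorites_section"><span>Compare</span></section>
  <section class="collection-wrapper-item"><a href="/product-two/">Two</a></section>
  <section class="compare_favorites_section"><span>Compare</span></section>
</article>
"""

root = ET.fromstring(SNIPPET)


def first_hrefs(sections):
    """Map each section to the href of its first <a>, or None when it has none."""
    hrefs = []
    for section in sections:
        link = section.find(".//a")
        # Compare against None explicitly rather than relying on truthiness.
        hrefs.append(link.get("href") if link is not None else None)
    return hrefs


# Broad query: every <section>, including the ones without links.
broad = first_hrefs(root.findall(".//section"))
# Narrow query: only sections whose class attribute is collection-wrapper-item.
narrow = first_hrefs(root.findall(".//section[@class='collection-wrapper-item']"))

print(broad)
print(narrow)
```

With the broad query every second entry is None, mirroring the "no match" lines above. The narrow query returns a link for every section it selects.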
If you modify your code to be more selective:
from requests_html import HTMLSession
s = HTMLSession()
url = "https://lakesshoweringspaces.com/catalogue-product-filter/page/1"
r = s.get(url)
products = r.html.find("article.contentwrapper section.collection-wrapper-item")
for item in products:
    res = item.find("a", first=True)
    if res:
        print(res.attrs["href"])
    else:
        print("no match")
Then you will reliably find the links. Running the above produces:
https://.../catalogue_product/alassio/?
https://.../catalogue_product/amare/?
https://.../catalogue_product/ambient/?
https://.../catalogue_product/andora/?
https://.../catalogue_product/antigua/?
https://.../catalogue_product/aruba/?
https://.../catalogue_product/avanza/?
https://.../catalogue_product/barbados/?
https://.../catalogue_product/bergen-bi-fold-door/?
https://.../catalogue_product/framed-bi-fold-door/?
https://.../catalogue_product/semi-frameless-bi-fold-door/?
https://.../catalogue_product/cannes-10mm/?