AttributeError: 'NoneType' object has no attribute 'attrs'

Why am I getting this error? AttributeError: 'NoneType' object has no attribute 'attrs'

Question

I am trying to scrape all of the href attributes from the page below:

The problem is that I get the first link, but then I get an error. Can anyone show me how to fix this, please? I am still learning Python.

Many thanks

from requests_html import HTMLSession

s = HTMLSession()

url = 'https://lakesshoweringspaces.com/catalogue-product-filter/page/1'

r = s.get(url)

products = r.html.find('article.contentwrapper section')

for item in products:
  print(item.find('a', first=True).attrs['href'])

Answer 1

Score: 0

To expand on Brian's comment, your code assumes that item.find('a', first=True) will successfully find an element. If it doesn't, that code returns None, and then you're asking for None.attrs['href'] (and None doesn't have an attrs attribute, hence the error message).
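The failure can be reproduced without any network access. The sketch below is only an illustration: find_first is a hypothetical stand-in for item.find('a', first=True) and is not part of requests_html. It assumes nothing beyond the behaviour described above, namely that the lookup yields None when nothing matches.

```python
def find_first(elements):
    """Hypothetical stand-in for item.find('a', first=True):
    return the first element, or None when the list is empty."""
    return elements[0] if elements else None


result = find_first([])  # nothing matched, so result is None

# Unguarded access: this is what the original loop does.
try:
    href = result.attrs["href"]
except AttributeError as exc:
    message = str(exc)
else:
    message = href

print(message)  # 'NoneType' object has no attribute 'attrs'

# Guarded access: check for None before touching .attrs.
safe_href = result.attrs["href"] if result is not None else None
print(safe_href)  # None
```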

If we rewrite your code to actually check the return value of the find method:

from requests_html import HTMLSession

s = HTMLSession()

url = "https://lakesshoweringspaces.com/catalogue-product-filter/page/1"

r = s.get(url)

products = r.html.find("article.contentwrapper section")

for item in products:
    res = item.find("a", first=True)
    if res:
        print(res.attrs["href"])
    else:
        print("no match")

Then we find that it fails to find any a elements in every other loop iteration:

https://.../catalogue_product/alassio/?
no match
https://.../catalogue_product/amare/?
no match
https://.../catalogue_product/ambient/?
no match
https://.../catalogue_product/andora/?
no match
https://.../catalogue_product/antigua/?
no match
https://.../catalogue_product/aruba/?
no match
https://.../catalogue_product/avanza/?
no match
https://.../catalogue_product/barbados/?
no match
https://.../catalogue_product/bergen-bi-fold-door/?
no match
https://.../catalogue_product/framed-bi-fold-door/?
no match
https://.../catalogue_product/semi-frameless-bi-fold-door/?
no match
https://.../catalogue_product/cannes-10mm/?
no match

And that's because your expression article.contentwrapper section is matching both sections with class collection-wrapper-item and sections with class compare_favorites_section, the latter of which contain no a elements.

If you modify your code to be more selective:

from requests_html import HTMLSession

s = HTMLSession()

url = "https://lakesshoweringspaces.com/catalogue-product-filter/page/1"

r = s.get(url)

products = r.html.find("article.contentwrapper section.collection-wrapper-item")

for item in products:
    res = item.find("a", first=True)
    if res:
        print(res.attrs["href"])
    else:
        print("no match")

Then you will reliably find the links. Running the above produces:

https://.../catalogue_product/alassio/?
https://.../catalogue_product/amare/?
https://.../catalogue_product/ambient/?
https://.../catalogue_product/andora/?
https://.../catalogue_product/antigua/?
https://.../catalogue_product/aruba/?
https://.../catalogue_product/avanza/?
https://.../catalogue_product/barbados/?
https://.../catalogue_product/bergen-bi-fold-door/?
https://.../catalogue_product/framed-bi-fold-door/?
https://.../catalogue_product/semi-frameless-bi-fold-door/?
https://.../catalogue_product/cannes-10mm/?
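
The difference between the broad and the narrow selector can also be checked offline. This is a rough sketch under stated assumptions: the markup in SAMPLE is made up and heavily simplified (only the class names come from the answer above), and it uses the standard library's xml.etree.ElementTree with XPath-style paths instead of requests_html and CSS selectors, purely so that it runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real page is far larger; only the
# two class names are taken from the answer above.
SAMPLE = """
<article class="contentwrapper">
  <section class="collection-wrapper-item"><a href="/catalogue_product/alassio/">Alassio</a></section>
  <section class="compare_favorites_section"><span>compare</span></section>
  <section class="collection-wrapper-item"><a href="/catalogue_product/amare/">Amare</a></section>
  <section class="compare_favorites_section"><span>compare</span></section>
</article>
"""

root = ET.fromstring(SAMPLE)

# Broad match: every section under the article, including the ones
# that contain no link at all.
all_sections = root.findall(".//section")
has_link = [section.find(".//a") is not None for section in all_sections]
print(has_link)  # [True, False, True, False]

# Narrow match: only sections whose class is collection-wrapper-item,
# each of which contains a link in this sample.
product_sections = root.findall(".//section[@class='collection-wrapper-item']")
hrefs = [section.find(".//a").get("href") for section in product_sections]
print(hrefs)  # ['/catalogue_product/alassio/', '/catalogue_product/amare/']
```

In this sample the broad match alternates between a section with a link and one without, which mirrors the alternating "no match" lines in the first run.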

huangapple
  • Posted on 2023-02-10 04:34:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404148.html