BeautifulSoup AttributeError: 'NoneType' object has no attribute 'text' in web scraping attempt

Question

When scraping https://www.work.ua/en/jobs/?ss=1 for all job postings, I get the error AttributeError: 'NoneType' object has no attribute 'text'.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.work.ua/en/jobs/?ss=1'
    # Get the webpage
    page = requests.get(url)
    # Create a BeautifulSoup object
    soup = BeautifulSoup(page.text, 'html.parser')
    # Pull all text from the class
    jobs = soup.find_all(class_='card card-hover card-visited wordwrap job-link js-hot-block')
    # Pull text from all instances of <a> tag within the class
    for job in jobs:
        job_title = job.find(class_='add-top-xs').text
        job_city = job.find(class_='add-top-xs').next_sibling.text
        job_salary = job.find(class_='salary').text
        job_url = job.find('a')['href']
        print('Job title: {}\nCity: {}\nSalary: {}\nURL: {}\n'.format(job_title, job_city, job_salary))

A web search indicates that this is likely due to a missing or incorrect tag in the HTML page, but this is beyond my skill level.

Answer 1

Score: 2

Please understand that in an expression like

    job_title = job.find(class_='add-top-xs').text

the initial part, job.find(...), can return None.

And there's no .text attribute for None.
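
To see this concretely, here is a minimal sketch that runs offline. The HTML string is hypothetical markup made up for illustration, not copied from the real page; only the class names add-top-xs and salary come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may look different.
html = '<div class="card"><h2 class="add-top-xs">Python developer</h2></div>'
card = BeautifulSoup(html, 'html.parser').find(class_='card')

title_tag = card.find(class_='add-top-xs')  # a Tag, because the class exists
salary_tag = card.find(class_='salary')     # None, because nothing matched

print(title_tag.text)
print(salary_tag)

try:
    salary_tag.text
except AttributeError as exc:
    # Same message as in the question's traceback.
    error_message = str(exc)
    print(error_message)
```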

Define a helper:

    def get_text(e):
        if e is None:
            return ""
        return e.text

Now you can rephrase it as:

    job_title = get_text(job.find(class_='add-top-xs'))

More generally, you want to better understand how those pages are structured. Maybe the add-top-xs class is always present on elements of interest. Or maybe it's optional, and your code must learn to adapt.
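
One way to find out which classes are optional is to count, per class, how many cards lack it. The sketch below does that on hypothetical markup (three made-up cards with an assumed job-link class); on the real page you would build soup from the downloaded HTML instead.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup: three cards, only the first one carries a salary.
html = """
<div class="job-link"><h2 class="add-top-xs">Python developer</h2><span class="salary">30000 UAH</span></div>
<div class="job-link"><h2 class="add-top-xs">QA engineer</h2></div>
<div class="job-link"><h2 class="add-top-xs">Accountant</h2></div>
"""
soup = BeautifulSoup(html, 'html.parser')
cards = soup.find_all(class_='job-link')

class_names = ('add-top-xs', 'salary')
missing = Counter()
for card in cards:
    for class_name in class_names:
        if card.find(class_=class_name) is None:
            missing[class_name] += 1

for class_name in class_names:
    print('{}: missing in {} of {} cards'.format(class_name, missing[class_name], len(cards)))
```

A class that is missing in zero cards is probably safe to rely on; anything else needs a guard such as the helper above.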

Printing out [soup.prettify()](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#pretty-printing) can go a long way toward helping you make sense of poorly formatted input HTML.
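
A whole page can be a lot of output, so it may be easier to pretty-print a single card. A minimal sketch, again on hypothetical one-line markup with an assumed job-link class:

```python
from bs4 import BeautifulSoup

# Hypothetical markup squeezed onto one line, as scraped HTML often is.
html = '<div class="job-link"><h2 class="add-top-xs"><a href="/en/jobs/1/">Python developer</a></h2><span>Kyiv</span></div>'
soup = BeautifulSoup(html, 'html.parser')

# prettify() is available on individual tags as well as on the whole soup.
first_card = soup.find(class_='job-link')
pretty = first_card.prettify()
print(pretty)
```

The indented output puts each tag on its own line, which makes it easier to spot which elements and classes a card really contains.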

Answer 2

Score: 0

Explanation:

You are getting an AttributeError exception. You are probably looking for a 'text' attribute on an object, but that attribute doesn't exist.

The error is in one of these sections:

    soup = BeautifulSoup(page.text, 'html.parser')

or

    job_title = job.find(class_='add-top-xs').text
    job_city = job.find(class_='add-top-xs').next_sibling.text
    job_salary = job.find(class_='salary').text

See it? Each of these asks for a .text value from the result of a class lookup.
Most likely one of those lookups finds nothing in the HTML page, so there is no object to read .text from.

Your problem is here:

    job_salary = job.find(class_='salary').text

The page doesn't have a salary class.

Solution:

Use try/except, something like:

    try:
        soup = BeautifulSoup(page.text, 'html.parser')
        job_title = job.find(class_='add-top-xs').text
        job_city = job.find(class_='add-top-xs').next_sibling.text
        job_salary = job.find(class_='salary').text
    except AttributeError:
        print("do_something_with_it")
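
A possible refinement of the same idea: keep each try block as small as one lookup, so that a missing salary does not also throw away the title. This is only a sketch on hypothetical markup (two made-up cards with an assumed job-link class), and the 'not specified' placeholder is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second card has no salary element.
html = """
<div class="job-link"><h2 class="add-top-xs">Python developer</h2><span class="salary">30000 UAH</span></div>
<div class="job-link"><h2 class="add-top-xs">QA engineer</h2></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Output field name -> CSS class to look up inside each card.
fields = {'title': 'add-top-xs', 'salary': 'salary'}

results = []
for job in soup.find_all(class_='job-link'):
    record = {}
    for name, class_name in fields.items():
        try:
            record[name] = job.find(class_=class_name).text
        except AttributeError:
            # find() returned None: this card has no such element.
            record[name] = 'not specified'
    results.append(record)
    print(record)
```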

huangapple
  • Posted on March 21, 2023 at 01:23:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793428.html