BeautifulSoup AttributeError: 'NoneType' object has no attribute 'text' in web scraping attempt
Question
When scraping https://www.work.ua/en/jobs/?ss=1 for all job postings, I get the error `AttributeError: 'NoneType' object has no attribute 'text'`.
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.work.ua/en/jobs/?ss=1'
# Get the webpage
page = requests.get(url)
# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
# Find every job card that has this class attribute
jobs = soup.find_all(class_='card card-hover card-visited wordwrap job-link js-hot-block')
# Pull the title, city, salary and link out of each card
for job in jobs:
    job_title = job.find(class_='add-top-xs').text
    job_city = job.find(class_='add-top-xs').next_sibling.text
    job_salary = job.find(class_='salary').text
    job_url = job.find('a')['href']
    print('Job title: {}\nCity: {}\nSalary: {}\nURL: {}\n'.format(job_title, job_city, job_salary, job_url))
```
A web search indicates that this is likely due to a missing or incorrect tag in the HTML page, but this is beyond my skill level.
Answer 1
Score: 2
Please understand that in an expression like

```python
job_title = job.find(class_='add-top-xs').text
```

the initial part, `job.find(...)`, can return `None`. And there's no `.text` attribute for `None`.

Define a helper:

```python
def get_text(e):
    if e is None:
        return ""
    return e.text
```

Now you can rephrase it as:

```python
job_title = get_text(job.find(class_='add-top-xs'))
```
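A minimal sketch of the helper in use, assuming a small hypothetical snippet of card markup (not work.ua's real HTML). This variant adds a `default` parameter and calls `get_text(strip=True)` to trim whitespace; both are additions for illustration, not part of the helper above.

```python
from bs4 import BeautifulSoup


def get_text(e, default=""):
    # find() returns None when nothing matches; fall back instead of crashing
    if e is None:
        return default
    # strip=True trims the whitespace that HTML indentation leaves around text
    return e.get_text(strip=True)


# Hypothetical markup: the second card has no salary element
html = """
<div class="card">
  <h2 class="add-top-xs"><a href="/en/jobs/1/">Python developer</a></h2>
  <span class="salary">50 000 UAH</span>
</div>
<div class="card">
  <h2 class="add-top-xs"><a href="/en/jobs/2/">QA engineer</a></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for card in soup.find_all(class_="card"):
    title = get_text(card.find(class_="add-top-xs"))
    salary = get_text(card.find(class_="salary"), default="n/a")
    rows.append((title, salary))

print(rows)  # [('Python developer', '50 000 UAH'), ('QA engineer', 'n/a')]
```

The card without a salary now yields `'n/a'` instead of stopping the whole loop.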
More generally, you want to better understand how those pages are structured. Maybe the `add-top-xs` class is always present on elements of interest. Or maybe it's optional, and your code must learn to adapt.
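One rough way to check whether a class is always present or optional is to count how many job cards lack a match for each class you plan to read. This is a sketch only: `missing_counts` is a hypothetical helper, and the markup is made up.

```python
from bs4 import BeautifulSoup


def missing_counts(cards, class_names):
    # For each class name, count the cards in which find() comes back empty
    return {
        name: sum(1 for card in cards if card.find(class_=name) is None)
        for name in class_names
    }


# Hypothetical markup: only one of three cards shows a salary
html = """
<div class="card"><h2 class="add-top-xs">Python developer</h2><span class="salary">50 000 UAH</span></div>
<div class="card"><h2 class="add-top-xs">QA engineer</h2></div>
<div class="card"><h2 class="add-top-xs">Designer</h2></div>
"""

soup = BeautifulSoup(html, "html.parser")
cards = soup.find_all(class_="card")
report = missing_counts(cards, ["add-top-xs", "salary"])
print(len(cards), report)  # 3 {'add-top-xs': 0, 'salary': 2}
```

A count of 0 suggests the class is present on every card sampled. Any other count means the code has to cope with `None` for that field.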
Printing out [soup.prettify()](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#pretty-printing) can go a long way toward helping you make sense of poorly formatted input HTML.
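A full results page is long, so one way to apply this is to prettify just one card, because `prettify()` is available on any tag and not only on the soup. A sketch on made-up, single-line markup:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, the way HTML often arrives from a server
html = '<div class="card"><h2 class="add-top-xs"><a href="/en/jobs/1/">Python developer</a></h2><span class="salary">50 000 UAH</span></div>'

soup = BeautifulSoup(html, "html.parser")
first_card = soup.find(class_="card")
# prettify() puts each tag and each string on its own indented line
pretty = first_card.prettify()
print(pretty)
```

With the real page, the same call on the first element returned by `find_all(...)` shows which classes a card actually carries.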
Answer 2
Score: 0
Explanation:

You are getting an `AttributeError` exception. You are probably looking for a `text` attribute on an object that doesn't have one.

The error is in one of these sections:

```python
soup = BeautifulSoup(page.text, 'html.parser')
```

or

```python
job_title = job.find(class_='add-top-xs').text
job_city = job.find(class_='add-top-xs').next_sibling.text
job_salary = job.find(class_='salary').text
```

Do you see it? You are asking for a `.text` value from the result of looking up a class, and that lookup may find nothing.

Probably one of the elements you are looking for in the HTML page isn't there, so `find()` returns `None`, and `None` has no `text` attribute.

Your problem is here:

```python
job_salary = job.find(class_='salary').text
```

The page doesn't have a `salary` class.
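To see the failure in isolation, here is a minimal sketch on made-up markup. `find()` returns `None` for a class that isn't in the card, and reading `.text` from that `None` raises the error from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical card with a title but no salary element
card = BeautifulSoup('<div class="card"><h2 class="add-top-xs">QA engineer</h2></div>', "html.parser")

salary_tag = card.find(class_="salary")
print(salary_tag)  # None

message = None
try:
    _ = salary_tag.text  # attribute lookup on None
except AttributeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object has no attribute 'text'
```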
Solution:

Use `try`/`except`, something like:

```python
for job in jobs:
    try:
        job_title = job.find(class_='add-top-xs').text
        job_city = job.find(class_='add-top-xs').next_sibling.text
        job_salary = job.find(class_='salary').text
    except AttributeError:
        print("do_something_with_it")
```
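A single `try` around all three lines has one drawback: a missing salary also discards the title and city for that job. Below is a sketch of a per-field variant on made-up markup. `safe` is a hypothetical helper. `find_next_sibling()` replaces `.next_sibling` here because `.next_sibling` can return the whitespace string between tags, while `find_next_sibling()` skips to the next tag.

```python
from bs4 import BeautifulSoup


def safe(extract, default="n/a"):
    # Run one field's extraction; a missing element (None) yields the default
    try:
        return extract()
    except AttributeError:
        return default


# Hypothetical markup: the second card has no salary
html = """
<div class="card">
  <h2 class="add-top-xs">Python developer</h2>
  <div>Kyiv</div>
  <span class="salary">50 000 UAH</span>
</div>
<div class="card">
  <h2 class="add-top-xs">QA engineer</h2>
  <div>Lviv</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for job in soup.find_all(class_="card"):
    # Each lambda runs immediately, inside this iteration, so `job` is the current card
    title = safe(lambda: job.find(class_="add-top-xs").get_text(strip=True))
    city = safe(lambda: job.find(class_="add-top-xs").find_next_sibling().get_text(strip=True))
    salary = safe(lambda: job.find(class_="salary").get_text(strip=True))
    results.append({"title": title, "city": city, "salary": salary})

for row in results:
    print(row)
```

Catching `AttributeError` this broadly can also hide genuine typos in the extraction code. The explicit `None` check in Answer 1's helper is the stricter option.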