March 21, 2023 01:23:42

BeautifulSoup AttributeError: 'NoneType' object has no attribute 'text' in web scraping attempt

Question

When scraping https://www.work.ua/en/jobs/?ss=1 for all job postings, I get the error AttributeError: 'NoneType' object has no attribute 'text'.

import requests
from bs4 import BeautifulSoup
url = 'https://www.work.ua/en/jobs/?ss=1'
# Get the webpage
page = requests.get(url)
# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
# Find every job card by its class attribute
jobs = soup.find_all(class_='card card-hover card-visited wordwrap job-link js-hot-block')
# Pull the title, city, salary and link out of each card
for job in jobs:
    job_title = job.find(class_='add-top-xs').text
    job_city = job.find(class_='add-top-xs').next_sibling.text
    job_salary = job.find(class_='salary').text
    job_url = job.find('a')['href']

    print('Job title: {}\nCity: {}\nSalary: {}\nURL: {}\n'.format(job_title, job_city, job_salary, job_url))

A web search indicates that this is likely due to a missing or incorrect tag in the HTML page, but this is beyond my skill level.

Answer 1

Score: 2

Please understand that in an expression like

job_title = job.find(class_='add-top-xs').text

the initial part, job.find(...), can return None when no element matches.

And there's no .text attribute for None.
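
To see the failure in isolation, here is a minimal sketch that needs neither the network nor BeautifulSoup. The variable element stands in for what job.find(...) gives back when nothing matches, and the function name reproduce_error is made up for this illustration.

```python
def reproduce_error():
    """Trigger the same AttributeError and return its message."""
    element = None  # stand-in for the result of a find() call that matched nothing
    try:
        element.text
    except AttributeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(reproduce_error())
```

The message names NoneType, not a BeautifulSoup class, which is the hint that the lookup itself came back empty.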

Define a helper:

def get_text(e):
    if e is None:
        return ""
    return e.text

Now you can rephrase it as:

job_title = get_text(job.find(class_='add-top-xs'))
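
As a rough sketch of how the helper could cover every field in the loop, the example below runs against a small inline HTML string, so it makes no network request. SAMPLE_HTML, the city class and the function name parse_jobs are assumptions invented for the example; the real page may be structured differently. It also swaps .next_sibling for find_next_sibling("span"), because .next_sibling can land on a whitespace text node between tags, and it matches on the single class job-link instead of the full multi-class string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="card job-link">
  <h2 class="add-top-xs"><a href="/jobs/1/">Driver</a></h2>
  <span class="city">Kyiv</span>
  <span class="salary">20000 UAH</span>
</div>
<div class="card job-link">
  <h2 class="add-top-xs"><a href="/jobs/2/">Cook</a></h2>
</div>
"""


def get_text(e):
    """Return the stripped text of an element, or "" if it is None."""
    if e is None:
        return ""
    return e.get_text(strip=True)


def parse_jobs(html):
    """Collect one dict per job card, leaving missing fields empty."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for job in soup.find_all(class_="job-link"):
        title_el = job.find(class_="add-top-xs")
        # Guard the chained lookup too: only ask for a sibling if the title exists.
        city_el = title_el.find_next_sibling("span") if title_el is not None else None
        link_el = job.find("a")
        rows.append({
            "title": get_text(title_el),
            "city": get_text(city_el),
            "salary": get_text(job.find(class_="salary")),
            "url": link_el.get("href", "") if link_el is not None else "",
        })
    return rows


if __name__ == "__main__":
    for row in parse_jobs(SAMPLE_HTML):
        print(row)
```

The second card has no city or salary element, so those fields come back as empty strings instead of raising.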

More generally, you want to better understand
how those pages are structured.
Maybe the add-top-xs class is always present
on elements of interest.
Or maybe it's optional, and your code must
learn to adapt.

Printing out
[soup.prettify()](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#pretty-printing)
can go a long way toward helping you make sense
of poorly formatted input HTML.
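
One possible way to use prettify() for this kind of check is sketched below: pretty-print the first element carrying a given class, or report that nothing matched. The inline SAMPLE_HTML and the function name describe_first are assumptions made up for the example, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = (
    '<div class="card job-link">'
    '<h2 class="add-top-xs"><a href="/jobs/1/">Driver</a></h2>'
    '</div>'
)


def describe_first(html, class_name):
    """Return indented markup for the first element with class_name, or a note if none matches."""
    soup = BeautifulSoup(html, "html.parser")
    el = soup.find(class_=class_name)
    if el is None:
        return "no element with class {!r}".format(class_name)
    return el.prettify()


if __name__ == "__main__":
    print(describe_first(SAMPLE_HTML, "job-link"))
    print(describe_first(SAMPLE_HTML, "salary"))
```

Running it on a saved copy of the real page, one class at a time, should show which of the lookups in the loop comes back empty.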

Answer 2

Score: 0

Explanation:

You are getting an AttributeError exception. You are probably looking up the 'text' attribute on an object that does not have it.

The error is in one of these sections:

soup = BeautifulSoup(page.text, 'html.parser')

job_title = job.find(class_='add-top-xs').text
job_city = job.find(class_='add-top-xs').next_sibling.text
job_salary = job.find(class_='salary').text

See it? You are asking for the .text value of whatever find() returns.
Probably one of the elements you are looking for is not in the HTML page, so find() returns None, and None has no text attribute.

Your problem is here:

job_salary = job.find(class_='salary').text

The page doesn't have a salary class.

Solution:

Use try/except, something like:

try:
    soup = BeautifulSoup(page.text, 'html.parser')
    job_title = job.find(class_='add-top-xs').text
    job_city = job.find(class_='add-top-xs').next_sibling.text
    job_salary = job.find(class_='salary').text
except AttributeError:
    print("do_something_with_it")
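
A single try block around all the lookups drops every field as soon as one is missing. A possible variation, sketched here under assumptions, catches the error per field inside the loop. The inline SAMPLE_HTML, the "n/a" default and the names text_or_default and parse_with_try are hypothetical, chosen only for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="card job-link">
  <h2 class="add-top-xs">Driver</h2>
  <span class="salary">20000 UAH</span>
</div>
<div class="card job-link">
  <h2 class="add-top-xs">Cook</h2>
</div>
"""


def text_or_default(lookup, default="n/a"):
    """Call lookup() and return its stripped .text, or default if the result has no .text."""
    try:
        return lookup().text.strip()
    except AttributeError:
        return default


def parse_with_try(html):
    """Return (title, salary) pairs, substituting a default for missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for job in soup.find_all(class_="job-link"):
        # Each lambda runs immediately, inside the same loop iteration.
        title = text_or_default(lambda: job.find(class_="add-top-xs"))
        salary = text_or_default(lambda: job.find(class_="salary"))
        results.append((title, salary))
    return results


if __name__ == "__main__":
    print(parse_with_try(SAMPLE_HTML))
```

With this shape a job without a salary still keeps its title, and only the missing field falls back to the default.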
