BeautifulSoup AttributeError: 'NoneType' object has no attribute 'text' in web scraping attempt

Question

While scraping https://www.work.ua/en/jobs/?ss=1 for all job postings, I get the error AttributeError: 'NoneType' object has no attribute 'text'.

import requests
from bs4 import BeautifulSoup
url = 'https://www.work.ua/en/jobs/?ss=1'

# Get the webpage
page = requests.get(url)

# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')

# Pull all text from the class
jobs = soup.find_all(class_='card card-hover card-visited wordwrap job-link js-hot-block')

# Pull text from all instances of <a> tag within the class
for job in jobs:
    job_title = job.find(class_='add-top-xs').text
    job_city = job.find(class_='add-top-xs').next_sibling.text
    job_salary = job.find(class_='salary').text
    job_url = job.find('a')['href']
    
    print('Job title: {}\nCity: {}\nSalary: {}\nURL: {}\n'.format(job_title, job_city, job_salary, job_url))

A web search indicates that this is likely due to a missing or incorrect tag in the HTML page, but this is beyond my skill level.

Answer 1

Score: 2

Please understand that in an expression like

job_title = job.find(class_='add-top-xs').text

the initial part, job.find(...), can return None when no element matches.

And there's no .text attribute on None.
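To see this in isolation, here is a minimal sketch that reproduces the error without touching the network. The HTML snippet is made up for illustration (it is not the real markup of the site), and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical job card: it has a title element but no salary element.
html = '<div class="card"><h2 class="add-top-xs">Python developer</h2></div>'
card = BeautifulSoup(html, "html.parser").find(class_="card")

title_tag = card.find(class_="add-top-xs")  # a Tag, because the class exists
salary_tag = card.find(class_="salary")     # None, because nothing matched

print(title_tag.text)
print(salary_tag is None)

try:
    salary_tag.text  # attribute access on None
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message printed at the end is the same one the question reports, which points at a find() call that matched nothing rather than at a problem in BeautifulSoup itself.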

Define a helper:

def get_text(e):
    if e is None:
        return ""
    return e.text

Now you can rephrase it as:

job_title = get_text(job.find(class_='add-top-xs'))
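As a rough end-to-end sketch of the helper in use, the loop below runs over two made-up job cards, one of which lacks a salary. The markup and the job class name are assumptions for the example; only get_text and the add-top-xs / salary class names come from the thread.

```python
from bs4 import BeautifulSoup

def get_text(e):
    """Return the element's text, or "" when find() returned None."""
    if e is None:
        return ""
    return e.text

# Made-up markup: the second card has no salary element.
html = """
<div class="job"><h2 class="add-top-xs">Python developer</h2><span class="salary">30000 UAH</span></div>
<div class="job"><h2 class="add-top-xs">Data analyst</h2></div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for job in soup.find_all(class_="job"):
    job_title = get_text(job.find(class_="add-top-xs"))
    job_salary = get_text(job.find(class_="salary"))
    rows.append((job_title, job_salary))

print(rows)
```

The card without a salary still produces a row, with an empty string in place of the missing value, instead of stopping the whole loop.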

More generally, you want to better understand how those pages are structured. Maybe the add-top-xs class is always present on elements of interest. Or maybe it's optional, and your code must learn to adapt.
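One way to find out whether a class is optional is to count how many cards actually contain it. This is only a sketch on made-up markup; the job class and the sample cards are assumptions, and on the real page you would run the same loop over the cards your own find_all() call returns.

```python
from collections import Counter
from bs4 import BeautifulSoup

# Made-up markup: every card has a title, only one has a salary.
html = """
<div class="job"><h2 class="add-top-xs">Python developer</h2><span class="salary">30000 UAH</span></div>
<div class="job"><h2 class="add-top-xs">Data analyst</h2></div>
<div class="job"><h2 class="add-top-xs">QA engineer</h2></div>
"""
soup = BeautifulSoup(html, "html.parser")
cards = soup.find_all(class_="job")

classes_to_check = ("add-top-xs", "salary")
present = Counter()
for card in cards:
    for cls in classes_to_check:
        if card.find(class_=cls) is not None:
            present[cls] += 1

for cls in classes_to_check:
    print(f"{cls}: present in {present[cls]} of {len(cards)} cards")
```

A class that shows up in fewer cards than the total is optional, so any .text access on it needs a None check.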

Printing out [soup.prettify()](https://beautiful-soup-4.readthedocs.io/en/latest/index.html#pretty-printing) can go a long way toward helping you make sense of poorly formatted input HTML.
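For instance, a short sketch of what prettify() does to a one-line snippet (the markup here is invented for the example):

```python
from bs4 import BeautifulSoup

# Invented one-line markup, similar in spirit to a minified page.
html = '<div class="job"><h2 class="add-top-xs"><a href="/jobs/1/">Python developer</a></h2></div>'
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a string with each tag on its own indented line.
pretty = soup.prettify()
print(pretty)
```

On a full page the output is long, so it can be easier to call prettify() on a single element, for example the first card returned by find_all(), and read just that.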

Answer 2

Score: 0

Explanation:

You are getting an AttributeError exception: the code asks for the 'text' attribute of an object that doesn't have one.

The error comes from one of the .text accesses. The first one,

soup = BeautifulSoup(page.text, 'html.parser')

is safe, because requests.get() returns a Response object, which always has .text. That leaves these lines:

job_title = job.find(class_='add-top-xs').text
job_city = job.find(class_='add-top-xs').next_sibling.text
job_salary = job.find(class_='salary').text

You see it? Each line asks for .text on whatever job.find(...) returned. When find() matches no element, it returns None, and None has no text attribute.

Your problem is here:

job_salary = job.find(class_='salary').text

The page doesn't have an element with the salary class, so find() returns None.

Solution:

Use try/except inside the loop, something like:

for job in jobs:
    try:
        job_title = job.find(class_='add-top-xs').text
        job_city = job.find(class_='add-top-xs').next_sibling.text
        job_salary = job.find(class_='salary').text
    except AttributeError:
        # A find() call (or next_sibling) returned None for this card
        print("do_something_with_it")
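A minimal sketch of the try/except approach on made-up markup, assuming you want to skip any card that is missing a field and keep a count of the skipped ones. The job class, the sample cards, and the parsed / skipped names are all hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second card has no salary element.
html = """
<div class="job"><h2 class="add-top-xs">Python developer</h2><span class="salary">30000 UAH</span></div>
<div class="job"><h2 class="add-top-xs">Data analyst</h2></div>
"""
soup = BeautifulSoup(html, "html.parser")

parsed = []
skipped = 0
for job in soup.find_all(class_="job"):
    try:
        job_title = job.find(class_="add-top-xs").text
        job_salary = job.find(class_="salary").text
    except AttributeError:
        # One of the find() calls returned None, so skip this card.
        skipped += 1
        continue
    parsed.append((job_title, job_salary))

print(parsed)
print("skipped:", skipped)
```

Note the trade-off: with one try block per card, a single missing field drops the whole card, title included. If you would rather keep the card and leave the missing field blank, check each find() result for None separately.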

huangapple
  • Posted on March 21, 2023, 01:23:42
  • When reposting, please keep the link to this post: https://go.coder-hub.com/75793428.html