'NoneType' object has no attribute 'get_text' Error while using BeautifulSoup

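Question

I am new to BeautifulSoup and was trying to scrape the [DevCommunity][1] website, but after the first article's title is printed I get a *'NoneType' object has no attribute 'text'* error.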

```python
from bs4 import BeautifulSoup
import requests

try:
    
    source = requests.get('https://www.dev.to')
    source.raise_for_status()
    
    soup = BeautifulSoup(source.text, 'html.parser')
    
    articles = soup.find_all(class_="crayons-story")
    for x in articles:
        title = x.find(class_="crayons-story__title")
        title_text = title.text
        print(title_text)
except Exception as e:
    print(e)
```

  [1]: https://www.dev.to


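# Answer 1
**Score**: 1

The data is loaded from an external URL via JavaScript, so `beautifulsoup` doesn't see it in the HTML returned by `requests`. To simulate this Ajax request, you can use this example: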

```py
import requests

url = 'https://dev.to/search/feed_content'

params = {
    "per_page": "15",
    "page": 1,
    "sort_by": "hotness_score",
    "sort_direction": "desc",
    "approved": "",
    "class_name": "Article",
}

for params['page'] in range(1, 3):    # <-- increase the number of pages here
    data = requests.get(url, params=params).json()
    for r in data['result']:
        print(r['title'])
```

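Prints:

Kubernetes Architectural Overview
Revolutionizing Software Engineering: Embracing the Power of AI
Data Structures in C# Part 3: HashSets
Extracting Class Methods: How To Derive an Interface From a Class
SVG Image optimisations and caching from s3
NodeJS Basics
Spring REST working with enums
Persisting non-primitive types in JPA
Importing CSV to AGE
SQL Project Planning | HackerRank | MSSQL
Wordpress Booking Plugins
GitHub Traffic Checker
HOW THE PROVIDER DESIGN PATTERN
Wordpress Image Optimization Plugins
Wordpress Plugins that are important
CSS positioning with an example
100 DAYS - DAY 22
Different methods of API(Application Program Interface ) in Javasript
Javascript Array Methods
[Typia] I made realtime demo site of 20,000x faster validation (+200x faster JSON stringify)
Git Cheat sheet
Setting Up Your Next Project
3 Foolproof Strategies for Staying Organized in Code
Python Threading: A Comprehensive Guide to Multithreading in Python
This is my first post.
Everything you need to know about DOM in Javascript
Javascript Basics
Form input fields in HTML
GSoC Week 2-3 Update
JavaScript vs React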
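
If you want the same request to fail loudly on HTTP errors and to tolerate entries without a title, something along these lines may help. This is only a sketch: the endpoint, the query parameters, and the `result` key come from the example above, while the function names, the `timeout` value, and the idea that an entry might lack a `title` are assumptions made for illustration.

```python
import requests

FEED_URL = "https://dev.to/search/feed_content"  # endpoint taken from the example above


def titles_from_payload(payload):
    """Return the titles under the 'result' key, skipping entries without one."""
    return [item["title"] for item in payload.get("result", []) if item.get("title")]


def fetch_titles(pages=2, per_page=15, timeout=10):
    """Fetch `pages` pages of the feed and return all titles (performs network requests)."""
    titles = []
    for page in range(1, pages + 1):
        params = {
            "per_page": per_page,
            "page": page,
            "sort_by": "hotness_score",
            "sort_direction": "desc",
            "approved": "",
            "class_name": "Article",
        }
        response = requests.get(FEED_URL, params=params, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors before trying to parse JSON
        titles.extend(titles_from_payload(response.json()))
    return titles


# Offline check with a hand-written payload (hypothetical data, not real API output).
sample = {"result": [{"title": "NodeJS Basics"}, {"id": 1}, {"title": "Git Cheat sheet"}]}
print(titles_from_payload(sample))
```

Calling `fetch_titles(pages=2)` would then perform the actual requests. Keeping the parsing in `titles_from_payload` separate makes it easy to check without hitting the network.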

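# Answer 2
**Score**: 0

Here is a simple test you can run in your browser's console on that website:

console.log(document.querySelectorAll('.crayons-story').length) // output: 51
console.log(document.querySelectorAll('.crayons-story .crayons-story__title').length) // output: 50

Therefore, there must be one `.crayons-story` element without a `crayons-story__title` child, and for that element `x.find(...)` returns `None`. To keep it simple, you can skip it by checking `if title is None:`, or use Andrej Kesely's answer.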
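
The `None` check could look roughly like this in BeautifulSoup. To keep the sketch runnable offline, it parses a small hand-written HTML snippet (hypothetical, standing in for the real page) in which the second story has no title element. The function name `extract_titles` is also just an illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the dev.to page: the second story has no title element.
SAMPLE_HTML = """
<div class="crayons-story"><h2 class="crayons-story__title"> First post </h2></div>
<div class="crayons-story"><p>Story without a title</p></div>
<div class="crayons-story"><h2 class="crayons-story__title">Third post</h2></div>
"""


def extract_titles(html):
    """Return the text of every story title, skipping stories that have none."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for story in soup.find_all(class_="crayons-story"):
        title = story.find(class_="crayons-story__title")
        if title is None:  # find() returns None when nothing matches
            continue
        titles.append(title.get_text(strip=True))
    return titles


print(extract_titles(SAMPLE_HTML))  # ['First post', 'Third post']
```

In the original script, the same `if title is None: continue` guard would go right after the `x.find(...)` call.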


huangapple
  • Published on June 26, 2023, 02:03:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76551777.html