'NoneType' object has no attribute 'get_text' Error while using BeautifulSoup
Question
I am new to BeautifulSoup and am trying to scrape the [DevCommunity][1] website. The first article title prints, and then I get a *'NoneType' object has no attribute 'text'* error.
```python
from bs4 import BeautifulSoup
import requests

try:
    source = requests.get('https://www.dev.to')
    source.raise_for_status()

    soup = BeautifulSoup(source.text, 'html.parser')
    articles = soup.find_all(class_="crayons-story")

    for x in articles:
        title = x.find(class_="crayons-story__title")
        title_text = title.text
        print(title_text)

except Exception as e:
    print(e)
```
[1]: https://www.dev.to
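# Answer 1
**Score**: 1

The data is loaded from an external URL via JavaScript, so `beautifulsoup` never sees it in the HTML that `requests` downloads. To simulate this Ajax request, you can use the following example: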
```py
import requests

url = 'https://dev.to/search/feed_content'

params = {
    "per_page": "15",
    "page": 1,
    "sort_by": "hotness_score",
    "sort_direction": "desc",
    "approved": "",
    "class_name": "Article",
}

# The loop variable is params['page'], so each iteration updates the query in place.
for params['page'] in range(1, 3):  # <-- increase number of pages here
    data = requests.get(url, params=params).json()
    for r in data['result']:
        print(r['title'])
```
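Prints:

    Kubernetes Architectural Overview
    Revolutionizing Software Engineering: Embracing the Power of AI
    Data Structures in C# Part 3: HashSets
    Extracting Class Methods: How To Derive an Interface From a Class
    SVG Image optimisations and caching from s3
    NodeJS Basics
    Spring REST working with enums
    Persisting non-primitive types in JPA
    Importing CSV to AGE
    SQL Project Planning | HackerRank | MSSQL
    Wordpress Booking Plugins
    GitHub Traffic Checker
    HOW THE PROVIDER DESIGN PATTERN
    Wordpress Image Optimization Plugins
    Wordpress Plugins that are important
    CSS positioning with an example
    100 DAYS - DAY 22
    Different methods of API(Application Program Interface ) in Javasript
    Javascript Array Methods
    [Typia] I made realtime demo site of 20,000x faster validation (+200x faster JSON stringify)
    Git Cheat sheet
    Setting Up Your Next Project
    3 Foolproof Strategies for Staying Organized in Code
    Python Threading: A Comprehensive Guide to Multithreading in Python
    This is my first post.
    Everything you need to know about DOM in Javascript
    Javascript Basics
    Form input fields in HTML
    GSoC Week 2-3 Update
    JavaScript vs React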
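
If you want to reuse this request in a larger script, one possible arrangement is to separate the parsing from the fetching. The sketch below is only an illustration: the helper names `extract_titles` and `fetch_titles`, the `timeout` value, and the defensive `.get("result", [])` lookup are my own assumptions, while the endpoint and query parameters are the ones used in this answer. The endpoint is not a documented API, so its response shape may change.

```python
import requests

# Endpoint taken from the answer above; it is not a documented, stable API.
FEED_URL = "https://dev.to/search/feed_content"


def extract_titles(payload):
    """Return the titles found in one decoded JSON page of the feed.

    Assumes the shape used in the answer: {"result": [{"title": ...}, ...]}.
    Entries without a "title" key are skipped instead of raising KeyError.
    """
    return [item["title"] for item in payload.get("result", []) if "title" in item]


def fetch_titles(pages=2, per_page=15, timeout=10):
    """Fetch `pages` pages of the feed and return all titles in order.

    Hypothetical helper: the defaults mirror the values in the answer.
    """
    titles = []
    for page in range(1, pages + 1):
        params = {
            "per_page": str(per_page),
            "page": page,
            "sort_by": "hotness_score",
            "sort_direction": "desc",
            "approved": "",
            "class_name": "Article",
        }
        response = requests.get(FEED_URL, params=params, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error page
        titles.extend(extract_titles(response.json()))
    return titles
```

Calling `fetch_titles(pages=2)` would then return the titles as a list rather than printing them, which makes the parsing step easy to check against a saved response without touching the network.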
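# Answer 2
**Score**: 0

Here is a simple test you can run in your browser's console on that website:

    console.log(document.querySelectorAll('.crayons-story').length) // output: 51
    console.log(document.querySelectorAll('.crayons-story .crayons-story__title').length) // output: 50

Therefore, there must be one `.crayons-story` element without a `crayons-story__title` child. For that element, `find` returns `None`, and accessing `.text` on `None` raises the error. To keep it simple, you can skip it by checking `if title is None:`, or use Andrej Kesely's answer.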
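
As a minimal sketch of that check, the loop from the question can skip any container whose title lookup returns `None`. The function name `story_titles` and the `sample_html` snippet are made up for illustration; only the two class names come from the question.

```python
from bs4 import BeautifulSoup


def story_titles(html):
    """Collect the title text of every story that actually has a title element."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for story in soup.find_all(class_="crayons-story"):
        title = story.find(class_="crayons-story__title")
        if title is None:
            # find() returns None when nothing matches, so skip this container.
            continue
        titles.append(title.get_text(strip=True))
    return titles


# Hypothetical markup: the second container has no title, like the one on the site.
sample_html = """
<div class="crayons-story"><h2 class="crayons-story__title"> First post </h2></div>
<div class="crayons-story"></div>
<div class="crayons-story"><h2 class="crayons-story__title">Second post</h2></div>
"""

print(story_titles(sample_html))  # ['First post', 'Second post']
```

This only avoids the exception. It does not retrieve articles that the page loads later through JavaScript.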