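# Question

I am new to BeautifulSoup and am trying to scrape the [DevCommunity][1] website. The script prints the first article's title and then fails with a *'NoneType' object has no attribute 'text'* error.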

```python
from bs4 import BeautifulSoup
import requests

try:
    source = requests.get('https://www.dev.to')
    source.raise_for_status()

    soup = BeautifulSoup(source.text, 'html.parser')

    articles = soup.find_all(class_="crayons-story")
    for x in articles:
        title = x.find(class_="crayons-story__title")
        title_text = title.text
        print(title_text)
except Exception as e:
    print(e)
```

  [1]: https://www.dev.to


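# Answer 1
**Score**: 1

The data is loaded from an external URL via JavaScript, so `beautifulsoup` never sees it in the HTML that `requests` downloads. To simulate this Ajax request, you can use the following example:

```py
import requests

url = 'https://dev.to/search/feed_content'

params = {
    "per_page": "15",
    "page": 1,
    "sort_by": "hotness_score",
    "sort_direction": "desc",
    "approved": "",
    "class_name": "Article",
}

for page in range(1, 3):    # <-- increase the number of pages here
    params['page'] = page
    data = requests.get(url, params=params).json()
    for r in data['result']:
        print(r['title'])
```

Prints:

```
Kubernetes Architectural Overview
Revolutionizing Software Engineering: Embracing the Power of AI
Data Structures in C# Part 3: HashSets
Extracting Class Methods: How To Derive an Interface From a Class
SVG Image optimisations and caching from s3
NodeJS Basics
Spring REST working with enums
Persisting non-primitive types in JPA
Importing CSV to AGE
SQL Project Planning | HackerRank | MSSQL
Wordpress Booking Plugins
GitHub Traffic Checker
HOW THE PROVIDER DESIGN PATTERN
Wordpress Image Optimization Plugins
Wordpress Plugins that are important
CSS positioning with an example
100 DAYS - DAY 22
Different methods of API(Application Program Interface ) in Javasript
Javascript Array Methods
[Typia] I made realtime demo site of 20,000x faster validation (+200x faster JSON stringify)
Git Cheat sheet
Setting Up Your Next Project
3 Foolproof Strategies for Staying Organized in Code
Python Threading: A Comprehensive Guide to Multithreading in Python
This is my first post.
Everything you need to know about DOM in Javascript
Javascript Basics
Form input fields in HTML
GSoC Week 2-3 Update
JavaScript vs React
```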


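The request loop in this answer can also be wrapped in a small helper that stops on HTTP errors and tolerates entries without a title. This is only a sketch under stated assumptions: the names `titles_from_payload` and `fetch_titles`, the `timeout` value, and the idea that the response keeps the `{"result": [{"title": ...}]}` shape are illustrative choices, not part of the original answer and not a documented dev.to API.

```python
FEED_URL = "https://dev.to/search/feed_content"  # endpoint taken from the answer


def titles_from_payload(payload):
    """Pull titles out of a decoded response shaped like {"result": [{"title": ...}]}."""
    # Assumption: entries are dicts; entries without a "title" key are skipped.
    return [item["title"] for item in payload.get("result", []) if "title" in item]


def fetch_titles(session, pages=2, per_page=15):
    """Collect titles from the first `pages` pages.

    `session` is expected to behave like requests.Session (hypothetical helper).
    """
    titles = []
    for page in range(1, pages + 1):
        params = {
            "per_page": per_page,
            "page": page,
            "sort_by": "hotness_score",
            "sort_direction": "desc",
            "approved": "",
            "class_name": "Article",
        }
        response = session.get(FEED_URL, params=params, timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        titles.extend(titles_from_payload(response.json()))
    return titles
```

With `requests` installed, `fetch_titles(requests.Session(), pages=2)` would issue the same two requests as the loop in the answer. Keeping the parsing in `titles_from_payload` makes it possible to check that part against a saved response without touching the network.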
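# Answer 2
**Score**: 0

Here is a simple test you can run in the browser console on that site:

    console.log(document.querySelectorAll('.crayons-story').length) // output: 51
    console.log(document.querySelectorAll('.crayons-story .crayons-story__title').length) // output: 50

So there must be one `.crayons-story` element without a `crayons-story__title` child. When the loop reaches it, `find()` returns `None`, and reading `.text` on it raises the error. To keep it simple, you can skip that element by checking `if title is None:`, or use Andrej Kesely's answer.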


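The `None` check suggested in this answer could look like the sketch below. It runs on a small inline HTML string rather than the live site, so `SAMPLE_HTML` and the function name `story_titles` are made up for illustration; only the two class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: three story cards, the second one has no title element.
SAMPLE_HTML = """
<div class="crayons-story"><h2 class="crayons-story__title"> First post </h2></div>
<div class="crayons-story"><p>Card without a title</p></div>
<div class="crayons-story"><h2 class="crayons-story__title">Third post</h2></div>
"""


def story_titles(html):
    """Return the titles of all story cards that actually contain a title element."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for story in soup.find_all(class_="crayons-story"):
        title = story.find(class_="crayons-story__title")
        if title is None:
            # find() returns None when nothing matches, so skip this card
            continue
        titles.append(title.get_text(strip=True))
    return titles


print(story_titles(SAMPLE_HTML))  # prints ['First post', 'Third post']
```

The guard skips cards without a title instead of letting one of them abort the whole loop, which is what the broad `except Exception` in the question ends up doing.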