How to extract multiple text elements from an HTML class using Beautiful Soup

huangapple, March 15, 2023, 20:32:38

Question

This is the sample HTML code (from imdb.com) I want to extract text elements from:

<p class="">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>

From it, I can extract the director, but can't seem to do that for the stars.

I am extracting the director with this:

<html_code>.find('a').text

How can I extract the names of the actors (Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler) using similar syntax?

I'm a beginner with BeautifulSoup. Thank you!

Answer 1

Score: 1

Assuming the HTML is consistent (the first link is the director and the remaining links are the stars), you can use find_all instead:

# html_code stands for the parsed <p> element (the question's <html_code> placeholder)
director, *cast = html_code.find_all('a')  # first link -> director, rest -> cast
print("Director:", director.text)
print("Cast:")
for actor in cast:
    print(actor.text)
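
For anyone who wants to run this end to end, here is a minimal sketch that parses the sample markup from the question and applies the unpacking shown above. It assumes the bs4 package is installed and uses the built-in "html.parser". The function names split_by_position and stars_after_label are made up for illustration. The second function is not part of the answer: it is a possible alternative that looks for the "Stars:" text and collects the links that follow it, which may hold up better if a page lists more than one director.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question.
SAMPLE_HTML = """
<p class="">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
"""


def split_by_position(p_tag):
    """The answer's approach: the first link is the director, the rest are the cast."""
    director, *cast = p_tag.find_all("a")
    return director.get_text(strip=True), [a.get_text(strip=True) for a in cast]


def stars_after_label(p_tag):
    """Hypothetical variant: collect the <a> siblings that follow the 'Stars:' text node."""
    label = p_tag.find(string=re.compile(r"Stars:"))
    if label is None:
        return []
    return [a.get_text(strip=True) for a in label.find_next_siblings("a")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
p_tag = soup.find("p")  # plays the role of the question's <html_code> placeholder

director, cast = split_by_position(p_tag)
print("Director:", director)
print("Cast:", ", ".join(cast))
print("Stars (label-based):", ", ".join(stars_after_label(p_tag)))
```

On this sample both functions should return the same four names. They would only differ on markup where the number of links before "Stars:" is not exactly one.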

Posted by huangapple on March 15, 2023, 20:32:38
When reposting, please keep the link to this article: https://go.coder-hub.com/75744739.html

Tags: beautifulsoup, html, python
