How to extract multiple text elements from an HTML class using Beautiful Soup

Question

This is the sample HTML code (from imdb.com) I want to extract text elements from:

<p class="">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span> 
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>, 
<a href="/name/nm0000151/">Morgan Freeman</a>, 
<a href="/name/nm0348409/">Bob Gunton</a>, 
<a href="/name/nm0006669/">William Sadler</a>
</p>

From it, I can extract the director, but can't seem to do that for the stars.

I am extracting the director with this:

<html_code>.find('a').text

How can I extract the names of the actors (Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler) using similar syntax?

I'm a beginner with BeautifulSoup. Thank you!

Answer 1

Score: 1

Assuming the HTML is consistent, you can use find_all instead:

director, *cast = <html_code>.find_all('a')

print("Director:", director.text)

print("Cast:")
for actor in cast:
    print(actor.text)
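
For anyone who wants to run this end to end, here is a minimal sketch built on the sample markup from the question. The variable names (html, soup, paragraph and so on) are placeholders chosen for illustration, and the second approach, which uses the <span class="ghost"> separator as the boundary between director and stars, is an assumption about the page layout rather than something the answer above proposes.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<p class="">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find("p")

# Approach from the answer: the first link is the director,
# every remaining link is a cast member.
director, *cast = paragraph.find_all("a")
director_name = director.get_text(strip=True)
cast_names = [actor.get_text(strip=True) for actor in cast]

# Alternative (assumption): treat the "ghost" separator span as the
# boundary and collect only the <a> siblings that follow it.
separator = paragraph.find("span", class_="ghost")
stars_after_separator = (
    [a.get_text(strip=True) for a in separator.find_next_siblings("a")]
    if separator is not None
    else []
)

print("Director:", director_name)
print("Cast:", ", ".join(cast_names))
```

With the sample above, this prints "Director: Frank Darabont" followed by "Cast: Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler". The separator-based variant may hold up better if an entry ever lists more than one director, but it still depends on the page keeping that span in place.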

huangapple
  • Posted on March 15, 2023, 20:32:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75744739.html