Why does BeautifulSoup not return all the text in a div?

Question

Your code seems fine, but it might be missing some lyrics because they might be loaded dynamically using JavaScript. BeautifulSoup only parses the static HTML content, so dynamic content may not be included.

To retrieve the entire lyrics, you might need to use a library like Selenium that can interact with the webpage and access dynamically loaded content. Alternatively, you can explore if the Genius website offers an API for accessing lyrics, which would be a more reliable way to obtain the full lyrics of a song programmatically.

English:

I am trying to get the lyrics of a song from the Genius website with the BeautifulSoup library. I've seen different approaches online, but they all seem to be out of date. My code looks like it should work, and it mostly does, but it only retrieves part of the lyrics. It is my first time using this library, so maybe I'm missing something. The lyrics are contained in the same <div class="Lyrics__Container...">

This is my code:

from bs4 import BeautifulSoup
import re
import requests

song_api_path = '/Taylor-swift-cardigan-lyrics'
page_url = "http://genius.com" + song_api_path
page = requests.get(page_url)
soup = BeautifulSoup(page.text, "html.parser")

div = soup.find("div", class_=lambda value: value and re.search(r'^Lyrics__Container', value))

all_text = div.get_text(separator='\n')

print(all_text)

Which produces this output:

[Verse 1]
Vintage tee, brand new phone
High heels on cobblestones
When you are young, they assume you know nothing
Sequin smile, black lipstick
Sensual politics
When you are young, they assume you know nothing
[Chorus]
But I knew you
Dancin' in your Levi's
Drunk under a streetlight, I
I knew you
Hand under my sweatshirt
Baby, kiss it better, I
[Refrain]
And when I felt like I was an old cardigan
Under someone's bed
You put me on and said I was your favorite
[Verse 2]
A friend to all is a friend to none
Chase two girls, lose the one
When you are young, they assume you know nothing

This result is OK, but it is only half of the lyrics. I don't know why only this part is retrieved and not the rest of the text. I've checked the HTML on the Genius website but don't see anything different from the parts that are printed.

Any help is appreciated!

Answer 1

Score: 1

You used the find method, which returns only the first matching element.

You need to find all divs with a Lyrics__Container class:

divs = soup.find_all("div", class_=lambda value: value and re.search(r'^Lyrics__Container', value))

all_text = '\n'.join([div.get_text(separator='\n') for div in divs])

print(all_text)

This will print all the lyrics.
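To see the difference between find and find_all without hitting the network, here is a minimal sketch that runs against a small inline HTML string. The markup, the class suffixes (-abc123, -xyz) and the lyric lines are made up for illustration and are not taken from the real Genius page. It also passes a compiled regex to class_ instead of the lambda used above, which should select the same elements in this case.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: two lyrics containers separated by an
# unrelated div. Class suffixes and text are invented for this example.
html = """
<div id="lyrics-root">
  <div class="Lyrics__Container-abc123">[Verse 1]<br/>First line</div>
  <div class="Unrelated__Banner-xyz">Advertisement</div>
  <div class="Lyrics__Container-abc123">[Verse 2]<br/>Second line</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches any class name that starts with "Lyrics__Container".
pattern = re.compile(r"^Lyrics__Container")

# find() stops at the first matching element.
first_only = soup.find("div", class_=pattern)
first_text = first_only.get_text(separator="\n")

# find_all() returns every matching element, in document order.
all_containers = soup.find_all("div", class_=pattern)
all_text = "\n".join(div.get_text(separator="\n") for div in all_containers)

print("find():")
print(first_text)
print()
print("find_all():")
print(all_text)
```

With find, only the first container's text comes back, which matches the "half of the lyrics" symptom in the question. Joining the text of every container returned by find_all recovers the rest.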

huangapple
  • Posted on June 8, 2023 at 06:30:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76427479.html