Why is BeautifulSoup not returning all the text in a div?
Question
Your code seems fine, but it might be missing some lyrics because they might be loaded dynamically using JavaScript. BeautifulSoup only parses the static HTML content, so dynamic content may not be included.
To retrieve the entire lyrics, you might need to use a library like Selenium that can interact with the webpage and access dynamically loaded content. Alternatively, you can explore if the Genius website offers an API for accessing lyrics, which would be a more reliable way to obtain the full lyrics of a song programmatically.
I am trying to get the lyrics of a song from the Genius website with the BeautifulSoup library. The approaches I've found online all seem to be out of date. My code looks like it should work, and it mostly does, but it retrieves only part of the lyrics. This is my first time using this library, so maybe I'm missing something. The lyrics are contained in the same <div class="Lyrics__Container...">
This is my code:
from bs4 import BeautifulSoup
import re
import requests
song_api_path = '/Taylor-swift-cardigan-lyrics'
page_url = "http://genius.com" + song_api_path
page = requests.get(page_url)
soup = BeautifulSoup(page.text, "html.parser")
div = soup.find("div",class_=lambda value: value and re.search(r'^Lyrics__Container', value))
all_text = div.get_text(separator='\n')
print(all_text)
Which produces this output:
[Verse 1]
Vintage tee, brand new phone
High heels on cobblestones
When you are young, they assume you know nothing
Sequin smile, black lipstick
Sensual politics
When you are young, they assume you know nothing
[Chorus]
But I knew you
Dancin' in your Levi's
Drunk under a streetlight, I
I knew you
Hand under my sweatshirt
Baby, kiss it better, I
[Refrain]
And when I felt like I was an old cardigan
Under someone's bed
You put me on and said I was your favorite
[Verse 2]
A friend to all is a friend to none
Chase two girls, lose the one
When you are young, they assume you know nothing
This result is fine, but it is only half of the lyrics. I don't know why only this part is retrieved and not the rest of the text. I've checked the HTML on the Genius website, and the missing parts don't look any different from the parts that are printed.
Any help is appreciated!
Answer 1
Score: 1
You used the find method, which returns only the first matching element.
You need find_all to collect every div whose class starts with Lyrics__Container:
divs = soup.find_all("div",class_=lambda value: value and re.search(r'^Lyrics__Container', value))
all_text = '\n'.join([div.get_text(separator='\n') for div in divs])
print(all_text)
This will print all of the lyrics.
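To see the difference without hitting the network, here is a minimal sketch that runs both calls against a small inline HTML string. The markup and the class suffixes (-abc, -xyz) are invented for illustration and are only assumed to resemble the real page, where the lyrics are split across several sibling containers. It also passes a compiled regex to class_ instead of the lambda, which is an alternative spelling of the same filter, not something the original code requires.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the lyrics page: two lyrics containers
# separated by an unrelated div. The class suffixes are made up.
SAMPLE_HTML = """
<div class="Lyrics__Container-abc">[Verse 1]<br/>Line one</div>
<div class="Ad__Container-xyz">advert</div>
<div class="Lyrics__Container-abc">[Verse 2]<br/>Line two</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pattern = re.compile(r"^Lyrics__Container")

# find() stops at the first match, so only the first container is returned.
first_only = soup.find("div", class_=pattern)
first_text = first_only.get_text(separator="\n")

# find_all() returns every match, in document order.
all_divs = soup.find_all("div", class_=pattern)
full_text = "\n".join(div.get_text(separator="\n") for div in all_divs)

print(first_text)
print("---")
print(full_text)
```

With this sample, the text from find stops after the first verse, while the joined find_all result also contains the second verse and skips the unrelated div.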