February 19, 2023, 10:26:27

Web scrape a title after a specific class with Python

Question

I'm trying to scrape some information about the positions, artists and songs from a ranking list online. Here is the ranking list website: https://kma.kkbox.com/charts/weekly/newrelease?terr=my&lang=en

I was trying to use the following code to scrape:

import requests
from bs4 import BeautifulSoup
page = requests.get('https://kma.kkbox.com/charts/weekly/newrelease?terr=my&lang=en')
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')
all_songs = soup.find_all(class_="charts-list-song")
all_artists = soup.find_all(class_="charts-list-artist")
print(all_songs)
print(all_artists)

However, the output only shows:

[<span class="charts-list-desc">
<span class="charts-list-song"></span>
<span class="charts-list-artist"></span>
</span>, <span class="charts-list-desc">
<span class="charts-list-song"></span>
...

and

<span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>,

My expected output should be:

Pos    artist            songs   
1      張哲瀚             洪荒劇場Primordial Theater
2      張哲瀚             冰川消失那天Lost Glacier
3      告五人             又到天黑

Answer 1

Score: 0

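Use View Source in Chrome: the actual chart content is near the end of the HTML source, assigned to a JavaScript variable named chart. The charts-list-song and charts-list-artist spans are empty in the raw HTML, presumably because the page fills them in with JavaScript after loading, which requests does not run.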

code

import json
import re

import requests
from bs4 import BeautifulSoup

page = requests.get('https://kma.kkbox.com/charts/weekly/newrelease?terr=my&lang=en')
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')
# The chart data sits in the second-to-last <script> tag of the page.
data = soup.select('script')[-2].string
# Capture the JSON array assigned to the JavaScript variable `chart`.
m = re.search(r'var chart = (\[{.*}\])', data)
songs = json.loads(m.group(1))
for song in songs:
    print(song['rankings']['this_period'], song['artist_name'], song['song_name'])

output

1 張哲瀚 洪荒劇場Primordial Theater
2 張哲瀚 冰川消失那天Lost Glacier
3 告五人 又到天黑
4 孫盛希 Shi Shi 眼淚記得你 (Remembered)
5 陳零九 Nine Chen 夢裡的女孩 (The Girl)
6 告五人 一念之間
7 苏有朋 玫瑰急救箱
8 林俊傑 想見你想見你想見你
...
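
The code above depends on the chart being in the second-to-last script tag, which could break if the page layout changes. A minimal sketch of a less position-dependent variant follows. It is not the answerer's method: it skips BeautifulSoup, searches the raw HTML text for `var chart =`, and lets `json.JSONDecoder.raw_decode` read exactly one JSON value from that point. `SAMPLE_HTML`, `extract_chart`, and `format_table` are hypothetical names used for illustration, and the sample markup is an assumption modeled on the field names shown above; the real page may differ.

```python
import json
import re

# Hypothetical stand-in for page.text; the real page's markup may differ.
SAMPLE_HTML = """
<span class="charts-list-song"></span>
<script>
var chart = [
  {"rankings": {"this_period": 1}, "artist_name": "張哲瀚", "song_name": "洪荒劇場Primordial Theater"},
  {"rankings": {"this_period": 2}, "artist_name": "張哲瀚", "song_name": "冰川消失那天Lost Glacier"},
  {"rankings": {"this_period": 3}, "artist_name": "告五人", "song_name": "又到天黑"}
];
var other = [1, 2, 3];
</script>
"""


def extract_chart(html):
    """Return the list assigned to `var chart` in html, or [] if it is absent."""
    m = re.search(r'var\s+chart\s*=\s*', html)
    if m is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing semicolon and script).
    songs, _end = json.JSONDecoder().raw_decode(html, m.end())
    return songs


def format_table(songs):
    """Build a plain-text table with position, artist, and song columns."""
    lines = [f"{'Pos':<5}{'artist':<12}songs"]
    for song in songs:
        pos = song['rankings']['this_period']
        lines.append(f"{pos:<5}{song['artist_name']:<12}{song['song_name']}")
    return "\n".join(lines)


songs = extract_chart(SAMPLE_HTML)
print(format_table(songs))
```

With the live page, `extract_chart(page.text)` would take the place of the sample string. Column padding counts characters, so rows with wide CJK characters may not line up exactly in a terminal.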
