Web scrape a title after a specific class with Python

Question

I'm trying to scrape some information about the positions, artists and songs from a ranking list online. Here is the ranking list website: https://kma.kkbox.com/charts/weekly/newrelease?terr=my&lang=en

I was trying to use the following code to scrape:

import requests
from bs4 import BeautifulSoup
page = requests.get('https://kma.kkbox.com/charts/weekly/newrelease?terr=my&lang=en')
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')
all_songs = soup.find_all(class_="charts-list-song")
all_artists = soup.find_all(class_="charts-list-artist")
print(all_songs)
print(all_artists)

However, the output only shows:

[<span class="charts-list-desc">
<span class="charts-list-song"></span>
<span class="charts-list-artist"></span>
</span>, <span class="charts-list-desc">
<span class="charts-list-song"></span>
...

and

<span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>, <span class="charts-list-song"></span>,

My expected output should be:

Pos    artist            songs   
1      張哲瀚             洪荒劇場Primordial Theater
2      張哲瀚             冰川消失那天Lost Glacier
3      告五人             又到天黑

Answer 1

Score: 0

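If you use "View page source" in Chrome, you can see that the actual chart content sits near the end of the HTML source, where it is assigned to a JavaScript variable named chart. The charts-list-song and charts-list-artist spans are empty in the raw HTML that requests receives, presumably because the page fills them in with JavaScript after it loads, which is why find_all returns empty elements.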
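One way to confirm this from Python rather than from the browser is to count the song spans that actually carry text and check whether the source contains a `var chart =` assignment. The sketch below runs on a hypothetical, trimmed-down sample of the page source (`SAMPLE_HTML` and `diagnose` are names made up for illustration, not anything the site or a library provides), and it uses a plain regex only because this is a quick diagnostic, not a real HTML parser.

```python
import re

# Hypothetical, heavily trimmed stand-in for what "view source" shows.
SAMPLE_HTML = """
<ul>
  <li><span class="charts-list-desc">
    <span class="charts-list-song"></span>
    <span class="charts-list-artist"></span>
  </span></li>
</ul>
<script>var chart = [{"song_name": "又到天黑", "artist_name": "告五人"}];</script>
"""


def diagnose(html):
    """Report whether the song spans carry text and whether `var chart` is assigned."""
    spans = re.findall(r'<span class="charts-list-song">(.*?)</span>', html, flags=re.S)
    return {
        "song_spans": len(spans),
        "non_empty_song_spans": sum(1 for text in spans if text.strip()),
        "has_chart_variable": re.search(r"\bvar\s+chart\s*=", html) is not None,
    }


if __name__ == "__main__":
    print(diagnose(SAMPLE_HTML))
```

To try it against the live page, you could pass `requests.get(url).text` to `diagnose` instead of the sample. If the spans come back empty while the variable is present, the data has to be read from the script rather than from the spans.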

code

import json
import re

import requests
from bs4 import BeautifulSoup

page = requests.get('https://kma.kkbox.com/charts/weekly/newrelease?terr=my&lang=en')
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')
# The chart data lives in the second-to-last <script> tag of the page
data = soup.select('script')[-2].string
# Capture the JSON array literal assigned to the `chart` variable
m = re.search(r'var chart = (\[{.*}\])', data)
songs = json.loads(m.group(1))
for song in songs:
    # Each entry holds its position, artist name and song title
    print(song['rankings']['this_period'], song['artist_name'], song['song_name'])

output

1 張哲瀚 洪荒劇場Primordial Theater
2 張哲瀚 冰川消失那天Lost Glacier
3 告五人 又到天黑
4 孫盛希 Shi Shi 眼淚記得你 (Remembered)
5 陳零九 Nine Chen 夢裡的女孩 (The Girl)
6 告五人 一念之間
7 苏有朋 玫瑰急救箱
8 林俊傑 想見你想見你想見你
...
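If the page layout shifts, both the `[-2]` script index and the greedy regex can break. As a hedged alternative, here is a minimal sketch that searches the raw HTML for the `var chart =` assignment and lets `json.JSONDecoder().raw_decode` read exactly one JSON value from that point, then prints the rows under the header from the question. It assumes the variable is still called `chart` and holds valid JSON with the same field names as above. `SAMPLE_HTML`, `extract_chart`, `to_rows` and `format_table` are hypothetical names for illustration.

```python
import json
import re

# Hypothetical stand-in for the page source; field names mirror those used in the answer.
SAMPLE_HTML = """<script>
var chart = [
  {"rankings": {"this_period": 1}, "artist_name": "張哲瀚", "song_name": "洪荒劇場Primordial Theater"},
  {"rankings": {"this_period": 2}, "artist_name": "告五人", "song_name": "又到天黑"}
];
var other = [{"unrelated": true}];
</script>"""


def extract_chart(html):
    """Return the list assigned to `var chart`, or [] if no such assignment is found."""
    match = re.search(r"var\s+chart\s*=\s*", html)
    if match is None:
        return []
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (here: the trailing `;` and later code).
    songs, _end = json.JSONDecoder().raw_decode(html, match.end())
    return songs


def to_rows(songs):
    """Flatten each song into a (position, artist, song) tuple."""
    return [(s["rankings"]["this_period"], s["artist_name"], s["song_name"]) for s in songs]


def format_table(rows):
    """Render rows under the Pos/artist/songs header from the question."""
    lines = [f"{'Pos':<6} {'artist':<16} songs"]
    lines += [f"{pos:<6} {artist:<16} {song}" for pos, artist, song in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(to_rows(extract_chart(SAMPLE_HTML))))
```

For the live page you would pass `page.text` to `extract_chart`. Note that the padding counts characters, so columns containing full-width CJK text may not line up exactly in a terminal.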

huangapple
  • Posted on February 19, 2023, 10:26:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75497635.html