Cannot extract the user score details for the anime while scraping using requests and Beautiful Soup

# Question

I'm a beginner to web scraping. I was scraping this particular web page, https://myanimelist.net/animelist/Arcane, but my Python code using requests and Beautiful Soup is unable to fetch the user score details. I can't find what I'm doing wrong. Please help.

My progress so far in the code is:
```python
import requests
from bs4 import BeautifulSoup
import csv

def scrape_user_profile(username):
    url = f"https://myanimelist.net/animelist/{username}"
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Find the table containing the anime list
        tbody_list = soup.find_all("tbody", class_="list-item")
        if tbody_list:
            data = []
            for tbody in tbody_list:
                anime_row = tbody.find("tr", class_="list-table-data")
                title_element = anime_row.find("td", class_="data title clearfix").find("a", class_="link sort")
                score_element = anime_row.find("td", class_="data score").find("span", class_="score-label")
                if title_element and score_element:
                    title = title_element.text.strip()
                    score = score_element.text.strip()
                    data.append([username, title, score])
            if data:
                with open('user_score.csv', 'w', newline='', encoding='utf-8') as file:
                    writer = csv.writer(file)
                    writer.writerow(["Username", "Anime Title", "Score"])  # Write column names
                    writer.writerows(data)
                print(f"User details saved for username: {username}")
            else:
                print(f"No anime list found for username: {username}")
        else:
            print(f"No anime list found for username: {username}")
    else:
        print(f"Error occurred while fetching user profile for username: {username}")

usernames = ["Arcane"]
for username in usernames:
    scrape_user_profile(username)
```
The data is fetched and written in this format:

```
Username    Anime Title                                    Score
Xinil       ${ item.title_localized || item.anime_title }  ${ item.score != 0 ? item.score : "-" }
```
# Answer 1

**Score**: 1
The problem is that the page you're scraping uses some kind of client-side rendering.

This basically means that dynamic parts of the web page are rendered by the browser using JavaScript, so not everything you see rendered on the page has to actually be inside the page's HTML file. For example:

```
${ item.score != 0 ? item.score : "-" }
```

Luckily, the actual values you're looking to scrape do exist inside the HTML page, just not where you expect them to be. The table on the page has a `data-items` attribute which holds the values we want as a JSON string. They've probably put the actual data inside the HTML page for SEO purposes, but that's just my guess.
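To make the JSON-in-attribute idea concrete, here is a toy example (the titles and scores below are invented, not taken from the real page): a JSON array stored as a string can be turned back into Python lists and dicts with `json.loads`:

```python
import json

# Hypothetical value of a data-items attribute, shaped like the one described above
raw = '[{"anime_title": "Cowboy Bebop", "score": 9}, {"anime_title": "Monster", "score": 0}]'

items = json.loads(raw)  # a list of dicts
print(items[0]["anime_title"])  # -> Cowboy Bebop
print(items[1]["score"])        # -> 0
```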
Here's how you could scrape the data:

```python
data_items = soup.find("table", {"data-items": True})  # Gets the table that carries the data-items attribute
data_items_parsed = json.loads(data_items.get("data-items"))  # Parses the JSON string into Python objects
```
Now we just need to loop through the data and send it to your CSV file:

```python
data = []
for data_item in data_items_parsed:
    data.append([username, data_item['anime_title'], data_item['score']])

if data:
    with open('user_score.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(["Username", "Anime Title", "Score"])  # Write column names
        writer.writerows(data)
```

The code uses `json`, so be sure to import it.
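Putting the pieces together, a minimal end-to-end sketch might look like the following. To keep it self-contained and runnable offline, it parses a hard-coded HTML fragment shaped like the table described above instead of making a network request; the titles and scores in it are invented:

```python
import csv
import json

from bs4 import BeautifulSoup

# Stand-in for response.content: a table storing its rows as JSON in data-items
html = """
<table data-items='[{"anime_title": "Cowboy Bebop", "score": 9},
                    {"anime_title": "Monster", "score": 0}]'></table>
"""

username = "Arcane"
soup = BeautifulSoup(html, "html.parser")

# Locate the table by the presence of its data-items attribute, then parse the JSON
table = soup.find("table", {"data-items": True})
items = json.loads(table.get("data-items"))

data = [[username, item["anime_title"], item["score"]] for item in items]

with open("user_score.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Username", "Anime Title", "Score"])  # Write column names
    writer.writerows(data)
```

For the real page you would replace the hard-coded `html` with `requests.get(url).content` as in the question's code.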