June 30, 2023 00:23:11

Cannot extract the user score details for the anime while scraping using requests and Beautiful Soup

# Question

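I'm a beginner at web scraping. I was scraping this page, https://myanimelist.net/animelist/Arcane, but my Python code, which uses requests and Beautiful Soup, can't fetch the user score details. I can't find what I'm doing wrong. Please help.
My progress so far in the code is: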

```python
import requests
from bs4 import BeautifulSoup
import csv

def scrape_user_profile(username):
    url = f"https://myanimelist.net/animelist/{username}"
    response = requests.get(url)

    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")

        # Find the table containing the anime list
        tbody_list = soup.find_all("tbody", class_="list-item")

        if tbody_list:
            data = []

            for tbody in tbody_list:
                anime_row = tbody.find("tr", class_="list-table-data")
                title_element = anime_row.find("td", class_="data title clearfix").find("a", class_="link sort")
                score_element = anime_row.find("td", class_="data score").find("span", class_="score-label")
                if title_element and score_element:
                    title = title_element.text.strip()
                    score = score_element.text.strip()
                    data.append([username, title, score])

            if data:
                with open('user_score.csv', 'w', newline='', encoding='utf-8') as file:
                    writer = csv.writer(file)
                    writer.writerow(["Username", "Anime Title", "Score"])  # Write column names
                    writer.writerows(data)
                print(f"User details saved for username: {username}")
            else:
                print(f"No anime list found for username: {username}")
        else:
            print(f"No anime list found for username: {username}")
    else:
        print(f"Error occurred while fetching user profile for username: {username}")

usernames = ["Arcane"]

for username in usernames:
    scrape_user_profile(username)
```

The data is fetched and written in this format:

```
Username    Anime Title                                      Score
Xinil       ${ item.title_localized || item.anime_title }    ${ item.score != 0 ? item.score : "-" }
```


# Answer 1
**Score**: 1
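The problem is that the page you're scraping uses some form of client-side rendering.
This means the dynamic parts of the page are rendered by the browser using JavaScript. Not everything you see on the rendered page is actually present in the page's HTML file.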
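For example, the raw HTML contains template placeholders like this one instead of the actual score:

```
${ item.score != 0 ? item.score : "-" }
```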
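To confirm this from Python, you can search the HTML that `requests` downloaded for `${ ... }` placeholders that were never filled in. The sketch below uses a plain regular expression from the standard library rather than BeautifulSoup. `find_unrendered_placeholders` is a hypothetical helper name, and the sample markup is made up to resemble the cells the question's code was reading.

```python
import re

# Matches a literal ${ ... } placeholder that JavaScript never filled in.
# Assumption: placeholders contain no nested "}" (true for the two quoted in the question).
PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def find_unrendered_placeholders(html: str) -> list[str]:
    """Return every ${ ... } template placeholder left in the raw HTML."""
    return PLACEHOLDER.findall(html)


# Made-up sample that mimics the unrendered markup seen by the question's code
raw_html = (
    '<td class="data title clearfix">'
    '<a class="link sort">${ item.title_localized || item.anime_title }</a></td>'
    '<td class="data score">'
    '<span class="score-label">${ item.score != 0 ? item.score : "-" }</span></td>'
)

print(find_unrendered_placeholders(raw_html))
```

On the real page you would pass `response.text`. A non-empty result suggests the cells you are parsing are templates that JavaScript fills in later, so scraping them directly will only ever return the placeholder text.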


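Luckily, the values you're looking for do exist in the HTML page, just not where you expect them. The table on the page has a `data-items` attribute that holds those values as a JSON string.
They've probably put the actual data in the HTML page for SEO purposes, but that's just my guess.
Here's how you could scrape the data: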

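```python
import json

# Find the <table> element that carries a data-items attribute
data_items = soup.find("table", {"data-items": True})
# Parse the attribute's JSON string into a list of Python dicts
data_items_parsed = json.loads(data_items.get("data-items"))
```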
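To see what these two lines return, here is the same lookup as a self-contained sketch that runs on a made-up HTML string instead of the live page. It assumes, as described above, that `data-items` holds a JSON array of objects with `anime_title` and `score` keys. The helper name `parse_data_items` and the sample values are hypothetical. Unlike the two-line version, it returns an empty list when no such table is found, instead of raising an `AttributeError` on `None`.

```python
import json

from bs4 import BeautifulSoup


def parse_data_items(html: str) -> list[dict]:
    """Return the list stored in a table's data-items attribute, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"data-items": True})
    if table is None:
        # No table with a data-items attribute: nothing to parse
        return []
    # The parser has already unescaped &quot; back to '"', so this is plain JSON
    return json.loads(table["data-items"])


# Made-up sample: inside a double-quoted attribute, the JSON quotes must be escaped as &quot;
sample_html = (
    '<table data-items="'
    '[{&quot;anime_title&quot;:&quot;Example Anime A&quot;,&quot;score&quot;:9},'
    '{&quot;anime_title&quot;:&quot;Example Anime B&quot;,&quot;score&quot;:0}]">'
    "</table>"
)

items = parse_data_items(sample_html)
print(items)
```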

Now we just need to loop through the data and write it to your CSV file:

```python
data = []
for data_item in data_items_parsed:
    data.append([username, data_item['anime_title'], data_item['score']])

if data:
    with open('user_score.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(["Username", "Anime Title", "Score"])  # Write column names
        writer.writerows(data)
```


The code uses `json`, so be sure to import it.
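The placeholders in the question also show how the page itself displays each entry. It uses `title_localized` when that is non-empty and falls back to `anime_title`, and it shows `-` when `score` is 0. If you want the CSV to match what you see in the browser, the following sketch mirrors that template logic. It assumes each parsed item is a dict with `anime_title` and `score` and an optional `title_localized`. `build_rows` and `write_scores` are hypothetical helpers, and writing to an in-memory `io.StringIO` is only there so the example runs without creating a file.

```python
import csv
import io
from typing import TextIO


def build_rows(username: str, items: list[dict]) -> list[list]:
    """Mirror the page's template:
    ${ item.title_localized || item.anime_title } and ${ item.score != 0 ? item.score : "-" }
    """
    rows = []
    for item in items:
        # Like JavaScript's ||, fall back when title_localized is missing or empty
        title = item.get("title_localized") or item["anime_title"]
        # The page shows "-" for unscored entries (score == 0)
        score = item["score"] if item["score"] != 0 else "-"
        rows.append([username, title, score])
    return rows


def write_scores(stream: TextIO, rows: list[list]) -> None:
    """Write the header row plus one CSV line per entry to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["Username", "Anime Title", "Score"])  # column names
    writer.writerows(rows)


# Made-up parsed items, shaped like the output of json.loads above
items = [
    {"anime_title": "Example Anime A", "title_localized": "", "score": 9},
    {"anime_title": "Example Anime B", "score": 0},
    {"anime_title": "Example Anime C", "title_localized": "Localized Title C", "score": 7},
]

buffer = io.StringIO()
write_scores(buffer, build_rows("Arcane", items))
print(buffer.getvalue())
```

To write the real file, open `user_score.csv` exactly as in the answer's code (with `newline=''`) and pass the file object as `stream`.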

