March 1, 2023 15:09:30

Scraping data from sankey diagram using python and BS

Question

I am new to Python and am trying to scrape data from this website:

https://www.iea.org/sankey/#?c=Indonesia&s=Balance

I have tried Beautiful Soup and Selenium, but neither worked. I need the data shown inside the diagram. Thank you for your answers.

With Python and Beautiful Soup I expected to get a table, but did not:

import requests
from bs4 import BeautifulSoup
url = "https://www.iea.org/sankey/#?c=Indonesia&s=Balance"
response = requests.get(url)
html_content = response.content
soup = BeautifulSoup(html_content, 'html.parser')
data = soup.find_all('div', {'class': 'sankey-data'})[0].text
print(data)

Answer 1

Score: 0

There is no table on the page; the data is loaded separately through an additional request (https://www.iea.org/sankey/data/Indonesia.SBBSBBBSBBS_YY.txt).
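
One reason the request in the question cannot return the diagram data: everything after the # in a URL is a fragment, which is handled in the browser and is not part of the HTTP request. The sketch below only takes the question's URL apart with the standard library; it makes no network call, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.iea.org/sankey/#?c=Indonesia&s=Balance"
parts = urlsplit(url)

# What an HTTP client actually requests: the URL without its fragment.
requested = parts._replace(fragment="").geturl()

# The country and view selection live in the fragment, which stays client-side.
selection = parse_qs(parts.fragment.lstrip("?"))

print(requested)
print(selection)
```

So requests.get(url) fetches the same base page whatever country is selected, and the figures have to come from the separate data file.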

Because the OP provided little information, including about the expected output, here is a simple approach that should at least point in the right direction and can be adapted as needed.

import pandas as pd
# Tab-separated file; the first seven rows are read as a multi-level column header
df = pd.read_csv('https://www.iea.org/sankey/data/Indonesia.SBBSBBBSBBS_YY.txt', sep='\t', header=[0,1,2,3,4,5,6])
print(df.head())
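
When adapting the header argument, it may help to look at the raw rows first. The following is a minimal sketch using only the standard library; the sample text and the split_header helper are invented for illustration and do not reflect the real file's contents or layout.

```python
import csv
import io

# Invented stand-in for the downloaded text: two header rows, two data rows.
sample = "Flow\tCoal\tOil\nUnit\tktoe\tktoe\n2019\t10\t20\n2020\t11\t21\n"

def split_header(text, n_header_rows):
    """Split tab-separated text into header rows and data rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[:n_header_rows], rows[n_header_rows:]

header_rows, data_rows = split_header(sample, 2)
for row in header_rows:
    print("header:", row)
print("data rows:", len(data_rows))
```

With the real file, the text could come from requests.get(...).text, and the number of header rows found this way would go into the header list passed to pd.read_csv.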
