Scraping data from a Sankey diagram using Python and Beautiful Soup (BS)

Question

I am new to Python and am currently trying to figure out how to scrape data from this website:

https://www.iea.org/sankey/#?c=Indonesia&s=Balance

I have tried Beautiful Soup and Selenium, but neither worked. I need the data that is shown inside the diagram. Thank you for your answers.

With Python and Beautiful Soup, I expected to get a table, but nothing came out:

import requests
from bs4 import BeautifulSoup

url = "https://www.iea.org/sankey/#?c=Indonesia&s=Balance"
response = requests.get(url)
html_content = response.content

soup = BeautifulSoup(html_content, 'html.parser')
data = soup.find_all('div', {'class': 'sankey-data'})[0].text

print(data)

Answer 1

Score: 0

There is no table on the page; the data is loaded separately through an additional request (https://www.iea.org/sankey/data/Indonesia.SBBSBBBSBBS_YY.txt).
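This also suggests why the `requests` attempt in the question cannot see the country selection: everything after `#` in a URL is a fragment, which stays in the browser and is not part of the HTTP request. A minimal sketch using only the standard library (the variable names `requested` and `fragment` are illustrative):

```python
from urllib.parse import urlsplit

url = "https://www.iea.org/sankey/#?c=Indonesia&s=Balance"
parts = urlsplit(url)

# What is actually requested from the server: scheme, host and path only.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"

# What stays on the client side; presumably read by the page's JavaScript,
# which then fetches the data file in a separate request.
fragment = parts.fragment

print(requested)  # https://www.iea.org/sankey/
print(fragment)   # ?c=Indonesia&s=Balance
```

So the HTML returned for this URL is the same generic page regardless of the country named in the fragment.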

Because the OP provided little information, including about the expected output, here is a simple approach that should at least point in the right direction and can be adapted to your requirements.

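import pandas as pd

# The diagram's data is served as a tab-separated text file;
# its first seven rows are read as a multi-level column header.
df = pd.read_csv('https://www.iea.org/sankey/data/Indonesia.SBBSBBBSBBS_YY.txt', sep='\t', header=[0,1,2,3,4,5,6])
print(df.head())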
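To see what a multi-row `header` does without hitting the network, here is a small sketch on an in-memory sample. The sample text, its two header rows, and the column names are hypothetical stand-ins, not the real layout of the IEA file (which, per the call above, uses seven header rows). It also shows one possible way to flatten the resulting multi-level columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: two header rows, tab-separated.
sample = (
    "Flow\tProduction\tImports\n"
    "Unit\tktoe\tktoe\n"
    "2019\t100\t20\n"
    "2020\t110\t25\n"
)

# header=[0, 1] turns the first two rows into a two-level column index.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=[0, 1])

# Join the header levels into single strings so columns are easier to address.
df.columns = [" | ".join(map(str, col)) for col in df.columns]

print(df)
```

With the real file, the same flattening step could be applied after reading with `header=[0,1,2,3,4,5,6]`; whether all seven levels are worth keeping depends on the output you need.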

Posted by huangapple on March 1, 2023 at 15:09:30
When reposting, please keep the link to this article: https://go.coder-hub.com/75600517.html