How to web scrape the data from this site?

Question

I would like to get the 8 years of data (graph 'Encours des parts référencées') from this site.
I don't know where to find the data. I inspected the site but couldn't see it. I would like to know where the data is, how to get it, and which API I should use.
Here's an image of the site

I tried:

import requests
import pandas as pd
from bs4 import BeautifulSoup
import json

but after that I don't know what to do.

Any help would be appreciated.

Answer 1

Score: 1

The data is stored in the page as JSON. To extract it into a pandas DataFrame, you can do:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

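url = "https://www.quantalys.com/espace/518"

# Download the page and parse its HTML.
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# The chart configuration is a JSON string stored in the value attribute
# of the <input> inside the element with id "chartEncours8a".
data = soup.select_one("#chartEncours8a input")["value"]
data = json.loads(data)

# One row per year, plus the unit reported for the value axis.
df = pd.DataFrame(data['dataProvider'])
df['unit'] = data['valueAxes'][0]['unit']
print(df)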

Prints:

   category  column-1   unit
0      2015     41.02   Mrd€
1      2016     44.92   Mrd€
2      2017     43.44   Mrd€
3      2018     31.58   Mrd€
4      2019     25.30   Mrd€
5      2020     25.26   Mrd€
6      2021     25.55   Mrd€
7      2022     19.57   Mrd€
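
To try the same idea without network access or third-party packages, here is a minimal sketch using only the standard library. The HTML fragment is a hypothetical stand-in that mimics the structure the code above relies on (an input inside the element with id "chartEncours8a" whose value holds the chart JSON); the real page may differ, and whether "category" is stored as a string or a number is an assumption. The names SAMPLE_HTML, ChartInputParser and extract_chart_rows are made up for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical fragment mimicking the layout used in the answer; the two rows
# reuse the 2015 and 2016 figures from the printed output above.
SAMPLE_HTML = """
<div id="chartEncours8a">
  <input type="hidden" value='{"dataProvider": [{"category": "2015", "column-1": 41.02}, {"category": "2016", "column-1": 44.92}], "valueAxes": [{"unit": "Mrd€"}]}'>
</div>
"""


class ChartInputParser(HTMLParser):
    """Collect the value of the first <input> nested inside a given element id."""

    # Elements that never have a closing tag, so they must not change the depth.
    VOID = {"input", "br", "img", "meta", "link", "hr"}

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0      # greater than 0 while inside the container
        self.value = None   # raw value attribute of the first matching <input>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Not inside the container yet: only look for its opening tag.
            if attrs.get("id") == self.container_id:
                self.depth = 1
            return
        if tag == "input" and self.value is None:
            self.value = attrs.get("value")
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in self.VOID:
            self.depth -= 1


def extract_chart_rows(html, container_id="chartEncours8a"):
    """Return one dict per year from the JSON embedded in the page."""
    parser = ChartInputParser(container_id)
    parser.feed(html)
    if parser.value is None:
        raise ValueError(f"no <input> found inside #{container_id}")
    chart = json.loads(parser.value)
    unit = chart["valueAxes"][0]["unit"]
    return [
        {"year": int(row["category"]), "value": row["column-1"], "unit": unit}
        for row in chart["dataProvider"]
    ]


rows = extract_chart_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Raising an error when the input is missing makes a change in the page layout fail loudly instead of producing an empty table.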

huangapple
  • Posted on July 13, 2023, 00:26:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76672690.html