How do I scrape the data from this site?

Question

I would like to get the 8 years of data behind the chart 'Encours des parts référencées' on this site. I don't know where to find the data: I inspected the page but could not see it. Where is it stored, how can I retrieve it, and which API should I use?

Here is an image of the site.

I tried:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    import json

but after that I don't know what to do.

Any help would be appreciated.

Answer 1

Score: 1

The data is stored in the page as JSON. To extract it into a pandas DataFrame, you can do:

    import json
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = "https://www.quantalys.com/espace/518"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    # The chart configuration is JSON held in the value attribute
    # of the <input> inside the #chartEncours8a element.
    data = json.loads(soup.select_one("#chartEncours8a input")["value"])

    df = pd.DataFrame(data["dataProvider"])    # one row per year
    df["unit"] = data["valueAxes"][0]["unit"]  # unit of the value axis
    print(df)

Prints:

      category  column-1 unit
    0     2015     41.02  Mrd
    1     2016     44.92  Mrd
    2     2017     43.44  Mrd
    3     2018     31.58  Mrd
    4     2019     25.30  Mrd
    5     2020     25.26  Mrd
    6     2021     25.55  Mrd
    7     2022     19.57  Mrd
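
If it helps to see why that selector works, here is a minimal offline sketch of the same idea using only the standard library. The HTML fragment and the two sample rows are made up for illustration (only the element id and the JSON keys `dataProvider` and `valueAxes` come from the answer above), and `ChartInputFinder` and `extract_rows` are hypothetical helper names, not part of any library. It assumes the JSON sits HTML-escaped in the `value` attribute of an `<input>` nested inside the chart container.

```python
import json
from html import escape
from html.parser import HTMLParser

# Hypothetical chart configuration, shaped like the keys used in the answer.
config = {
    "dataProvider": [
        {"category": "2015", "column-1": 41.02},
        {"category": "2016", "column-1": 44.92},
    ],
    "valueAxes": [{"unit": "Mrd"}],
}

# Hypothetical page fragment: the JSON is HTML-escaped inside the input's value.
SAMPLE_HTML = (
    '<div id="chartEncours8a">'
    f'<input type="hidden" value="{escape(json.dumps(config))}">'
    "</div>"
)

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class ChartInputFinder(HTMLParser):
    """Remember the value of the first <input> found inside a container id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0      # greater than 0 while inside the container
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values arrive already unescaped
        if self.depth:
            if tag == "input" and self.value is None:
                self.value = attrs.get("value")
            if tag not in VOID_TAGS:
                self.depth += 1
        elif attrs.get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_rows(html, container_id="chartEncours8a"):
    """Return the chart rows as dicts, each tagged with the axis unit."""
    finder = ChartInputFinder(container_id)
    finder.feed(html)
    if finder.value is None:
        raise ValueError(f"no <input> with a value inside #{container_id}")
    data = json.loads(finder.value)
    unit = data["valueAxes"][0]["unit"]
    return [{**row, "unit": unit} for row in data["dataProvider"]]


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

On the real page you would pass the downloaded HTML text instead of `SAMPLE_HTML`. If the site changes its markup, the `ValueError` makes the failure explicit rather than failing later on a missing key.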

huangapple
  • Posted on July 13, 2023, 00:26:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76672690.html