June 22, 2023 04:25:14

How to web scrape data from a graph

Question

Hi, I would like to scrape data from this web page, especially the graph of historical data (here and here).

Could someone help me with how to proceed? More importantly, where and how can the underlying data be found?

Answer 1

Score: 1

The data for the graph is stored inside the HTML document as JSON. To parse it, you can use the following example:

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.quantalys.com/Fonds/Historique/19801"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# The chart configuration is a JSON string stored in the "value" attribute
# of the element that carries a data-chartconfig attribute.
data = soup.select_one("[data-chartconfig]")["value"]
data = json.loads(data)

# "dataProvider" holds one record per date; "graphs" describes each series.
df = pd.DataFrame(data["dataProvider"])

# Name each series column after the text following the first ":" in balloonText.
df.columns = ["Date"] + [
    t["balloonText"].split(":", maxsplit=1)[-1].strip() for t in data["graphs"]
]
print(df.head())

Prints:

         Date  Amundi Euro High Yield Bond A EUR AD  Oblig. Europe Ht Rendt  ICE BofA European Currency High Yield Index
0  2020-06-19                                100.00                  100.00                                       100.00
1  2020-06-20                                100.00                  100.00                                       100.00
2  2020-06-21                                100.00                  100.00                                       100.00
3  2020-06-22                                 99.07                   99.78                                        99.80
4  2020-06-23                                 99.07                   99.85                                        99.85
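
As for where to find the data on a page like this: one way is to scan the HTML for attributes whose value parses as JSON. The following is only a minimal sketch using the standard library. The sample markup, the `JsonAttributeFinder` class, and the record keys `x` and `y_0` are made up for illustration, so the real page may be structured differently.

```python
import html
import json
from html.parser import HTMLParser

# Hypothetical chart payload and markup, invented for this sketch.
chart = {
    "dataProvider": [{"x": "2020-06-19", "y_0": 100.0}],
    "graphs": [{"balloonText": "[[x]]: Fund A"}],
}
SAMPLE_HTML = (
    '<input type="hidden" id="theme" value="light">'
    '<input type="hidden" data-chartconfig value="'
    + html.escape(json.dumps(chart))  # quotes become &quot; inside the attribute
    + '">'
)


class JsonAttributeFinder(HTMLParser):
    """Collect every attribute whose value parses as a JSON object or array."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute name, parsed JSON)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes arrive as None; plain strings are skipped.
            if not value or value.lstrip()[:1] not in "{[":
                continue
            try:
                self.found.append((tag, name, json.loads(value)))
            except json.JSONDecodeError:
                pass  # looked like JSON but was not; ignore it


finder = JsonAttributeFinder()
finder.feed(SAMPLE_HTML)
for tag, name, payload in finder.found:
    # Show which element and attribute carry JSON, and its top-level keys.
    print(tag, name, sorted(payload))
```

Once the tag and attribute are known, a targeted selector such as the one in the answer above is simpler than scanning every attribute.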
