Can't fetch data from the analysis tab on Yahoo Finance
# Question
I'm trying to scrape some tabular content from a website. The way this [website](https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL) loads its data has changed dramatically. Previously, the necessary data could be found inside script tags in the page source. I looked through the network requests in dev tools but could not find the data at any endpoint, though I may have missed something. I'm interested in the table located under `Revenue Estimate`. This is how I used to fetch the content:
```python
import re
import json
import requests
from pprint import pprint

link = 'https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    # Extract the JSON blob that used to follow root.App.main in a script tag
    data = re.findall(r'root.App.main[^{]+([\s\S].*);', res.text)[0]
    jsoncontent = json.loads(data)
    # pprint(jsoncontent)
    try:
        container = jsoncontent['context']['dispatcher']['stores']['QuoteSummaryStore']['earningsTrend']
    except TypeError:
        container = ""
    pprint(container)
```
# Answer 1
**Score**: 1
Try using:
```python
import requests

# Browser-like headers copied from a real browser session
headers = {
    'authority': 'finance.yahoo.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4,es;q=0.3',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not_A Brand";v="99", "Microsoft Edge";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78',
}

params = {
    'p': 'AAPL',
}

response = requests.get('https://finance.yahoo.com/quote/AAPL/analysis', params=params, headers=headers)
```
After that, parse the desired values from `response.content`.
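The parsing step could look something like the sketch below. It uses only the standard library and assumes the page delivers its tables as plain `<table>` markup, which may no longer hold for the live site. `TableCollector`, `find_table`, and the `SAMPLE_HTML` snippet are hypothetical names and made-up markup for illustration; only the `Revenue Estimate` cell values are taken from this thread. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the open cell (if any) and normalise its whitespace.
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._close_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.tables[-1].append(self._row)
            self._row = None


def find_table(html, first_header):
    """Return the first table whose top-left cell equals first_header, else None."""
    parser = TableCollector()
    parser.feed(html)
    for table in parser.tables:
        if table and table[0] and table[0][0] == first_header:
            return table
    return None


# Made-up markup standing in for the downloaded page (assumption).
SAMPLE_HTML = """
<table><tr><th>Some Other Table</th><th>Col</th></tr>
<tr><td>row</td><td>1</td></tr></table>
<table>
  <thead><tr><th><span>Revenue Estimate</span></th>
  <th>Current Qtr. (Mar 2023)</th><th>Next Qtr. (Jun 2023)</th></tr></thead>
  <tbody>
    <tr><td>No. of Analysts</td><td>24</td><td>23</td></tr>
    <tr><td>Avg. Estimate</td><td>93.19B</td><td>85.59B</td></tr>
  </tbody>
</table>
"""

revenue = find_table(SAMPLE_HTML, "Revenue Estimate")
for row in revenue:
    print(row)
```

With a live response, the same call would be `find_table(response.text, "Revenue Estimate")`. Matching on the header text rather than on the table's position should survive tables being added or reordered on the page, but it returns `None` if the header wording changes.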
# Answer 2
**Score**: 1
You can use a pandas DataFrame to get the `Revenue Estimate` table data as follows:
```python
from io import StringIO

import requests
import pandas as pd

headers = {"user-agent": "Mozilla/5.0"}
res = requests.get("https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL&guccounter=1", headers=headers).text
# print(res)

# read_html returns one DataFrame per <table>; index 1 is the Revenue Estimate table.
# Wrapping in StringIO avoids the deprecation warning newer pandas versions raise for raw HTML strings.
df = pd.read_html(StringIO(res))[1]
print(df)
```
Output:
```
          Revenue Estimate  Current Qtr. (Mar 2023)  Next Qtr. (Jun 2023)  Current Year (2023)  Next Year (2024)
0          No. of Analysts                       24                    23                   39                36
1            Avg. Estimate                   93.19B                85.59B              392.39B           417.75B
2             Low Estimate                   91.81B                81.32B              378.62B           398.67B
3            High Estimate                   98.84B                90.12B              414.04B           438.76B
4           Year Ago Sales                   97.28B                82.96B              394.33B           392.39B
5  Sales Growth (year/est)                   -4.20%                 3.20%               -0.50%             6.50%
```
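The cells in this table come back as strings such as `93.19B` or `-4.20%`, so they need converting before any arithmetic. A minimal sketch of such a conversion follows. `parse_financial` and `SUFFIXES` are hypothetical names, and treating `K`, `M`, and `T` as thousand, million, and trillion is an assumption, since only `B` appears in the output above.

```python
# Multipliers for magnitude suffixes; only "B" is confirmed by the table above,
# the others are assumed to follow the usual convention.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_financial(text):
    """Convert strings like '93.19B' or '-4.20%' to float; return None if unparseable."""
    s = str(text).strip().replace(",", "")
    if not s:
        return None
    scale = 1.0
    if s.endswith("%"):
        # Percentages become fractions, e.g. '6.50%' -> about 0.065
        scale, s = 0.01, s[:-1]
    elif s[-1].upper() in SUFFIXES:
        scale, s = SUFFIXES[s[-1].upper()], s[:-1]
    try:
        return float(s) * scale
    except ValueError:
        return None


for raw in ["24", "93.19B", "-4.20%", "N/A"]:
    print(raw, "->", parse_financial(raw))
```

On a DataFrame column this could be applied with something like `df[col].map(parse_financial)`, where `col` is one of the period columns.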