February 8, 2023, 19:39:49

Can't fetch data from the analysis tab on Yahoo Finance

Question

I'm trying to scrape some tabular content from a website. The way this [website](https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL) loads its data has changed dramatically. Previously, the necessary data could be found in a script tag in the page source. I looked at the endpoints in dev tools but could not find the data there, though I may have missed something. I'm interested in the table located under `Revenue Estimate`. This is how I used to fetch the content:
```python
import re
import json
import requests
from pprint import pprint

link = 'https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    # Pull the JSON blob assigned to root.App.main out of the inline script
    data = re.findall(r'root\.App\.main[^{]+([\s\S].*);', res.text)[0]
    jsoncontent = json.loads(data)

    # pprint(jsoncontent)

    try:
        container = jsoncontent['context']['dispatcher']['stores']['QuoteSummaryStore']['earningsTrend']
    except TypeError:
        container = ""

    pprint(container)
```


# Answer 1
**Score**: 1
Try using:
```python
import requests

# Headers copied from a real browser session so the request looks like a normal page load
headers = {
    'authority': 'finance.yahoo.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4,es;q=0.3',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not_A Brand";v="99", "Microsoft Edge";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78',
}

params = {
    'p': 'AAPL',
}

response = requests.get('https://finance.yahoo.com/quote/AAPL/analysis', params=params, headers=headers)
```

After that, parse the desired values from `response.content`.
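
The answer leaves the parsing step open. One possible way to do it with only the standard library is sketched below. It assumes the tables are present in the returned HTML, that they are not nested, and that every cell has an explicit closing tag. `TableExtractor` and `find_table` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> is collected as well
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def find_table(html, title):
    """Return the first table whose top-left cell equals `title`, else None."""
    parser = TableExtractor()
    parser.feed(html)
    for table in parser.tables:
        if table and table[0] and table[0][0] == title:
            return table
    return None
```

With the request above, `find_table(response.text, "Revenue Estimate")` would return the header row followed by the data rows, or `None` if the page no longer contains such a table.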


# Answer 2
**Score**: 1

You can use a Pandas DataFrame to get the Revenue Estimate table data as follows:

```python
from io import StringIO

import requests
import pandas as pd

headers = {"user-agent": "Mozilla/5.0"}
res = requests.get("https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL&guccounter=1", headers=headers).text
# print(res)

# read_html returns one DataFrame per <table>; index 1 is the Revenue Estimate table.
# Wrapping the text in StringIO avoids the deprecation warning newer pandas versions raise for literal HTML strings.
df = pd.read_html(StringIO(res))[1]
print(df)
```

Output:

```
          Revenue Estimate Current Qtr. (Mar 2023) Next Qtr. (Jun 2023) Current Year (2023) Next Year (2024)
0          No. of Analysts                      24                   23                  39               36
1            Avg. Estimate                  93.19B               85.59B             392.39B          417.75B
2             Low Estimate                  91.81B               81.32B             378.62B          398.67B
3            High Estimate                  98.84B               90.12B             414.04B          438.76B
4           Year Ago Sales                  97.28B               82.96B             394.33B          392.39B
5  Sales Growth (year/est)                  -4.20%                3.20%              -0.50%            6.50%
```
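
The cells in this table are strings such as `93.19B` and `-4.20%`, so they usually need converting before any arithmetic. A minimal sketch of one way to do that follows. The helper name `to_number` and the `K`/`M`/`T` suffixes are assumptions; only `B` and `%` appear in the output above.

```python
# Multipliers for magnitude suffixes (assumed set; only "B" is seen in the table above)
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def to_number(text):
    """Convert a cell such as '93.19B', '-4.20%' or '24' to a float.

    Percentages are returned as fractions; empty or 'N/A' cells give None.
    """
    text = text.strip().replace(",", "")
    if not text or text in ("N/A", "-"):
        return None
    if text.endswith("%"):
        return float(text[:-1]) / 100
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[suffix]
    return float(text)
```

It could be applied to a value column with `df[column].map(to_number)`, leaving the label column in the first position untouched.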

