Can't fetch data from the analysis tab on Yahoo Finance

Question

I'm trying to scrape some tabular content from a website. The way this [website](https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL) loads its data has changed dramatically. Previously, the data I needed could be found inside script tags in the page source. I looked at the endpoints through dev tools but could not find the data there, though I may have missed something. I'm interested in the table under `Revenue Estimate`. This is how I used to fetch the content:

```python
import re
import json
import requests
from pprint import pprint

link = 'https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    # The page used to embed its data as JSON assigned to root.App.main in a script tag.
    data = re.findall(r'root\.App\.main[^{]+([\s\S].*);', res.text)[0]
    jsoncontent = json.loads(data)
    # pprint(jsoncontent)
    try:
        container = jsoncontent['context']['dispatcher']['stores']['QuoteSummaryStore']['earningsTrend']
    except (KeyError, TypeError):
        container = ""
    pprint(container)
```

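The `findall(...)[0]` lookup in the question raises `IndexError` as soon as the page stops embedding `root.App.main`, which hides what actually changed. A minimal sketch of a guarded version follows. `extract_app_state`, `earnings_trend` and the two sample pages are hypothetical names made up for illustration, and the pattern assumes the old layout where the JSON sat on a single line ending in `};`.

```python
import json
import re

# Assumed old layout: `root.App.main = {...};` on a single line inside a script tag.
APP_STATE_RE = re.compile(r'root\.App\.main\s*=\s*(\{.*\});')


def extract_app_state(html):
    """Return the embedded JSON as a dict, or None if the page no longer has it."""
    match = APP_STATE_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def earnings_trend(state):
    """Walk the nested stores safely; return None if any level is missing."""
    node = state
    for key in ('context', 'dispatcher', 'stores', 'QuoteSummaryStore', 'earningsTrend'):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical sample pages, not real Yahoo Finance markup.
old_page = (
    '<script>root.App.main = {"context": {"dispatcher": {"stores": '
    '{"QuoteSummaryStore": {"earningsTrend": {"trend": []}}}}}};</script>'
)
new_page = '<html><body><table></table></body></html>'

print(earnings_trend(extract_app_state(old_page)))  # {'trend': []}
print(extract_app_state(new_page))                  # None
```

A `None` result then signals that the data has moved, rather than the script crashing partway through.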
Answer 1

Score: 1

Try using:

```python
import requests

# Browser-like request headers, as copied from a real browser session.
headers = {
    'authority': 'finance.yahoo.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6,fr;q=0.5,de-CH;q=0.4,es;q=0.3',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not_A Brand";v="99", "Microsoft Edge";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78',
}

params = {
    'p': 'AAPL',
}

response = requests.get('https://finance.yahoo.com/quote/AAPL/analysis', params=params, headers=headers)
```

After that, parse the desired values from `response.content`.
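The answer leaves the parsing step open. One possible way to do it without extra dependencies is sketched below with the standard library's `html.parser`. It assumes the tables arrive as plain `<table>` markup in the returned HTML, which may not hold if the page renders them client-side. `TableExtractor`, `find_table` and the sample markup are hypothetical and only stand in for `response.text`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def find_table(html, first_header):
    """Return the first table whose top-left cell equals first_header, else None."""
    parser = TableExtractor()
    parser.feed(html)
    for table in parser.tables:
        if table and table[0] and table[0][0] == first_header:
            return table
    return None


# Hypothetical sample markup; only the 93.19B figure is taken from the thread.
sample = (
    '<table><tr><th>Earnings Estimate</th><th>Current Qtr.</th></tr>'
    '<tr><td>Avg. Estimate</td><td>n/a</td></tr></table>'
    '<table><tr><th>Revenue Estimate</th><th>Current Qtr.</th></tr>'
    '<tr><td>Avg. Estimate</td><td>93.19B</td></tr></table>'
)
print(find_table(sample, 'Revenue Estimate'))
```

With the request from the answer, the call would be `find_table(response.text, 'Revenue Estimate')`. Looking the table up by its header avoids depending on its position in the page.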

Answer 2

Score: 1

You can use a Pandas DataFrame to get the Revenue Estimate table data as follows:

```python
from io import StringIO

import requests
import pandas as pd

headers = {"user-agent": "Mozilla/5.0"}
res = requests.get("https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL&guccounter=1", headers=headers).text
# print(res)

# Newer pandas versions expect a file-like object rather than a raw HTML string.
# Index 1 is the second table on the page, which is the Revenue Estimate table.
df = pd.read_html(StringIO(res))[1]
print(df)
```

Output:

```
          Revenue Estimate  Current Qtr. (Mar 2023)  Next Qtr. (Jun 2023)  Current Year (2023)  Next Year (2024)
0          No. of Analysts                       24                    23                   39                36
1            Avg. Estimate                   93.19B                85.59B              392.39B           417.75B
2             Low Estimate                   91.81B                81.32B              378.62B           398.67B
3            High Estimate                   98.84B                90.12B              414.04B           438.76B
4           Year Ago Sales                   97.28B                82.96B              394.33B           392.39B
5  Sales Growth (year/est)                   -4.20%                 3.20%               -0.50%             6.50%
```
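The values in this output are strings such as `93.19B` and `-4.20%`, so they need converting before any arithmetic. A small sketch of one way to do that is below. `to_number` is a hypothetical helper, and only the `B` suffix appears in the output above; the `K`, `M` and `T` multipliers are assumptions.

```python
# Only 'B' is seen in the thread's output; K, M and T are assumed for other magnitudes.
MULTIPLIERS = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}


def to_number(text):
    """Convert strings such as '93.19B', '-4.20%' or '24' to float; None if unparseable."""
    value = str(text).strip().replace(',', '')
    if not value:
        return None
    try:
        if value.endswith('%'):
            # Percentages become fractions: '-4.20%' -> roughly -0.042
            return float(value[:-1]) / 100
        suffix = value[-1].upper()
        if suffix in MULTIPLIERS:
            return float(value[:-1]) * MULTIPLIERS[suffix]
        return float(value)
    except ValueError:
        return None


# The "Avg. Estimate" row from the output above.
avg_estimate = ['93.19B', '85.59B', '392.39B', '417.75B']
print([to_number(v) for v in avg_estimate])
```

On the DataFrame from the answer, the helper could be applied column by column, for example `df[col].map(to_number)` for each estimate column.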

huangapple
  • Published on February 8, 2023 at 19:39:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75385266.html