Why does the URL I use change whenever I run the code, but work fine when I paste it manually?
Question
I am trying to get quarterly revenue data for 3 stocks using Python. When I run my code, it starts with this URL: `https://stockanalysis.com/stocks/tsla/financials/?p=quarterly`, but ends up fetching this one: `https://stockanalysis.com/stocks/tsla/financials/`. The `?p=quarterly` part gets dropped. When I copy and paste the URL manually, it works fine. I have no idea what is going wrong, and I have tried everything I know to solve this problem. No matter what I do, I always get the annual data instead of the quarterly data. Does anyone have any suggestions? Thank you!
import requests
from bs4 import BeautifulSoup
import pandas as pd

tickers = ["AMZN", "FB", "TSLA"]
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

for ticker in tickers:
    # construct the URL directly as a string
    url = f"https://stockanalysis.com/stocks/{ticker}/financials/?p=quarterly"
    print(url)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table')
    # Convert the table to a DataFrame
    df = pd.read_html(str(table))[0]
    # Get the first three date columns
    columns = df.columns[1:4]
    # Find the row that contains 'Revenue'
    row = df[df.iloc[:, 0].str.contains('Revenue')]
    # Get the values for the last 3 quarters
    values = row[columns].values[0]
    print(f"For {ticker}:")
    print(f"Revenue for last quarter ({columns[0]}) is: {values[0]}")
    print(f"Revenue for 2 quarters ago ({columns[1]}) is: {values[1]}")
    print(f"Revenue for 3 quarters ago ({columns[2]}) is: {values[2]}\n")
I tried setting a User-Agent to get around any bot blockers. I even tried saving all the URLs to a text file and loading them for the specific stocks I want. Nothing works: I always get the annual data instead of the quarterly data.
Answer 1
Score: 1
The code below should work and return the quarterly values as required. Two changes are needed:
1. Rename FB to META
2. Lowercase each ticker (see the comments)
import requests
from bs4 import BeautifulSoup
import pandas as pd

tickers = ["AMZN", "META", "TSLA"]  # change: FB renamed to META
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

for ticker in tickers:
    ticker = ticker.lower()  # change: lowercase the ticker
    # construct the URL directly as a string
    url = f"https://stockanalysis.com/stocks/{ticker}/financials/?p=quarterly"
    print(url)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table')
    # Convert the table to a DataFrame
    df = pd.read_html(str(table))[0]
    # Get the first three date columns
    columns = df.columns[1:4]
    # Find the row that contains 'Revenue'
    row = df[df.iloc[:, 0].str.contains('Revenue')]
    # Get the values for the last 3 quarters
    values = row[columns].values[0]
    print(f"For {ticker}:")
    print(f"Revenue for last quarter ({columns[0]}) is: {values[0]}")
    print(f"Revenue for 2 quarters ago ({columns[1]}) is: {values[1]}")
    print(f"Revenue for 3 quarters ago ({columns[2]}) is: {values[2]}\n")
Answer 2
Score: 0
url = f"https://stockanalysis.com/stocks/{ticker.lower()}/financials/?p=quarterly"
If you look closely, navigating manually to https://stockanalysis.com/stocks/AMZN/financials/?p=quarterly redirects you to https://stockanalysis.com/stocks/amzn/financials (a developer workaround, or who knows what else), so you need to use a lowercase ticker in the request URL to avoid losing the `p=quarterly` query parameter.
Also, you don't need to specify the headers manually, whatever you were using them for.
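As a rough way to confirm this kind of redirect problem, the sketch below builds the URL with a lowercase ticker and compares the query parameters of a requested URL against those of the URL that was finally served. The helper names `build_financials_url` and `dropped_query_params` are made up for illustration, and the "final" URL is the one quoted in the question rather than the result of a live request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_financials_url(ticker: str, period: str = "quarterly") -> str:
    """Build the financials URL with a lowercase ticker (hypothetical helper)."""
    query = urlencode({"p": period})
    return f"https://stockanalysis.com/stocks/{ticker.lower()}/financials/?{query}"


def dropped_query_params(requested_url: str, final_url: str) -> set:
    """Return query parameter names present in requested_url but absent from final_url."""
    requested = parse_qs(urlsplit(requested_url).query)
    final = parse_qs(urlsplit(final_url).query)
    return set(requested) - set(final)


url = build_financials_url("TSLA")
print(url)

# Simulated redirect: the uppercase URL from the question and the URL it ended up at.
lost = dropped_query_params(
    "https://stockanalysis.com/stocks/TSLA/financials/?p=quarterly",
    "https://stockanalysis.com/stocks/tsla/financials/",
)
print(lost)
```

With `requests`, the final URL is available as `response.url` and any redirects that were followed are listed in `response.history`, so passing the original URL and `response.url` to `dropped_query_params` would show whether `p` survived the redirect.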