Why does the URL I use change whenever I run the code, but work fine when I paste it manually?


Question

I am trying to get quarterly revenue data for three stocks using Python. When I run my code, it starts with the URL https://stockanalysis.com/stocks/tsla/financials/?p=quarterly but ends up fetching https://stockanalysis.com/stocks/tsla/financials/ instead, so the ?p=quarterly part is dropped. When I copy and paste the URL manually, it works fine. I have no idea what is going wrong, and I have tried everything I know to solve this. No matter what I do, I always get the annual data instead of the quarterly data. Does anyone have any suggestions? Thank you!

import requests
from bs4 import BeautifulSoup
import pandas as pd

tickers = ["AMZN", "FB", "TSLA"]

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

for ticker in tickers:
    # construct the URL directly as a string
    url = f"https://stockanalysis.com/stocks/{ticker}/financials/?p=quarterly"
    print(url)
    response = requests.get(url, headers=headers)

    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table')

    # Convert the table to a DataFrame
    df = pd.read_html(str(table))[0]

    # Get the first three date columns
    columns = df.columns[1:4]

    # Find the row that contains 'Revenue'
    row = df[df.iloc[:,0].str.contains('Revenue')]

    # Get the values for the last 3 quarters
    values = row[columns].values[0]

    print(f"For {ticker}:")
    print(f"Revenue for last quarter ({columns[0]}) is: {values[0]}")
    print(f"Revenue for 2 quarters ago ({columns[1]}) is: {values[1]}")
    print(f"Revenue for 3 quarters ago ({columns[2]}) is: {values[2]}\n")

I tried setting a User-Agent header to get around any bot blockers. I even tried saving all the URLs to a text file and loading them for the specific stocks I want. Nothing works: I always get the annual data instead of the quarterly data.

Answer 1

Score: 1


The code below should work and return the quarterly values as required. Two changes are needed:

1. Rename FB to META

2. Lowercase each ticker (see the comments in the code)

from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd

tickers = ["AMZN", "META", "TSLA"]  # change: FB renamed to META

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

for ticker in tickers:
    ticker = ticker.lower()  # change: lowercase the ticker
    # construct the URL directly as a string
    url = f"https://stockanalysis.com/stocks/{ticker}/financials/?p=quarterly"
    print(url)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table')

    # Convert the table to a DataFrame
    # (wrapping the HTML in StringIO avoids the literal-string deprecation warning in newer pandas)
    df = pd.read_html(StringIO(str(table)))[0]

    # Get the first three date columns
    columns = df.columns[1:4]

    # Find the row that contains 'Revenue'
    row = df[df.iloc[:, 0].str.contains('Revenue')]

    # Get the values for the last 3 quarters
    values = row[columns].values[0]

    print(f"For {ticker}:")
    print(f"Revenue for last quarter ({columns[0]}) is: {values[0]}")
    print(f"Revenue for 2 quarters ago ({columns[1]}) is: {values[1]}")
    print(f"Revenue for 3 quarters ago ({columns[2]}) is: {values[2]}\n")
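
The two changes above (renaming FB to META and lowercasing the ticker) could also be folded into a small URL-building helper. This is only a sketch using the standard library: the names TICKER_ALIASES and build_financials_url are hypothetical and not part of the answer, and the alias table contains only the one rename mentioned here.

```python
from urllib.parse import urlencode

# Hypothetical alias table: old symbol -> current symbol (only the rename from this answer).
TICKER_ALIASES = {"FB": "META"}


def build_financials_url(ticker: str, period: str = "quarterly") -> str:
    """Build the financials URL with a renamed-if-needed, lowercase ticker."""
    symbol = ticker.upper()
    symbol = TICKER_ALIASES.get(symbol, symbol)  # apply the rename, if any
    query = urlencode({"p": period})             # produces "p=quarterly"
    # Lowercase path segment, as both answers recommend, so the query string is kept.
    return f"https://stockanalysis.com/stocks/{symbol.lower()}/financials/?{query}"


for name in ["AMZN", "FB", "TSLA"]:
    print(build_financials_url(name))
```

The returned string can be passed to requests.get in place of the f-string in the loop above.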

Answer 2

Score: 0

url = f"https://stockanalysis.com/stocks/{ticker.lower()}/financials/?p=quarterly"

If you look closely, manually navigating to https://stockanalysis.com/stocks/AMZN/financials/?p=quarterly redirects you to https://stockanalysis.com/stocks/amzn/financials (a developer workaround, or who knows what else), and the query string is lost along the way. You therefore need to use a lowercase ticker in the request URL to avoid losing the "p" parameter.
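
One way to confirm this kind of redirect is to compare the URL you requested with the URL you ended up on. With requests, the final URL is available as response.url and the redirect chain as response.history. The sketch below does the comparison with the standard library only and makes no network call. The helper name lost_query_params is hypothetical, and the "final" URL is simply the redirect target reported above.

```python
from urllib.parse import parse_qs, urlsplit


def lost_query_params(requested_url: str, final_url: str) -> dict:
    """Return the query parameters present in requested_url but missing from final_url."""
    requested = parse_qs(urlsplit(requested_url).query)
    final = parse_qs(urlsplit(final_url).query)
    return {key: values for key, values in requested.items() if key not in final}


requested = "https://stockanalysis.com/stocks/AMZN/financials/?p=quarterly"
final = "https://stockanalysis.com/stocks/amzn/financials"  # redirect target reported in this answer

print(lost_query_params(requested, final))  # {'p': ['quarterly']}
```

In a real script, passing response.url as final_url after requests.get would flag the problem as soon as the site drops the parameter.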

Also, you don't need to specify a header manually for this request, whatever you were using it for.

huangapple
  • Posted on June 19, 2023, 00:13:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76501466.html