Python program suddenly cannot fetch data from the bseindia API. Is there a way to debug API changes and pass the proper parameters?
Question
This code has worked for many years. A few years ago I faced a similar issue when the API changed; I no longer remember how I debugged it and found that an extra page-number parameter had been added. Now there seems to be another slight change, and my program can no longer fetch data. Any help would be appreciated.
import requests
import pandas as pd
import sys
import numpy as np
from pandas.io.json import json_normalize

pdate = "20230721"  # starting date
date = "20230724"  # till this date
url = 'https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
payload = {
    'Pageno': 1,
    'strCat': '-1',
    'strPrevDate': pdate,
    'strScrip': '',
    'strSearch': 'P',
    'strToDate': date,
    'strType': 'C'}
data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['Pageno']} ...")
    jsonData = requests.get(url, headers=headers, params=payload).json()
    if jsonData["Table"]:
        data.extend(jsonData["Table"])
        payload['Pageno'] += 1
        # every thing we want to do
    else:
        should_fetch_next_page = False
df = pd.DataFrame(data)
print(df)
Answer 1
Score: 0
The API URL has changed, and the server now also requires the Referer HTTP header:
import requests
import pandas as pd

pdate = "20230721"  # starting date
date = "20230724"  # till this date
url = "https://api.bseindia.com/BseIndiaAPI/api/AnnSubCategoryGetData/w"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Referer": "https://www.bseindia.com/",
}
payload = {
    "pageno": 1,  # note: lowercase, unlike the old 'Pageno'
    "strCat": "-1",
    "strPrevDate": pdate,
    "strScrip": "",
    "strSearch": "P",
    "strToDate": date,  # end of the date range
    "strType": "C",
    "subcategory": "",
}
data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['pageno']} ...")
    jsonData = requests.get(url, headers=headers, params=payload).json()
    if jsonData["Table"]:
        # non-empty page: keep the rows and move on to the next page
        data.extend(jsonData["Table"])
        payload["pageno"] += 1
        # every thing we want to do
    else:
        # an empty "Table" means there are no more pages
        should_fetch_next_page = False
df = pd.DataFrame(data)
print(df)
Prints:
Fetching page 1 ...
Fetching page 2 ...
Fetching page 3 ...
...
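The loop above stops only when the server returns an empty "Table", so if the paging behaviour changes again it could run forever. A more defensive variant might look like the sketch below. It is only an illustration: `fetch_all_pages`, `fetch_page` and `max_pages` are hypothetical names, not part of the original code or of the BSE API.

```python
def fetch_all_pages(fetch_page, first_page=1, max_pages=500):
    """Collect rows page by page until a page comes back empty.

    `fetch_page` is a hypothetical callable: it takes a page number and
    returns the rows for that page (an empty list or None when done).
    """
    rows = []
    for page in range(first_page, first_page + max_pages):
        batch = fetch_page(page)
        if not batch:
            # Empty page: everything has been collected.
            return rows
        rows.extend(batch)
    # Safety net: the server never returned an empty page.
    raise RuntimeError(
        f"no empty page after {max_pages} pages; paging may have changed"
    )
```

With the code above, `fetch_page` could be something like `lambda n: requests.get(url, headers=headers, params={**payload, "pageno": n}).json().get("Table")`.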
NOTE: To debug future problems: this API request is made by the page https://www.bseindia.com/corporates/ann.html. Open that URL in your browser, open Web Developer Tools -> Network tab, and reload the page.
You should see the API URL there, along with the required parameters, HTTP headers, cookies, etc.
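On the Python side, it can also help to look at the raw response before calling `.json()`, since a changed URL or a missing header usually shows up as a non-200 status or an HTML page instead of JSON. The helper below is a minimal sketch of that idea; `diagnose_response` is a hypothetical name, and the assumption that a healthy response is a JSON object with a "Table" key comes only from the code above.

```python
import json


def diagnose_response(status_code, content_type, text, expected_key="Table"):
    """Return a one-line description of what an API response looks like."""
    if status_code != 200:
        return f"HTTP {status_code}: request rejected or URL changed"
    try:
        body = json.loads(text)
    except ValueError:
        # An HTML error page or an empty body ends up here.
        return f"not JSON (Content-Type: {content_type!r}): {text[:80]!r}"
    if not isinstance(body, dict) or expected_key not in body:
        return f"JSON without a {expected_key!r} key: response shape changed"
    rows = body[expected_key] or []
    return f"ok: {len(rows)} rows in {expected_key!r}"
```

For a response `r` returned by `requests.get`, it could be called as `print(diagnose_response(r.status_code, r.headers.get("Content-Type", ""), r.text))`.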