Suddenly this Python program can no longer fetch data from the bseindia API. How can I debug API changes and pass the proper parameters?

Question

This code has worked for many years. A few years back I ran into a similar issue when the API changed, and I no longer remember how I debugged it and discovered that an extra page-number parameter had been added. Now there seems to be another slight change, and my program can no longer fetch data. Any help would be appreciated.


import requests
import pandas as pd
import sys
import numpy as np
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

pdate = "20230721"  # starting date
date = "20230724"   # till this date
url = 'https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}

payload = {
    'Pageno': 1,
    'strCat': '-1',
    'strPrevDate': pdate,
    'strScrip': '',
    'strSearch': 'P',
    'strToDate': date,
    'strType': 'C',
}

data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['Pageno']} ...")
    jsonData = requests.get(url, headers=headers, params=payload).json()
    if jsonData["Table"]:
        data.extend(jsonData["Table"])
        payload['Pageno'] += 1
        # everything else we want to do with this page
    else:
        should_fetch_next_page = False

df = pd.DataFrame(data)
print(df)

Answer 1

Score: 0

The API URL has changed, and the server now also requires a Referer HTTP header:

import requests
import pandas as pd

pdate = "20230721"  # starting date
date = "20230724"  # till this date
url = "https://api.bseindia.com/BseIndiaAPI/api/AnnSubCategoryGetData/w"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Referer": "https://www.bseindia.com/",  # required by the server
}

payload = {
    "pageno": 1,  # lowercase; the old endpoint used "Pageno"
    "strCat": "-1",
    "strPrevDate": pdate,
    "strScrip": "",
    "strSearch": "P",
    "strToDate": date,  # end of the date range
    "strType": "C",
    "subcategory": "",  # parameter expected by the new endpoint
}

data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['pageno']} ...")
    response = requests.get(url, headers=headers, params=payload, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of a confusing JSON error
    jsonData = response.json()
    if jsonData["Table"]:
        data.extend(jsonData["Table"])
        payload["pageno"] += 1
        # everything else we want to do with this page
    else:
        should_fetch_next_page = False

df = pd.DataFrame(data)
print(df)

Prints:

Fetching page 1 ...
Fetching page 2 ...
Fetching page 3 ...

...
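The loop above calls .json() on whatever comes back, so when the API changes again the failure usually surfaces as a JSON decoding error or a missing "Table" key rather than as a clear message. As a minimal sketch (diagnose_response is a hypothetical helper name, and we don't know exactly what the BSE server sends when it rejects a request), you can inspect the status code, Content-Type and the start of the body before parsing:

```python
import json
from typing import Mapping, Optional


def diagnose_response(status_code: int, headers: Mapping[str, str], text: str) -> Optional[str]:
    """Return a description of what looks wrong, or None if the body is usable JSON.

    Hypothetical helper: pass it response.status_code, response.headers and
    response.text from a requests response before calling .json().
    """
    # Find Content-Type case-insensitively (a plain dict may use any casing).
    content_type = next(
        (value for key, value in headers.items() if key.lower() == "content-type"), ""
    )
    snippet = text[:200].replace("\n", " ")  # short preview of the body

    if status_code != 200:
        # e.g. a moved endpoint or a request the server refused
        return f"HTTP {status_code}: {snippet}"
    if "json" not in content_type.lower():
        # HTML here usually means an error page instead of API data
        return f"Expected JSON but got {content_type or 'no Content-Type'}: {snippet}"
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"Body is not valid JSON ({exc.msg}): {snippet}"
    if not isinstance(payload, dict) or "Table" not in payload:
        keys = sorted(payload) if isinstance(payload, dict) else type(payload).__name__
        return f"JSON has no 'Table' key; top-level keys: {keys}"
    return None
```

With requests, you would call diagnose_response(response.status_code, response.headers, response.text) before response.json() and stop with the returned message whenever it is not None; the message tells you whether to look at the URL, the headers or the parameters.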

NOTE: To debug future problems, start from the web page that issues this API request: https://www.bseindia.com/corporates/ann.html. Open that URL in your browser, open Web Developer Tools -> Network tab, and reload the page.

There you should see the API URL together with the required parameters, HTTP headers, cookies, etc.
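Once the Network tab shows the request the page actually makes, you can copy its full request URL and compare it with the payload your script sends. The sketch below uses only the standard library; split_request_url and diff_params are hypothetical helper names, and the URL in the demo is assembled from the parameters in this answer rather than captured from a real browser session:

```python
from urllib.parse import parse_qsl, urlsplit


def split_request_url(request_url: str) -> tuple[str, dict[str, str]]:
    """Split a full request URL (e.g. copied from the Network tab) into base URL and params."""
    parts = urlsplit(request_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True so empty parameters such as strScrip= are not dropped
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


def diff_params(old: dict, new: dict) -> dict[str, list]:
    """Report parameter names that were added or removed, and values that changed."""
    # Compare as strings, because query-string values are always strings
    old_s = {k: str(v) for k, v in old.items()}
    new_s = {k: str(v) for k, v in new.items()}
    return {
        "added": sorted(new_s.keys() - old_s.keys()),
        "removed": sorted(old_s.keys() - new_s.keys()),
        "changed": sorted(k for k in old_s.keys() & new_s.keys() if old_s[k] != new_s[k]),
    }


if __name__ == "__main__":
    old_url = "https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w"
    old_payload = {"Pageno": 1, "strCat": "-1", "strPrevDate": "20230721", "strScrip": "",
                   "strSearch": "P", "strToDate": "20230724", "strType": "C"}
    # Illustrative URL built from this answer's parameters, not a captured request
    copied = ("https://api.bseindia.com/BseIndiaAPI/api/AnnSubCategoryGetData/w"
              "?pageno=1&strCat=-1&strPrevDate=20230721&strScrip=&strSearch=P"
              "&strToDate=20230724&strType=C&subcategory=")
    new_url, new_payload = split_request_url(copied)
    print("Base URL changed:", new_url != old_url)
    print(diff_params(old_payload, new_payload))
```

Run against the question's old payload, the diff lists Pageno as removed and pageno and subcategory as added, while the changed endpoint path shows up separately as a different base URL; both are the kind of difference worth checking first when a long-working script suddenly stops receiving data.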

Posted on July 24, 2023, 18:45:48