ValueError: DataFrame constructor not properly called! (WebScraping)


Question


While scraping the URL below for the site's data-update 'TIMESTAMP', pandas fails to parse the timestamp value, although it fetches the complete raw data without problems.
How can I extract the timestamp?

**Try 1**  
updateTime = pd.DataFrame(rawdata['records']['timestamp'])  
Output: ValueError: DataFrame constructor not properly called!

**Try 2**  
updateTime = pd.DataFrame(rawdata['records']).get(['timestamp'])  
Output: updateTime = None

**Try 3**  
updateTime = pd.DataFrame(rawdata['records']).get(['timestamp'], default='Timestamp Not Fetched')  
Output: updateTime = Timestamp Not Fetched

**RAWDATA**

    import pandas as pd
    import requests

    url = 'https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.5',
        'Accept': 'application/json'
    }
    session = requests.Session()
    # First request collects the cookies the site sets.
    request = session.get(url, headers=headers)
    cookies = dict(request.cookies)
    # Second request resends those cookies and decodes the JSON body.
    response = session.get(url, headers=headers, cookies=cookies).json()
    rawdata = pd.DataFrame(response)


Answer 1

Score: 0


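rawdata is already a DataFrame, and rawdata['records']['timestamp'] is a plain string. pd.DataFrame cannot be built from a single string, which is why the call raises the ValueError. You don't need to call pd.DataFrame at all: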

>>> rawdata['records']['timestamp']
'06-Jan-2023 12:23:23'

>>> pd.Timestamp(rawdata['records']['timestamp'])
Timestamp('2023-01-06 12:23:23')
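
pd.Timestamp infers the layout of the string here. As a minimal sketch, assuming the API keeps returning the timestamp in the `DD-Mon-YYYY HH:MM:SS` form shown above and that month abbreviations are in English, the format can be spelled out with pd.to_datetime. The literal string is copied from the output above rather than fetched live, and the name `raw_timestamp` is only illustrative:

```python
import pandas as pd

# Sample value copied from the output above; in the real script this would be
# rawdata['records']['timestamp'].
raw_timestamp = '06-Jan-2023 12:23:23'

# An explicit format removes any day/month guessing and raises a ValueError
# if the string ever stops matching this layout.
update_time = pd.to_datetime(raw_timestamp, format='%d-%b-%Y %H:%M:%S')
print(update_time)
```

Being strict about the format is a trade-off: it fails loudly if the site changes its timestamp layout, which may or may not be what you want in a scraper.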

If you really want a DataFrame, use:

updateTime = pd.DataFrame([rawdata['records']['timestamp']])
#         Enclosed with --^                             --^
print(updateTime)

# Output
                      0
0  06-Jan-2023 12:23:23
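
The failure can also be reproduced without touching the network. In the sketch below, `response` is a hypothetical, heavily trimmed stand-in for the API's JSON: only the `timestamp` string comes from the output above, and the other fields and their placeholder values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the JSON returned by the API.
# Only the timestamp string is taken from the real output; 0.0 is a placeholder.
response = {
    'records': {'timestamp': '06-Jan-2023 12:23:23', 'underlyingValue': 0.0},
    'filtered': {'underlyingValue': 0.0},
}

rawdata = pd.DataFrame(response)

# Selecting the 'records' column and then the 'timestamp' row yields a str.
value = rawdata['records']['timestamp']
print(type(value).__name__)

# Passing that bare string to the constructor reproduces the error.
error_message = None
try:
    pd.DataFrame(value)
except ValueError as err:
    error_message = str(err)
print(error_message)

# The decoded JSON already holds the value, so it can be read directly.
update_time = pd.Timestamp(response['records']['timestamp'])
print(update_time)
```

Reading the value straight from `response` skips the intermediate DataFrame entirely, which is usually enough when only the timestamp is needed.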

Posted by huangapple on January 6, 2023, 14:51:41
Source: https://go.coder-hub.com/75027822.html