ValueError: DataFrame constructor not properly called! (WebScraping)
Question
While scraping the URL below, I am trying to read the website's data-update time ('TIMESTAMP'), but pandas is not able to extract the timestamp value.
The complete raw data, however, is fetched fine.
How can I filter for just the timestamp?
**Try_1**
updateTime = pd.DataFrame(rawdata['records']['timestamp'])
Output: ValueError: DataFrame constructor not properly called!
**Try_2**
updateTime = pd.DataFrame(rawdata['records']).get(['timestamp'])
Output: updateTime = None
**Try_3**
updateTime = pd.DataFrame(rawdata['records']).get(['timestamp'], default='Timestamp Not Fetched')
Output: updateTime = Timestamp Not Fetched
**RAWDATA**
import pandas as pd
import requests

url = 'https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.5',
    'Accept': 'application/json'
}

session = requests.Session()
# The first request collects the cookies the site sets.
request = session.get(url, headers=headers)
cookies = dict(request.cookies)
# The second request sends those cookies back and decodes the JSON body into a dict.
response = session.get(url, headers=headers, cookies=cookies).json()
rawdata = pd.DataFrame(response)
Answer 1
Score: 0
`rawdata` is already a DataFrame, and `rawdata['records']['timestamp']` is a single string, so you don't need to call `pd.DataFrame` on it:
>>> rawdata['records']['timestamp']
'06-Jan-2023 12:23:23'
>>> pd.Timestamp(rawdata['records']['timestamp'])
Timestamp('2023-01-06 12:23:23')
If you really want a DataFrame, use:
updateTime = pd.DataFrame([rawdata['records']['timestamp']])
# Note the square brackets: the string is wrapped in a one-element list.
print(updateTime)
# Output
                      0
0  06-Jan-2023 12:23:23
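To see why the first two attempts in the question behave the way they do, here is a small offline reproduction. The `response` dict is a hypothetical, heavily trimmed stand-in for the real JSON (its values are made up), so treat it as an illustration of the shape rather than of the actual payload.

```python
import pandas as pd

# Hypothetical, heavily trimmed stand-in for the JSON payload (values made up).
response = {
    'records': {'timestamp': '06-Jan-2023 12:23:23', 'underlyingValue': 100.0},
    'filtered': {'underlyingValue': 100.0},
}
rawdata = pd.DataFrame(response)

# Column 'records', row label 'timestamp': a single string, not a table.
value = rawdata['records']['timestamp']

# Try_1: a bare scalar is rejected by the DataFrame constructor.
try:
    pd.DataFrame(value)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Try_2: .get() looks up column labels, and 'timestamp' is a row label here,
# so the lookup fails and the default (None) comes back.
missing = pd.DataFrame(rawdata['records']).get(['timestamp'])
print(missing)

# The same string is also available from the decoded JSON directly.
print(response['records']['timestamp'] == value)
```

If only the timestamp is needed, reading it from the decoded JSON dict avoids building a DataFrame at all.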