英文:
I'm trying to create a Python script that requests information from a website and I need this information to be sent to gsheets
问题
以下是代码部分的翻译:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from googleapiclient.discovery import build
from google.oauth2 import service_account
url0 = "https://www.hockey-reference.com/leagues/NHL_2023_skaters.html"
soup = BeautifulSoup(requests.get(url0).content, "html.parser")
table = soup.find('table', {'class': 'stats_table'})
df = pd.read_html(str(table))[0]
df.columns = df.columns.droplevel(0)
df = df[df["Player"] != "Player"]
df = df [["Player","Pos","GP","G","A","+/-","S%","PS","FO%"]]
df.fillna(0,inplace=True)
SERVICE_ACCOUNT_FILE = 'keys.json'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
creds = None
creds = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_FILE, scopes=SCOPES)
gsheet = '******'
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
var2 = [df.to_json()]
request = sheet.values().update(spreadsheetId=gsheet, range="MasterTest!A1",valueInputOption="USER_ENTERED",body={"values":var2}).execute()
希望这能帮助您解决问题。如果您有其他问题,请随时提出。
英文:
I'm new to Python and I was hoping you guys could help me ... I'm using BeautifulSoup to get information from the website, this works the way I need it, I also need to send this information to a Google sheet, i already enabled the API and made it work via cloud console, I tested it and I was able to send some information to it, but when I try to send this dataframe as .json file I get this error code :
> raise HttpError(resp, content, uri=self.uri) googleapiclient.errors.HttpError: <HttpError 400 when requesting
> https: ... valueInputOption=USER_ENTERED&alt=json returned "Invalid
> value at 'data.values[0]'
What am I doing wrong?
import requests
from bs4 import BeautifulSoup
import pandas as pd
from googleapiclient.discovery import build
from google.oauth2 import service_account
url0 = "https://www.hockey-reference.com/leagues/NHL_2023_skaters.html"
soup = BeautifulSoup(requests.get(url0).content,"html.parser")
table = soup.find('table', {'class': 'stats_table'})
df = pd.read_html(str(table))[0]
df.columns = df.columns.droplevel(0)
df = df[df["Player"] !="Player"]
df = df [["Player","Pos","GP","G","A","+/-","S%","PS","FO%"]]
df.fillna(0,inplace=True)
SERVICE_ACCOUNT_FILE = 'keys.json'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
creds = None
creds = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_FILE, scopes=SCOPES)
gsheet = '******'
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
var2 = [df.to_json()]
request = sheet.values().update(spreadsheetId=gsheet, range="MasterTest!A1",valueInputOption="USER_ENTERED",body={"values":var2}).execute()
Thanks
BeautifulSoup is fine, the API has been enabled and it is able to get and update info.
答案1
得分: 1
错误发生是因为API等待一个二维数组,var2
需要看起来类似于:
[['2023-01-18', '1234', 'ABC', 1234],
['2023-01-17', '5678', 'DEF', 5678],
['2023-01-16', '9012', 'ABC', 9012]]
然后Sheets将每个内部数组添加到单独的行中。我创建了以下函数,将您的数据从字典转换为API可使用的二维数组:
def transformData(dict):
firstkey = list(dict)[0]
lastInnerKey = list(dict[firstkey])[-1]
main_db = []
row = []
for i in range(0, lastInnerKey + 2):
if i == 0:
for x in dict:
row.append("'" + x) # 在值的开头添加 ' 告诉Google Sheets这是文本而不是公式。
else:
for x in dict:
if (i - 1) in dict[x]:
row.append(dict[x][i - 1])
if len(row) > 0:
main_db.append(row)
row = []
return main_db
然后,您只需要更新变量var2
的声明:
var2 = transformData(df.to_dict())
英文:
The error is showing because the API is waiting for a 2D array, var2
needs to look similar to:
[['2023-01-18', '1234', 'ABC', 1234],
['2023-01-17', '5678', 'DEF', 5678],
['2023-01-16', '9012', 'ABC', 9012]]
Sheets will then add each inner Array on its own row. I made the following function to transform your data from a dictionary to a 2D array that the API can use:
def transformData(dict):
firstkey = list(dict)[0]
lastInnerKey = list(dict[firstkey])[-1]
main_db = []
row = []
for i in range(0, lastInnerKey + 2):
if i == 0:
for x in dict:
row.append("'" + x) # Appending ' to the start of a
# value tells Google Sheets that this is text and not a formula.
else:
for x in dict:
if (i - 1) in dict[x]:
row.append(dict[x][i - 1])
if len(row) > 0:
main_db.append(row)
row = []
return main_db
Then you just need to update the declaration of var2:
var2 = transformData(df.to_dict())
Reference
-
Using ListValue format for the
body
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论