I'm trying to create a Python script that requests information from a website and I need this information to be sent to gsheets

huangapple go评论105阅读模式
英文:

I'm trying to create a Python script that requests information from a website and I need this information to be sent to gsheets

问题

以下是代码部分的翻译:

  1. import requests
  2. from bs4 import BeautifulSoup
  3. import pandas as pd
  4. from googleapiclient.discovery import build
  5. from google.oauth2 import service_account
  6. url0 = "https://www.hockey-reference.com/leagues/NHL_2023_skaters.html"
  7. soup = BeautifulSoup(requests.get(url0).content, "html.parser")
  8. table = soup.find('table', {'class': 'stats_table'})
  9. df = pd.read_html(str(table))[0]
  10. df.columns = df.columns.droplevel(0)
  11. df = df[df["Player"] != "Player"]
  12. df = df [["Player","Pos","GP","G","A","+/-","S%","PS","FO%"]]
  13. df.fillna(0,inplace=True)
  14. SERVICE_ACCOUNT_FILE = 'keys.json'
  15. SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
  16. creds = None
  17. creds = service_account.Credentials.from_service_account_file(
  18. SERVICE_ACCOUNT_FILE, scopes=SCOPES)
  19. gsheet = '******'
  20. service = build('sheets', 'v4', credentials=creds)
  21. sheet = service.spreadsheets()
  22. var2 = [df.to_json()]
  23. request = sheet.values().update(spreadsheetId=gsheet, range="MasterTest!A1",valueInputOption="USER_ENTERED",body={"values":var2}).execute()

希望这能帮助您解决问题。如果您有其他问题,请随时提出。

英文:

I'm new to Python and I was hoping you guys could help me ... I'm using BeautifulSoup to get information from the website, this works the way I need it, I also need to send this information to a Google sheet, i already enabled the API and made it work via cloud console, I tested it and I was able to send some information to it, but when I try to send this dataframe as .json file I get this error code :

> raise HttpError(resp, content, uri=self.uri) googleapiclient.errors.HttpError: <HttpError 400 when requesting
> https: ... valueInputOption=USER_ENTERED&alt=json returned "Invalid
> value at 'data.values[0]'

What am I doing wrong?

  1. import requests
  2. from bs4 import BeautifulSoup
  3. import pandas as pd
  4. from googleapiclient.discovery import build
  5. from google.oauth2 import service_account
  6. url0 = &quot;https://www.hockey-reference.com/leagues/NHL_2023_skaters.html&quot;
  7. soup = BeautifulSoup(requests.get(url0).content,&quot;html.parser&quot;)
  8. table = soup.find(&#39;table&#39;, {&#39;class&#39;: &#39;stats_table&#39;})
  9. df = pd.read_html(str(table))[0]
  10. df.columns = df.columns.droplevel(0)
  11. df = df[df[&quot;Player&quot;] !=&quot;Player&quot;]
  12. df = df [[&quot;Player&quot;,&quot;Pos&quot;,&quot;GP&quot;,&quot;G&quot;,&quot;A&quot;,&quot;+/-&quot;,&quot;S%&quot;,&quot;PS&quot;,&quot;FO%&quot;]]
  13. df.fillna(0,inplace=True)
  14. SERVICE_ACCOUNT_FILE = &#39;keys.json&#39;
  15. SCOPES = [&#39;https://www.googleapis.com/auth/spreadsheets&#39;]
  16. creds = None
  17. creds = service_account.Credentials.from_service_account_file(
  18. SERVICE_ACCOUNT_FILE, scopes=SCOPES)
  19. gsheet = &#39;******&#39;
  20. service = build(&#39;sheets&#39;, &#39;v4&#39;, credentials=creds)
  21. sheet = service.spreadsheets()
  22. var2 = [df.to_json()]
  23. request = sheet.values().update(spreadsheetId=gsheet, range=&quot;MasterTest!A1&quot;,valueInputOption=&quot;USER_ENTERED&quot;,body={&quot;values&quot;:var2}).execute()

Thanks

BeautifulSoup is fine, the API has been enabled and it is able to get and update info.

答案1

得分: 1

错误发生是因为API等待一个二维数组,var2 需要看起来类似于:

  1. [['2023-01-18', '1234', 'ABC', 1234],
  2. ['2023-01-17', '5678', 'DEF', 5678],
  3. ['2023-01-16', '9012', 'ABC', 9012]]

然后Sheets将每个内部数组添加到单独的行中。我创建了以下函数,将您的数据从字典转换为API可使用的二维数组:

  1. def transformData(dict):
  2. firstkey = list(dict)[0]
  3. lastInnerKey = list(dict[firstkey])[-1]
  4. main_db = []
  5. row = []
  6. for i in range(0, lastInnerKey + 2):
  7. if i == 0:
  8. for x in dict:
  9. row.append("&#39;" + x) # 在值的开头添加 &#39; 告诉Google Sheets这是文本而不是公式。
  10. else:
  11. for x in dict:
  12. if (i - 1) in dict[x]:
  13. row.append(dict[x][i - 1])
  14. if len(row) > 0:
  15. main_db.append(row)
  16. row = []
  17. return main_db

然后,您只需要更新变量var2的声明:

  1. var2 = transformData(df.to_dict())
英文:

The error is showing because the API is waiting for a 2D array, var2 needs to look similar to:

  1. [[&#39;2023-01-18&#39;, &#39;1234&#39;, &#39;ABC&#39;, 1234],
  2. [&#39;2023-01-17&#39;, &#39;5678&#39;, &#39;DEF&#39;, 5678],
  3. [&#39;2023-01-16&#39;, &#39;9012&#39;, &#39;ABC&#39;, 9012]]

Sheets will then add each inner Array on its own row. I made the following function to transform your data from a dictionary to a 2D array that the API can use:

  1. def transformData(dict):
  2. firstkey = list(dict)[0]
  3. lastInnerKey = list(dict[firstkey])[-1]
  4. main_db = []
  5. row = []
  6. for i in range(0, lastInnerKey + 2):
  7. if i == 0:
  8. for x in dict:
  9. row.append(&quot;&#39;&quot; + x) # Appending &#39; to the start of a
  10. # value tells Google Sheets that this is text and not a formula.
  11. else:
  12. for x in dict:
  13. if (i - 1) in dict[x]:
  14. row.append(dict[x][i - 1])
  15. if len(row) &gt; 0:
  16. main_db.append(row)
  17. row = []
  18. return main_db

Then you just need to update the declaration of var2:

  1. var2 = transformData(df.to_dict())

Reference

huangapple
  • 本文由 发表于 2023年3月21日 00:55:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75793167.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定