I'm trying to create a Python script that requests information from a website and I need this information to be sent to gsheets

huangapple go评论68阅读模式
英文:

I'm trying to create a Python script that requests information from a website and I need this information to be sent to gsheets

问题

以下是代码部分的翻译:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from googleapiclient.discovery import build
from google.oauth2 import service_account

url0 = "https://www.hockey-reference.com/leagues/NHL_2023_skaters.html"
soup = BeautifulSoup(requests.get(url0).content, "html.parser")

table = soup.find('table', {'class': 'stats_table'})

df = pd.read_html(str(table))[0]
df.columns = df.columns.droplevel(0)
df = df[df["Player"] != "Player"]
df = df [["Player","Pos","GP","G","A","+/-","S%","PS","FO%"]]
df.fillna(0,inplace=True)

SERVICE_ACCOUNT_FILE = 'keys.json'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']

creds = None
creds = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

gsheet = '******'
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
var2 = [df.to_json()]
request = sheet.values().update(spreadsheetId=gsheet, range="MasterTest!A1",valueInputOption="USER_ENTERED",body={"values":var2}).execute()

希望这能帮助您解决问题。如果您有其他问题,请随时提出。

英文:

I'm new to Python and I was hoping you guys could help me ... I'm using BeautifulSoup to get information from the website, this works the way I need it, I also need to send this information to a Google sheet, i already enabled the API and made it work via cloud console, I tested it and I was able to send some information to it, but when I try to send this dataframe as .json file I get this error code :

> raise HttpError(resp, content, uri=self.uri) googleapiclient.errors.HttpError: <HttpError 400 when requesting
> https: ... valueInputOption=USER_ENTERED&alt=json returned "Invalid
> value at 'data.values[0]'

What am I doing wrong?

import requests
from bs4 import BeautifulSoup
import pandas as pd
from googleapiclient.discovery import build
from google.oauth2 import service_account

    url0 = &quot;https://www.hockey-reference.com/leagues/NHL_2023_skaters.html&quot;
    soup = BeautifulSoup(requests.get(url0).content,&quot;html.parser&quot;)

    table = soup.find(&#39;table&#39;, {&#39;class&#39;: &#39;stats_table&#39;})

    df = pd.read_html(str(table))[0]
    df.columns = df.columns.droplevel(0)
    df = df[df[&quot;Player&quot;] !=&quot;Player&quot;]
    df = df [[&quot;Player&quot;,&quot;Pos&quot;,&quot;GP&quot;,&quot;G&quot;,&quot;A&quot;,&quot;+/-&quot;,&quot;S%&quot;,&quot;PS&quot;,&quot;FO%&quot;]]
    df.fillna(0,inplace=True)

    SERVICE_ACCOUNT_FILE = &#39;keys.json&#39;
    SCOPES = [&#39;https://www.googleapis.com/auth/spreadsheets&#39;]

    creds = None
    creds = service_account.Credentials.from_service_account_file(
            SERVICE_ACCOUNT_FILE, scopes=SCOPES)

    gsheet = &#39;******&#39;
    service = build(&#39;sheets&#39;, &#39;v4&#39;, credentials=creds)
    sheet = service.spreadsheets()
    var2 = [df.to_json()]
    request = sheet.values().update(spreadsheetId=gsheet, range=&quot;MasterTest!A1&quot;,valueInputOption=&quot;USER_ENTERED&quot;,body={&quot;values&quot;:var2}).execute()   

Thanks

BeautifulSoup is fine, the API has been enabled and it is able to get and update info.

答案1

得分: 1

错误发生是因为API等待一个二维数组,var2 需要看起来类似于:

[['2023-01-18', '1234', 'ABC', 1234], 
['2023-01-17', '5678', 'DEF', 5678], 
['2023-01-16', '9012', 'ABC', 9012]]

然后Sheets将每个内部数组添加到单独的行中。我创建了以下函数,将您的数据从字典转换为API可使用的二维数组:

def transformData(dict):
    firstkey = list(dict)[0]
    lastInnerKey = list(dict[firstkey])[-1]
    main_db = []
    row = []

    for i in range(0, lastInnerKey + 2):
        if i == 0:
            for x in dict:
                row.append("&#39;" + x) # 在值的开头添加 &#39; 告诉Google Sheets这是文本而不是公式。
        else:
            for x in dict:
                if (i - 1) in dict[x]:
                    row.append(dict[x][i - 1])
        if len(row) > 0:
            main_db.append(row)
        row = []

    return main_db

然后,您只需要更新变量var2的声明:

var2 = transformData(df.to_dict())
英文:

The error is showing because the API is waiting for a 2D array, var2 needs to look similar to:

[[&#39;2023-01-18&#39;, &#39;1234&#39;, &#39;ABC&#39;, 1234], 
[&#39;2023-01-17&#39;, &#39;5678&#39;, &#39;DEF&#39;, 5678], 
[&#39;2023-01-16&#39;, &#39;9012&#39;, &#39;ABC&#39;, 9012]]

Sheets will then add each inner Array on its own row. I made the following function to transform your data from a dictionary to a 2D array that the API can use:

def transformData(dict):
    firstkey = list(dict)[0]
    lastInnerKey = list(dict[firstkey])[-1]
    main_db = []
    row = []

    for i in range(0, lastInnerKey + 2):
        if i == 0:
            for x in dict:
                row.append(&quot;&#39;&quot; + x) # Appending &#39; to the start of a 
                # value tells Google Sheets that this is text and not a formula.
        else:
            for x in dict:
                if (i - 1) in dict[x]:
                    row.append(dict[x][i - 1])
        if len(row) &gt; 0:
            main_db.append(row)
        row = []

    return main_db

Then you just need to update the declaration of var2:

var2 = transformData(df.to_dict())

Reference

huangapple
  • 本文由 发表于 2023年3月21日 00:55:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75793167.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定