June 27, 2023 21:15:22

How to obtain the contents of previous versions of Google Sheets using Python and Google APIs

Question

I have various Google Sheets for which I need to retrieve the historic versions. These historic sheets pertain to the status of products and will be appended to a Google BigQuery table. As such, it is important that I be able to access the actual contents of these old sheets and not just their metadata.

I have attempted this with the Python code below. With it, I can set up a service with the proper credentials and fetch the historic versions into the variable revisions, which is a list of dictionaries that look like this:

{'id': '15104',
 'mimeType': 'application/vnd.google-apps.spreadsheet',
 'kind': 'drive#revision',
 'modifiedTime': '2023-06-27T12:41:52.305Z'}

This is where I then get stuck. I am not able to download or retrieve the content of this historic version of the file. I typically get an error that complains about only being able to download binary files:

HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1D1pkeTUDoGZnlHHQh0AiRvFAippyX4OYRWR4XNx3leU/revisions/15098?alt=media returned "Only files with binary content can be downloaded. Use Export with Docs Editors files.". Details: "[{'message': 'Only files with binary content can be downloaded. Use Export with Docs Editors files.', 'domain': 'global', 'reason': 'fileNotDownloadable', 'location': 'alt', 'locationType': 'parameter'}]">

Please help me understand how to access the contents of the historic files. I am aware that it might not be possible; if so, please let me know about such limitations. Thank you for your time.

import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = [
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/drive.file',
    'https://www.googleapis.com/auth/spreadsheets',
]


def login():
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'bom_files.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    # Build the Drive v3 API service
    service = build('drive', 'v3', credentials=creds)
    return service


def get_sheet_revisions(sheet_id, service):
    revisions = service.revisions().list(fileId=sheet_id).execute().get('revisions')
    revised_file_contents = []  # contents of revised files
    for revision in revisions:
        # This get_media call is the request that fails with the 403 error above
        request = service.revisions().get_media(fileId=sheet_id,
                                                revisionId=revision['id'])
        file_contents = request.execute()
        # Do something with the file like save it.
        # For now, let's append it to a list
        revised_file_contents.append(file_contents)
    return revised_file_contents


if __name__ == '__main__':
    # File ID taken from the error message above
    sheet_id = '1D1pkeTUDoGZnlHHQh0AiRvFAippyX4OYRWR4XNx3leU'
    service = login()
    historic_sheets = get_sheet_revisions(sheet_id, service)

EDIT

I have also tried the following. It does download something, but the result is an unreadable mess: Google Sheets cannot even open the xlsx file that it creates. On a positive note, the request does return an HTTP status code of 200.

import os.path

import gspread
import requests
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

SCOPES = ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/spreadsheets']


def login():
    creds = None
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            creds = Credentials.from_service_account_file('bom_files.json', scopes=SCOPES)
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return creds


def export_sheet_revision(sheet_id, revision_id, export_format):
    creds = login()
    client = gspread.authorize(creds)
    sheet = client.open_by_key(sheet_id)
    url = f"https://docs.google.com/spreadsheets/export?id={sheet_id}&revision={revision_id}&exportFormat={export_format}"
    return sheet, url


def download_file(url, output_path):
    response = requests.get(url)
    with open(output_path, 'wb') as file:
        file.write(response.content)


if __name__ == '__main__':
    sheet_id = '1D1pkeTUDoGZnlHHQh0AiRvFAippyX4OYRWR4XNx3leU'
    sheet_id = '1wl7kLGLAgCnFB0dn7JYubO-ZwnK5-s-4Rxq-mQtRRC8'  # simpler sheet
    revision_id = '15098'
    export_format = 'xlsx'

    sheet, download_url = export_sheet_revision(sheet_id, revision_id, export_format)
    worksheets = sheet.worksheets()
    for worksheet in worksheets:
        worksheet_title = worksheet.title
        worksheet_url = download_url + f'&gid={worksheet.id}'
        output_path = f'output_{worksheet_title}.xlsx'  # Specify the desired output file path for each worksheet

        download_file(worksheet_url, output_path)
        print(f"Worksheet '{worksheet_title}' downloaded to: {output_path}")

Answer 1

Score: 3

I think the endpoint for exporting a Google Spreadsheet in XLSX format at a specific revision ID can be built directly as a URL. As a sample script, how about the following?

Sample script:

In this case, creds (whose creds.token is used below) is the same credentials object that is passed to service = build('drive', 'v3', credentials=creds).

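import requests

spreadsheet_id = "###"  # Please set your Spreadsheet ID.
revision_id = "###"  # Please set your revision ID.
export_format = "xlsx"  # Export file format.
url = f"https://docs.google.com/spreadsheets/export?id={spreadsheet_id}&revision={revision_id}&exportFormat={export_format}"
# The access token is sent as a Bearer token so that the export is authorized.
res = requests.get(url, headers={"Authorization": "Bearer " + creds.token})
res.raise_for_status()  # Stop on an HTTP error instead of saving the error body.
with open('sample.xlsx', 'wb') as f:
    f.write(res.content)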

When this script is run, the Google Spreadsheet is exported in XLSX format at the specified revision ID and saved as a file. This sample script works for Google Spreadsheets. To see which other export MIME types are available for a revision, you can use the following sample script.

from googleapiclient.discovery import build

spreadsheet_id = "###"  # Please set your Spreadsheet ID.
revision_id = "###"  # Please set your revision ID.
service = build("drive", "v3", credentials=creds)
# fields="*" returns every field of the revision, including exportLinks.
obj = service.revisions().get(fileId=spreadsheet_id, revisionId=revision_id, fields="*").execute()
urls = obj.get("exportLinks")
print(urls)
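
The export request above can be wrapped in a few small helpers. This is only a sketch: the names build_export_url, auth_headers and looks_like_xlsx are hypothetical and not part of any Google library. The URL pattern is the one from the sample script, and the check relies on the fact that an XLSX file is a ZIP archive and therefore starts with the ZIP signature. When a request is sent without the Authorization header, the body may be something other than a workbook even though the status code is 200, which is one possible explanation for a downloaded file that cannot be opened.

```python
from urllib.parse import urlencode

# Endpoint taken from the sample script above.
EXPORT_ENDPOINT = "https://docs.google.com/spreadsheets/export"


def build_export_url(spreadsheet_id, revision_id, export_format="xlsx"):
    """Build the export URL for one revision, URL-encoding each parameter."""
    query = urlencode({
        "id": spreadsheet_id,
        "revision": revision_id,
        "exportFormat": export_format,
    })
    return f"{EXPORT_ENDPOINT}?{query}"


def auth_headers(access_token):
    """Headers for an authorized request; access_token would be creds.token."""
    return {"Authorization": f"Bearer {access_token}"}


def looks_like_xlsx(data):
    """Return True if the bytes start with the ZIP signature used by XLSX."""
    return data[:4] == b"PK\x03\x04"
```

With these, the download becomes requests.get(build_export_url(spreadsheet_id, revision_id), headers=auth_headers(creds.token)), and looks_like_xlsx(res.content) can be checked before the file is written.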

Note:

The scripts above can be used for Google Docs files (Documents, Spreadsheets, Slides, and so on). For files other than Google Docs files, the following script can be used instead. It cannot be used for Google Docs files, so please be careful about this. I think this might be the reason for your first issue.

import io

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

file_id = "###"  # Please set your file ID.
revision_id = "###"  # Please set your revision ID.
service = build("drive", "v3", credentials=creds)
request = service.revisions().get_media(fileId=file_id, revisionId=revision_id)
fh = io.FileIO("sample filename", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
# Download the binary content chunk by chunk until the file is complete.
done = False
while not done:
    status, done = downloader.next_chunk()
    print('Download %d%%.' % int(status.progress() * 100))
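
The distinction in the note can be expressed as a small dispatch on the revision's mimeType. This is a sketch under an assumption: choose_download_method is a hypothetical helper, and it treats any MIME type beginning with application/vnd.google-apps. (as in the revision dictionary from the question) as a Google Docs file that needs the export route.

```python
GOOGLE_APPS_PREFIX = "application/vnd.google-apps."


def choose_download_method(revision):
    """Return 'export' for Google Docs files and 'get_media' for binary files."""
    mime_type = revision.get("mimeType", "")
    if mime_type.startswith(GOOGLE_APPS_PREFIX):
        return "export"
    return "get_media"


# Revision dictionary as shown in the question.
revision = {
    "id": "15104",
    "mimeType": "application/vnd.google-apps.spreadsheet",
    "kind": "drive#revision",
    "modifiedTime": "2023-06-27T12:41:52.305Z",
}
print(choose_download_method(revision))  # export
```

A loop over the list returned by revisions().list() could call this helper for each entry and then use either the export URL or get_media accordingly.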

Reference:

Method: revisions.get
