英文:
How to obtain contents of previous versions of google sheets using Python and Google APIs
问题
I have translated the non-code part of your text as requested:
"I have various google sheets for which I need to retrieve the historic versions of. These historic sheets pertain to the status of products and will be appended to a Google Big Query table. As such, it is important that I be able to access the actual contents of these old sheets and not just their metadata.
I have attempted this problem with the Python code below. In this code, I have been able to set up a service with the proper credentials. I am then able to get historic versions in the variable revisions
which is a list of dictionaries that look like this:
{'id': '15104',
'mimeType': 'application/vnd.google-apps.spreadsheet',
'kind': 'drive#revision',
'modifiedTime': '2023-06-27T12:41:52.305Z'}
This is where I then get stuck. I am not able to download or retrieve the content of this historic version of the file. I typically get an error that complains about only being able to download binary files:
HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1D1pkeTUDoGZnlHHQh0AiRvFAippyX4OYRWR4XNx3leU/revisions/15098?alt=media returned "Only files with binary content can be downloaded. Use Export with Docs Editors files.". Details: "[{'message': 'Only files with binary content can be downloaded. Use Export with Docs Editors files.', 'domain': 'global', 'reason': 'fileNotDownloadable', 'location': 'alt', 'locationType': 'parameter'}]">
Please help me to understand how to access the contents of the historic files. I am also aware that it might not be possible. If so, please do let me know about such limitations. Thank you for your time."
英文:
I have various google sheets for which I need to retrieve the historic versions of. These historic sheets pertain to the status of products and will be appended to a Google Big Query table. As such, it is important that I be able to access the actual contents of these old sheets and not just their metadata.
I have attempted this problem with the Python code below. In this code, I have been able to setup a service with the proper credentials. I am then able to get historic versions in the variable revisions
which is a list of dictionaries that look like this
{'id': '15104',
'mimeType': 'application/vnd.google-apps.spreadsheet',
'kind': 'drive#revision',
'modifiedTime': '2023-06-27T12:41:52.305Z'}
This is where I then get stuck. I am not able to download or retrieve the content of this historic version of the file. I typically get an error that complains about only being able to download binary files:
HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1D1pkeTUDoGZnlHHQh0AiRvFAippyX4OYRWR4XNx3leU/revisions/15098?alt=media returned "Only files with binary content can be downloaded. Use Export with Docs Editors files.". Details: "[{'message': 'Only files with binary content can be downloaded. Use Export with Docs Editors files.', 'domain': 'global', 'reason': 'fileNotDownloadable', 'location': 'alt', 'locationType': 'parameter'}]">
Please help me to understand how to access the contents of the historic files. I am also aware that it might not be possible. If so, please do let me know about such limitations. Thank you for your time.
import os.path
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
SCOPES = [
'https://www.googleapis.com/auth/drive',
'https://www.googleapis.com/auth/drive.file',
'https://www.googleapis.com/auth/spreadsheets',
]
def login():
creds = None
# The file token.json stores the user's access and refresh tokens, and is
# created automatically when the authorization flow completes for the first
# time.
if os.path.exists('token.json'):
creds = Credentials.from_authorized_user_file('token.json', SCOPES)
# If there are no (valid) credentials available, let the user log in.
if not creds or not creds.valid:
if creds and creds.expired and creds.refresh_token:
creds.refresh(Request())
else:
flow = InstalledAppFlow.from_client_secrets_file(
'bom_files.json', SCOPES)
creds = flow.run_local_server(port=0)
# Save the credentials for the next run
with open('token.json', 'w') as token:
token.write(creds.to_json())
service = build('drive', 'v3', credentials=creds)
# Call the Drive v3 API
return service
def get_sheet_revisions(sheet_id,service):
revisions = service.revisions().list(fileId=sheet_id).execute().get('revisions')
revised_file_contents = [] # contents of revised files
for revision in revisions:
request = service.revisions().get_media(fileId=sheet_id,
revisionId=revision['id'])
file_contents = request.execute()
# Do something with the file like save it.
# For now, lets append it to a list
revised_file_contents.append(file_contents)
return revised_file_contents
if __name__ == '__main__':
service = login()
historic_sheets = get_sheet_revisions(sheet_id,service)
EDIT
I have also tried the following. It actually downloads something but it is an unreadable mess. Google sheets cannot even open the xlsx file that it creates. On a positive note, it does give a url request code of 200.
import os.path
import gspread
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
import requests
SCOPES = ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/spreadsheets']
def login():
creds = None
if os.path.exists('token.json'):
creds = Credentials.from_authorized_user_file('token.json', SCOPES)
if not creds or not creds.valid:
if creds and creds.expired and creds.refresh_token:
creds.refresh(Request())
else:
creds = Credentials.from_service_account_file('bom_files.json', scopes=SCOPES)
with open('token.json', 'w') as token:
token.write(creds.to_json())
return creds
def export_sheet_revision(sheet_id, revision_id, export_format):
creds = login()
client = gspread.authorize(creds)
sheet = client.open_by_key(sheet_id)
url = f"https://docs.google.com/spreadsheets/export?id={sheet_id}&revision={revision_id}&exportFormat={export_format}"
return sheet, url
def download_file(url, output_path):
response = requests.get(url)
with open(output_path, 'wb') as file:
file.write(response.content)
if __name__ == '__main__':
sheet_id = '1D1pkeTUDoGZnlHHQh0AiRvFAippyX4OYRWR4XNx3leU'
sheet_id = '1wl7kLGLAgCnFB0dn7JYubO-ZwnK5-s-4Rxq-mQtRRC8' # simpler sheet
revision_id = '15098'
export_format = 'xlsx'
sheet, download_url = export_sheet_revision(sheet_id, revision_id, export_format)
worksheets = sheet.worksheets()
for worksheet in worksheets:
worksheet_title = worksheet.title
worksheet_url = download_url + f'&gid={worksheet.id}'
output_path = f'output_{worksheet_title}.xlsx' # Specify the desired output file path for each worksheet
download_file(worksheet_url, output_path)
print(f"Worksheet '{worksheet_title}' downloaded to: {output_path}")
答案1
得分: 3
以下是要翻译的内容:
I thought that the endpoint for exporting Google Spreadsheet in XLSX format with the specific revision ID can be simply created. When this is reflected in a sample script, how about the following sample script?
Sample script:
In this case, creds
of creds.token
is from creds
of service = build('drive', 'v3', credentials=creds)
.
spreadsheet_id = "###" # Please set your Spreadsheet ID.
revision_id = "###" # Please set your revision ID.
type = "xlsx"
url = f"https://docs.google.com/spreadsheets/export?id={spreadsheet_id}&revision={revision_id}&exportFormat={type}"
res = requests.get(url, headers={"Authorization": "Bearer " + creds.token})
with open('sample.xlsx', 'wb') as f:
f.write(res.content)
When this script is run, the Google Spreadsheet is exported in XLSX format with the specific revision ID and it is saved as a file. In the above sample script, Google Spreadsheet can be used. When you want to know other exported mimeTypes, you can use the following sample script.
spreadsheet_id = "###" # Please set your Spreadsheet ID.
revision_id = "###" # Please set your revision ID.
service = build("drive", "v3", credentials=creds)
obj = service.revisions().get(fileId=spreadsheet_id, revisionId=revision_id, fields="*").execute()
urls = obj.get("exportLinks")
print(urls)
Note:
- In the above script, Google Docs files (Documents, Spreadsheets, Slides, and so on) can be used. For example, when the files except for Google Docs files, the following script can be used. This cannot be used for Google Docs files. Please be careful about this. I thought that this might be the reason for your 1st issue.
file_id = "###" # Please set your file ID.
revision_id = "###" # Please set your revision ID.
service = build("drive", "v3", credentials=creds)
request = service.revisions().get_media(fileId=file_id, revisionId=revision_id)
fh = io.FileIO("sample filename", mode='wb')
f = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = f.next_chunk()
print('Download %d%%.' % int(status.progress() * 100))
Reference:
英文:
I thought that the endpoint for exporting Google Spreadsheet in XLSX format with the specific revision ID can be simply created. When this is reflected in a sample script, how about the following sample script?
Sample script:
In this case, creds
of creds.token
is from creds
of service = build('drive', 'v3', credentials=creds)
.
spreadsheet_id = "###" # Please set your Spreadsheet ID.
revision_id = "###" # Please set your revision ID.
type = "xlsx"
url = f"https://docs.google.com/spreadsheets/export?id={spreadsheet_id}&revision={revision_id}&exportFormat={type}"
res = requests.get(url, headers={"Authorization": "Bearer " + creds.token})
with open('sample.xlsx', 'wb') as f:
f.write(res.content)
When this script is run, the Google Spreadsheet is exported in XLSX format with the specific revision ID and it is saved as a file. In the above sample script, Google Spreadsheet can be used. When you want to know other exported mimeTypes, you can use the following sample script.
spreadsheet_id = "###" # Please set your Spreadsheet ID.
revision_id = "###" # Please set your revision ID.
service = build("drive", "v3", credentials=creds)
obj = service.revisions().get(fileId=spreadsheet_id, revisionId=revision_id, fields="*").execute()
urls = obj.get("exportLinks")
print(urls)
Note:
-
In the above script, Google Docs files (Documents, Spreadsheets, Slides, and so on) can be used. For example, when the files except for Google Docs files, the following script can be used. This cannot be used for Google Docs files. Please be careful about this. I thought that this might be the reason for your 1st issue.
file_id = "###" # Please set your file ID. revision_id = "###" # Please set your revision ID. service = build("drive", "v3", credentials=creds) request = service.revisions().get_media(fileId=file_id, revisionId=revision_id) fh = io.FileIO("sample filename", mode='wb') f = MediaIoBaseDownload(fh, request) done = False while done is False: status, done = f.next_chunk() print('Download %d%%.' % int(status.progress() * 100))
Reference:
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论