Uploading files to Google Drive using python with Docker – googleapiclient.errors.UnknownFileType


Question

I get an error when uploading files to Google Drive through the Google Drive API from inside a Docker container. The same upload works fine outside of Docker, but inside the container it fails with the following error:

Traceback (most recent call last):
  File "/app/apis/gtest.py", line 60, in <module>
    media = drive_service.files().create(body=file_metadata, media_body=file_path).execute()
  File "/usr/local/lib/python3.9/site-packages/googleapiclient/discovery.py", line 1143, in method
    raise UnknownFileType(media_filename)
googleapiclient.errors.UnknownFileType: upload/2d0a3e62-f442-4b2f-a816-f00e03a6f4db/someexcelfile.xlsx

My goal is to upload files to Google Drive, where each file is either XLSX or JSONL. The upload works when I run the code on a Windows machine; the problem appears only inside the Docker container.

For JSONL files, I set the MIME type to "application/octet-stream" and upload through a MediaFileUpload object. The error occurs when I try to upload an XLSX file by passing its path as media_body.

How can I make sure the correct MIME type is recognized for the upload, so that it works inside the container?

Here is the code for reference:

import os
from datetime import date
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Define the Google Drive folder ID
folder_id = 'somegoogledrivefolderid'

# Define the file name with the correct file extension
file_path = 'upload/2d0a3e62-f442-4b2f-a816-f00e03a6f4db/someexcelfile.xlsx'
file_name = os.path.basename(file_path)  # Use the base name of the file path

# Check if the file extension is 'jsonl'
file_ext = os.path.splitext(file_path)[1]  # Extract the file extension
if file_ext == '.jsonl':
    print("The file is a jsonl file")
    # Add your code here to handle jsonl files
else:
    print("The file is not a jsonl file")
    # Add your code here to handle non-jsonl files

# Generate the folder name based on the current date
current_date = date.today().strftime('%m/%d/%Y')
folder_name = os.path.join(current_date, '').replace("\\", "")

# Authenticate and create a Google Drive service
credentials = service_account.Credentials.from_service_account_file('scripts/gs_credentials.json')
drive_service = build('drive', 'v3', credentials=credentials)

# List folders in the specified folder
folder_query = f"'{folder_id}' in parents and mimeType = 'application/vnd.google-apps.folder'"
response = drive_service.files().list(q=folder_query).execute()

# Check if folder already exists
existing_folder_id = None
for folder in response.get('files', []):
    if folder['name'] == folder_name:
        existing_folder_id = folder['id']
        break

if existing_folder_id:
    print(f"Folder already exists with ID: {existing_folder_id}")
    folder_id = existing_folder_id
else:
    # Create the folder
    folder_metadata = {
        'name': folder_name,
        'parents': [folder_id],
        'mimeType': 'application/vnd.google-apps.folder'
    }
    folder = drive_service.files().create(body=folder_metadata, fields='id').execute()
    folder_id = folder.get('id')
    print(f"New folder created with ID: {folder_id}")

# Upload the file to the generated folder
file_metadata = {
    'name': file_name,
    'parents': [folder_id]
}
media = drive_service.files().create(body=file_metadata, media_mime_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", media_body=file_path).execute()

print(f'Successfully saved the file: {media["name"]}')

How can I make Docker recognize the MIME type so that the upload works properly?

Answer 1

Score: 1

I figured it out! I went to the place the traceback pointed to in the discovery module (googleapiclient.discovery) and started reading the code there.

if media_filename:
    # Ensure we end up with a valid MediaUpload object.
    if isinstance(media_filename, str):
        if media_mime_type is None:
            logger.warning(
                "media_mime_type argument not specified: trying to auto-detect for %s",
                media_filename,
            )
            media_mime_type, _ = mimetypes.guess_type(media_filename)
        if media_mime_type is None:
            raise UnknownFileType(media_filename)
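
The auto-detection step in that snippet goes through the standard library's mimetypes.guess_type. A likely explanation for the Windows/Docker difference (an assumption, not something the traceback confirms) is that guess_type combines a small built-in table with whatever the host provides, such as the Windows registry or files like /etc/mime.types, and minimal container images often ship without those files. The sketch below only uses the standard library; the file name is the one from the question, and registering the type with mimetypes.add_type is an alternative workaround, not what the answer itself did.

```python
import mimetypes

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# This is the lookup the client library falls back to when
# media_mime_type is not passed. In a minimal container it may be None.
guessed_before, _ = mimetypes.guess_type("someexcelfile.xlsx")
print("before add_type:", guessed_before)

# Register the mapping explicitly so the lookup no longer depends on
# what the host operating system happens to know about .xlsx files.
mimetypes.add_type(XLSX_MIME, ".xlsx")

guessed_after, _ = mimetypes.guess_type("someexcelfile.xlsx")
print("after add_type:", guessed_after)
```

Running the first lookup inside and outside the container is a quick way to check whether the environment is really the cause.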

I then looked for where media_mime_type comes from and found:

media_mime_type = kwargs.get("media_mime_type", None)

So I realized it was an argument I can use, so I did this:

media = drive_service.files().create(body=file_metadata, media_mime_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", media_body=file_path).execute()

and it worked.
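
Since the question mentions both XLSX and JSONL files, the same idea can be generalized so that the MIME type never depends on auto-detection. This is a minimal sketch, not the answer's original code: MIME_BY_EXTENSION, mime_type_for and upload_file are hypothetical names, and falling back to "application/octet-stream" for unknown extensions is an assumption. MediaFileUpload is the class from googleapiclient.http that the question already refers to.

```python
import os

# Hypothetical lookup table; extend it with any other types you upload.
MIME_BY_EXTENSION = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".jsonl": "application/octet-stream",
}


def mime_type_for(path, default="application/octet-stream"):
    """Return a MIME type chosen from the file extension (case-insensitive)."""
    ext = os.path.splitext(path)[1].lower()
    return MIME_BY_EXTENSION.get(ext, default)


def upload_file(drive_service, path, folder_id):
    """Upload path into the Drive folder folder_id with an explicit MIME type."""
    # Imported lazily so mime_type_for can be used without the client installed.
    from googleapiclient.http import MediaFileUpload

    media = MediaFileUpload(path, mimetype=mime_type_for(path))
    metadata = {"name": os.path.basename(path), "parents": [folder_id]}
    request = drive_service.files().create(
        body=metadata, media_body=media, fields="id,name"
    )
    return request.execute()
```

With the drive_service and folder_id from the question, the upload would then be a single call such as upload_file(drive_service, file_path, folder_id), and it should behave the same on Windows and in the container because nothing is guessed from the environment.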

huangapple
  • Posted on June 1, 2023, 14:10:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76379113.html