Overwrite a single file in a Google Cloud Storage bucket via Python code

Question

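One way to overwrite an existing file in a Google Cloud Storage bucket from Python is to set the if_generation_match parameter. If you set if_generation_match to the target object's current generation number, the overwrite happens only when that generation still matches. If it does not match, the operation fails, so you can catch the exception and handle it as needed.

Here is how you could modify your code to overwrite the file: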

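from google.api_core.exceptions import PreconditionFailed
from google.cloud import storage

bucket_name = "my-bucket"
destination_blob_name = "logs.txt"
source_file_name = "logs.txt"  # accessible from this script

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)

# get_blob() fetches the object's metadata (including its generation) and
# returns None if the object does not exist. bucket.blob() alone makes no
# request, so its .generation would still be None at this point.
existing_blob = bucket.get_blob(destination_blob_name)
# 0 means "succeed only if there is no live version of the object"
current_generation = existing_blob.generation if existing_blob else 0

blob = bucket.blob(destination_blob_name)
try:
    # Upload the file; the overwrite happens only if the generation still matches
    blob.upload_from_filename(source_file_name, if_generation_match=current_generation)
    print(f"File {source_file_name} uploaded and overwritten to {destination_blob_name}.")
except PreconditionFailed as e:
    # The object changed between the metadata fetch and the upload
    print(f"Error: {e}")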

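This code first obtains the target object's current generation number and then tries to overwrite the object with upload_from_filename. If the generation does not match, it catches the exception and prints an error message, so the overwrite happens only when the generation matches.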

Original question (English):

I have a logs.txt file at a certain location in a Compute Engine VM instance. I want to periodically back up (i.e. overwrite) logs.txt in a Google Cloud Storage bucket. Since logs.txt is the result of some preprocessing done inside a Python script, I also want to use that script to upload / copy the file into the Google Cloud Storage bucket (so using cp is not an option). The Compute Engine VM instance and the Cloud Storage bucket are in the same GCP project, so "they see each other". What I am attempting right now, based on this sample code, looks like:

from google.cloud import storage

bucket_name = "my-bucket"
destination_blob_name = "logs.txt"
source_file_name = "logs.txt"  # accessible from this script

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

generation_match_precondition = 0
blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)

print(f"File {source_file_name} uploaded to {destination_blob_name}.")

If gs://my-bucket/logs.txt does not exist, the script works correctly, but if I try to overwrite, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2571, in upload_from_file
    created_json = self._do_upload(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2372, in _do_upload
    response = self._do_multipart_upload(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 1907, in _do_multipart_upload
    response = upload.transmit(
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 153, in transmit
    return _request_helpers.wait_and_retry(
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/_request_helpers.py", line 147, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request
    self._process_response(result)
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_upload.py", line 114, in _process_response
    _helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_helpers.py", line 105, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/my_folder/upload_to_gcs.py", line 76, in <module>
    blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2712, in upload_from_filename
    self.upload_from_file(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2588, in upload_from_file
    _raise_from_invalid_response(exc)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 4455, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.PreconditionFailed: 412 POST https://storage.googleapis.com/upload/storage/v1/b/production-onementor-dt-data/o?uploadType=multipart&ifGenerationMatch=0: {
  "error": {
    "code": 412,
    "message": "At least one of the pre-conditions you specified did not hold.",
    "errors": [
      {
        "message": "At least one of the pre-conditions you specified did not hold.",
        "domain": "global",
        "reason": "conditionNotMet",
        "locationType": "header",
        "location": "If-Match"
      }
    ]
  }
}
: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)

I have checked the documentation for upload_from_filename, but it seems there is no flag to "enable overwriting".

How do I properly overwrite an existing file in a Google Cloud Storage bucket using Python?

Answer 1

Score: 4

It's because of if_generation_match

> As a special case, passing 0 as the value for if_generation_match
> makes the operation succeed only if there are no live versions of the
> blob.

This is what is meant by the return message "At least one of the pre-conditions you specified did not hold."
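To make the rule concrete, here is a toy in-memory model of the precondition check. It is only a sketch: ToyBucket, its upload method, and the small integer generations are hypothetical stand-ins, not the google-cloud-storage API or the real service's implementation. The model treats "no live version" as generation 0, which is a simplification chosen to mirror the special case quoted above.

```python
# Toy model of the if_generation_match rule. NOT the real library or service;
# every name here is hypothetical and exists only for illustration.

class PreconditionFailed(Exception):
    """Stands in for the 412 error returned by the real service."""


class ToyBucket:
    def __init__(self):
        self._objects = {}          # name -> (generation, data)
        self._next_generation = 1   # real generations are much larger numbers

    def upload(self, name, data, if_generation_match=None):
        live = self._objects.get(name)
        # Simplification: a missing object is modelled as generation 0
        current_generation = live[0] if live else 0
        if if_generation_match is not None and if_generation_match != current_generation:
            raise PreconditionFailed(
                f"412: expected generation {if_generation_match}, "
                f"found {current_generation}"
            )
        generation = self._next_generation
        self._next_generation += 1
        self._objects[name] = (generation, data)
        return generation


bucket = ToyBucket()

# First upload with 0: succeeds because no live version exists yet
first = bucket.upload("logs.txt", b"v1", if_generation_match=0)

# Second upload with 0: fails because a live version now exists
try:
    bucket.upload("logs.txt", b"v2", if_generation_match=0)
    second_attempt_failed = False
except PreconditionFailed:
    second_attempt_failed = True

# No precondition at all: the existing object is overwritten
second = bucket.upload("logs.txt", b"v2")
```

The second call is the situation from the question: the object already exists, so the "generation must be 0" precondition no longer holds.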

You should pass None or leave out that argument altogether.
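A minimal sketch of that fix, reusing the names from the question. The overwrite_blob helper and the main function are hypothetical wrappers added for illustration; the only library calls used are storage.Client, Client.bucket, Bucket.blob, and Blob.upload_from_filename, as in the question.

```python
def overwrite_blob(bucket, destination_blob_name, source_file_name):
    """Upload source_file_name, replacing any existing object of that name."""
    blob = bucket.blob(destination_blob_name)
    # No if_generation_match argument: the request carries no precondition,
    # so it succeeds whether or not the object already exists.
    blob.upload_from_filename(source_file_name)
    return blob


def main():
    # Imported here so the helper above can be tried with a stand-in bucket
    from google.cloud import storage

    bucket_name = "my-bucket"
    destination_blob_name = "logs.txt"
    source_file_name = "logs.txt"  # accessible from this script

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    overwrite_blob(bucket, destination_blob_name, source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")
```

Calling main() from the backup script performs the upload. The trade-off is that an unconditional upload silently replaces whatever is there, which is what a periodic backup of logs.txt wants; if_generation_match is only worth keeping when concurrent writers must not clobber each other.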

huangapple
  • Published on February 24, 2023, 00:28:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75547631.html