How to copy an entire structure between storage accounts in Python

Question

My case is the following:

  1. Two Azure Storage accounts (source and destination).
  2. The source account may contain multiple containers, folders, blobs, etc.
  3. All of the above needs to be copied to the destination account in exactly the same structure.
  4. If an element already exists in the destination account and is older than its counterpart in the source account, it needs to be overwritten.

What I have done so far:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, BlobLeaseClient, BlobPrefix, ContentSettings

# Set the connection string for the source and destination storage accounts
SOURCE_CONNECTION_STRING = "your SOURCE connection string"
DESTINATION_CONNECTION_STRING = "your DESTINATION connection string"

# Create the BlobServiceClient objects for the source and destination storage accounts
source_blob_service_client = BlobServiceClient.from_connection_string(SOURCE_CONNECTION_STRING)
destination_blob_service_client = BlobServiceClient.from_connection_string(DESTINATION_CONNECTION_STRING)

# List all containers in the source storage account
source_containers = source_blob_service_client.list_containers()

# Iterate through each container in the source storage account
for source_container in source_containers:
    print(f"Processing container '{source_container.name}'...")

    # Create a new container in the destination storage account (if it doesn't exist already)
    destination_container = destination_blob_service_client.get_container_client(source_container.name)
    if not destination_container.exists():
        print(f"Creating container '{source_container.name}' in the destination storage account...")
        destination_container.create_container()

    # Get a list of all blobs in the current source container
    source_container_client = source_blob_service_client.get_container_client(source_container.name)
    source_blobs = source_container_client.list_blobs()
    
    #source_blobs = source_blob_service_client.list_blobs(source_container.name)

    # Iterate through each blob in the current source container
    for source_blob in source_blobs:
        
        # Check if the blob already exists in the destination container
        destination_blob = destination_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob)
        if not destination_blob.exists() or source_blob.last_modified > destination_blob.get_blob_properties().last_modified:
            # Copy the blob to the destination container (with the same directory structure as in the source)
            #source_blob_client = BlobClient.from_blob_url(source_blob.url)
            source_blob_client = BlobClient.from_blob_url(source_blob.url)
            destination_blob.start_copy_from_url(source_url=source_blob.url)

            print(f"Copied blob '{source_blob.name}' to container '{source_container.name}' in the destination storage account.")

However, I get an error: AttributeError: 'BlobProperties' object has no attribute 'url'. Yet I see a url attribute being used in this sample, https://github.com/Azure-Samples/AzureStorageSnippets/blob/master/blobs/howto/python/blob-devguide-py/blob-devguide-blobs.py, and in the documentation, https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python#azure-storage-blob-blobclient-start-copy-from-url.

Can someone suggest what I am doing wrong? I opted for Python because of the iterative requirement (going down to the most granular level of each container), which did not seem doable in Synapse via pipeline activities.

Answer 1

Score: 0

I tried this in my environment and got the results below.

Initially, I got the same error in my environment.

> I get an error: AttributeError: 'BlobProperties' object has no
> attribute 'url'

The error occurs because each item returned by list_blobs() is a BlobProperties object, which has no url attribute. The url attribute belongs to BlobClient. Instead of reading source_blob.url, get a BlobClient for the source blob with get_blob_client(container, blob_name) and use its url.

Code:

from azure.storage.blob import BlobServiceClient

# Set the connection string for the source and destination storage accounts
SOURCE_CONNECTION_STRING = "<src_connect_strng>"
DESTINATION_CONNECTION_STRING = "<dest_connect_strng>"

# Create the BlobServiceClient objects for the source and destination storage accounts
source_blob_service_client = BlobServiceClient.from_connection_string(SOURCE_CONNECTION_STRING)
destination_blob_service_client = BlobServiceClient.from_connection_string(DESTINATION_CONNECTION_STRING)

# List all containers in the source storage account
source_containers = source_blob_service_client.list_containers()

# Iterate through each container in the source storage account
for source_container in source_containers:
    print(f"Processing container '{source_container.name}'...")

    # Create a new container in the destination storage account (if it doesn't exist already)
    destination_container = destination_blob_service_client.get_container_client(source_container.name)
    if not destination_container.exists():
        print(f"Creating container '{source_container.name}' in the destination storage account...")
        destination_container.create_container()

    # Get a list of all blobs in the current source container
    source_container_client = source_blob_service_client.get_container_client(source_container.name)
    source_blobs = source_container_client.list_blobs()

    # Iterate through each blob in the current source container
    for source_blob in source_blobs:

        # Build clients for the destination blob and the source blob (same container and blob name)
        destination_blob = destination_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob.name)
        source_blob_client = source_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob_client.url)

        # Start a server-side copy; the url comes from the BlobClient, not from BlobProperties
        destination_blob.start_copy_from_url(source_url=source_blob_client.url)
        print(f"Started copy of blob '{source_blob.name}' to container '{source_container.name}' in the destination storage account.")
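
The loop above copies every blob unconditionally, so it does not yet cover the question's fourth requirement (overwrite only when the destination is missing or older). A minimal sketch of that check follows. The helper name should_copy is hypothetical and not part of the Azure SDK, and it assumes both timestamps are timezone-aware datetimes, which is how last_modified is normally returned.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_copy(source_modified: datetime,
                destination_modified: Optional[datetime]) -> bool:
    """Return True when the destination blob is missing or older than the source.

    destination_modified is None when the destination blob does not exist.
    """
    if destination_modified is None:
        return True
    return source_modified > destination_modified


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    an_hour_ago = now - timedelta(hours=1)
    print(should_copy(now, None))          # destination missing
    print(should_copy(now, an_hour_ago))   # destination older than source
    print(should_copy(an_hour_ago, now))   # destination newer than source
```

Inside the loop, source_modified would be source_blob.last_modified, and destination_modified would be destination_blob.get_blob_properties().last_modified when destination_blob.exists() is true, otherwise None. The call to start_copy_from_url would then run only when should_copy returns True.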

Console:

The above code ran successfully from Synapse and copied the same structure from one storage account to the other.


Portal:
In the portal, I can see that the destination account has the same structure as the source account.
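
One caveat, offered as an assumption about the setup rather than something this answer confirms: when the source blob lives in a different storage account and is not publicly readable, the URL passed to start_copy_from_url generally has to carry its own authorization, typically a SAS token with read permission. The sketch below only shows how such a token could be attached to the blob URL. The helper name append_sas is hypothetical, and the token itself would have to be generated separately, for example with generate_blob_sas from azure.storage.blob.

```python
def append_sas(blob_url: str, sas_token: str) -> str:
    """Attach a SAS token to a blob URL.

    Accepts tokens with or without a leading '?', and uses '&' when the
    URL already has a query string (for example a snapshot parameter).
    """
    token = sas_token.lstrip("?")
    separator = "&" if "?" in blob_url else "?"
    return f"{blob_url}{separator}{token}"


if __name__ == "__main__":
    # Placeholder URL and token, for illustration only
    url = "https://srcaccount.blob.core.windows.net/container/folder/file.txt"
    print(append_sas(url, "?sv=placeholder&sig=placeholder"))
```

In the loop, the result of append_sas(source_blob_client.url, sas_token) would be passed as source_url in place of the bare URL.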


Posted by huangapple on February 16, 2023, 02:31:17
Original link: https://go.coder-hub.com/75464071.html