How to copy entire structure between storage accounts in python
Question
My case is the following:
- Two Azure Storage Accounts (Source/Destination)
- Source Account may contain multiple containers, folders, blobs, etc.
- All of the above needs to be copied exactly in the same structure to the DESTINATION account.
- If any element already exists in the DESTINATION account and is older than its counterpart in the SOURCE account, it needs to be overwritten.
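The overwrite rule in the last bullet reduces to one small decision per blob. A minimal sketch, assuming timezone-aware `datetime` values like the `last_modified` properties the SDK returns; `should_copy` is a hypothetical helper name, not part of `azure-storage-blob`:

```python
from datetime import datetime, timezone
from typing import Optional


def should_copy(source_last_modified: datetime,
                destination_last_modified: Optional[datetime]) -> bool:
    """Return True if the source blob should be copied to the destination.

    destination_last_modified is None when the blob does not exist
    in the destination account yet.
    """
    if destination_last_modified is None:
        return True  # missing in the destination: always copy
    # Overwrite only when the destination copy is strictly older than the source
    return destination_last_modified < source_last_modified


# Example timestamps (timezone-aware, UTC)
older = datetime(2023, 1, 1, tzinfo=timezone.utc)
newer = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(should_copy(newer, None))   # True: not in the destination yet
print(should_copy(newer, older))  # True: destination is older
print(should_copy(older, newer))  # False: destination is newer
```

Keep in mind that a destination blob's `last_modified` is the time the copy was written, not the source's original timestamp, so once a blob has been copied it is skipped until the source changes again.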
What I have so far:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, BlobLeaseClient, BlobPrefix, ContentSettings

# Set the connection string for the source and destination storage accounts
SOURCE_CONNECTION_STRING = "your SOURCE connection string"
DESTINATION_CONNECTION_STRING = "your DESTINATION connection string"

# Create the BlobServiceClient objects for the source and destination storage accounts
source_blob_service_client = BlobServiceClient.from_connection_string(SOURCE_CONNECTION_STRING)
destination_blob_service_client = BlobServiceClient.from_connection_string(DESTINATION_CONNECTION_STRING)

# List all containers in the source storage account
source_containers = source_blob_service_client.list_containers()

# Iterate through each container in the source storage account
for source_container in source_containers:
    print(f"Processing container '{source_container.name}'...")

    # Create a new container in the destination storage account (if it doesn't exist already)
    destination_container = destination_blob_service_client.get_container_client(source_container.name)
    if not destination_container.exists():
        print(f"Creating container '{source_container.name}' in the destination storage account...")
        destination_container.create_container()

    # Get a list of all blobs in the current source container
    source_container_client = source_blob_service_client.get_container_client(source_container.name)
    source_blobs = source_container_client.list_blobs()
    #source_blobs = source_blob_service_client.list_blobs(source_container.name)

    # Iterate through each blob in the current source container
    for source_blob in source_blobs:
        # Check if the blob already exists in the destination container
        destination_blob = destination_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob)
        if not destination_blob.exists() or source_blob.last_modified > destination_blob.get_blob_properties().last_modified:
            # Copy the blob to the destination container (with the same directory structure as in the source)
            #source_blob_client = BlobClient.from_blob_url(source_blob.url)
            source_blob_client = BlobClient.from_blob_url(source_blob.url)
            destination_blob.start_copy_from_url(source_url=source_blob.url)
            print(f"Copied blob '{source_blob.name}' to container '{source_container.name}' in the destination storage account.")
However, I get an error -- AttributeError: 'BlobProperties' object has no attribute 'url' -- even though I see it being used in this notebook https://github.com/Azure-Samples/AzureStorageSnippets/blob/master/blobs/howto/python/blob-devguide-py/blob-devguide-blobs.py and in https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python#azure-storage-blob-blobclient-start-copy-from-url.
Can someone suggest what I am doing wrong? I opted for Python because of the iterative requirement (going down to the most granular level of each container), which did not seem doable in Synapse with pipeline activities.
Answer 1
I tried this in my environment and got the results below.
Initially, I got the same error in my environment:
> I got an error -- AttributeError: 'BlobProperties' object has no attribute 'url'
The error occurs because list_blobs() yields BlobProperties objects, so source_blob is a BlobProperties instance: it describes the blob (name, last_modified, size, and so on) but has no url attribute. Instead, create a BlobClient for the source blob with source_blob_service_client.get_blob_client(source_container.name, source_blob.name) and pass that client's url to start_copy_from_url().
Code:
from azure.storage.blob import BlobServiceClient

# Set the connection string for the source and destination storage accounts
SOURCE_CONNECTION_STRING = "<src_connect_strng>"
DESTINATION_CONNECTION_STRING = "<dest_connect_strng>"

# Create the BlobServiceClient objects for the source and destination storage accounts
source_blob_service_client = BlobServiceClient.from_connection_string(SOURCE_CONNECTION_STRING)
destination_blob_service_client = BlobServiceClient.from_connection_string(DESTINATION_CONNECTION_STRING)

# List all containers in the source storage account
source_containers = source_blob_service_client.list_containers()

# Iterate through each container in the source storage account
for source_container in source_containers:
    print(f"Processing container '{source_container.name}'...")

    # Create a new container in the destination storage account (if it doesn't exist already)
    destination_container = destination_blob_service_client.get_container_client(source_container.name)
    if not destination_container.exists():
        print(f"Creating container '{source_container.name}' in the destination storage account...")
        destination_container.create_container()

    # Get a list of all blobs in the current source container (a flat listing, so blobs
    # inside virtual folders are included with their full "folder/name" path)
    source_container_client = source_blob_service_client.get_container_client(source_container.name)
    source_blobs = source_container_client.list_blobs()

    # Iterate through each blob in the current source container
    for source_blob in source_blobs:
        print(source_blob.name)
        # source_blob is a BlobProperties object without a url, so build a BlobClient for it
        source_blob_client = source_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        destination_blob = destination_blob_service_client.get_blob_client(source_container.name, source_blob.name)

        # Copy only if the blob is missing in the destination or the destination copy is older
        if not destination_blob.exists() or source_blob.last_modified > destination_blob.get_blob_properties().last_modified:
            print(source_blob_client.url)
            destination_blob.start_copy_from_url(source_url=source_blob_client.url)
            print(f"Copied blob '{source_blob.name}' to container '{source_container.name}' in the destination storage account.")
The code above ran successfully in Synapse and copied the same structure from one storage account to the other.
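Note that start_copy_from_url() only schedules a server-side copy; for large or cross-account copies the destination can still report a copy status of 'pending' after the call returns. A sketch of polling until the copy settles, assuming destination_blob is the BlobClient from the loop; wait_for_copy and its timeout values are hypothetical:

```python
import time


def wait_for_copy(destination_blob, poll_seconds: float = 2.0,
                  timeout_seconds: float = 600.0):
    """Poll a destination BlobClient until its copy is no longer pending.

    Returns the final copy status (e.g. 'success', 'aborted' or 'failed').
    Raises TimeoutError if the copy is still pending after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # BlobProperties.copy.status reflects the last copy into this blob
        status = destination_blob.get_blob_properties().copy.status
        if status != "pending":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"copy of '{destination_blob.blob_name}' is still pending")
        time.sleep(poll_seconds)


# Usage after scheduling the copy in the loop:
# destination_blob.start_copy_from_url(source_url=source_blob_client.url)
# if wait_for_copy(destination_blob) != "success":
#     print(f"Copy of '{source_blob.name}' did not succeed")
```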
In the portal, I can see that the destination account has the same structure as the source account.
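Instead of checking the portal by eye, the same comparison can be scripted per container. A minimal sketch; compare_structure is a hypothetical helper, and since list_blobs() is a flat listing the names already include their virtual folder paths:

```python
from typing import Iterable, Set, Tuple


def compare_structure(source_names: Iterable[str],
                      destination_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing_in_destination, extra_in_destination) blob names."""
    source_set = set(source_names)
    destination_set = set(destination_names)
    return source_set - destination_set, destination_set - source_set


# With the SDK, the inputs would be the names from two flat listings, e.g.
# compare_structure((b.name for b in source_container_client.list_blobs()),
#                   (b.name for b in destination_container.list_blobs()))
missing, extra = compare_structure(
    ["logs/2023/01.json", "logs/2023/02.json", "readme.txt"],
    ["logs/2023/01.json", "readme.txt"],
)
print(sorted(missing))  # ['logs/2023/02.json']
print(sorted(extra))    # []
```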