How to download a parquet "file" (actually directory) using the Azure Client?

Question

I am using az storage fs file download to download the contents of a parquet directory, like this:

az storage fs file download \
   --path myname/1/batch-repo/form/Fulfillment/2022/01/02/batch-form-Fulfillment.parquet/ \
   --account-name my-storage-account -f my-container

The download is attempted, but the az CLI apparently does not recognize that this is a parquet directory and cannot handle it, whether at the directory level or for individual files:

> ValueError: This pipeline didn't have the RawDeserializer policy; can't deserialize

Is there any workaround to download the contents of a parquet file?


Answer 1

Score: 1

After reproducing this on my end, I received the same error when downloading a directory with the same script as yours.

I can see that individual files download correctly with the script below.

az storage fs file download -f container --path dir1/part-00004-a9e77425-5fb4-456f-ba52-f821123bd193-c000.snappy.parquet --account-name <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>"
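Since individual files do download, one possible workaround is to list the files under the directory and fetch them one at a time. The sketch below assumes a Bash shell and that the CLI can already find credentials (for example through the AZURE_STORAGE_KEY environment variable or a prior az login). The helper name download_parquet_parts and its argument order are hypothetical, and flag behavior may differ between CLI versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download every file under a remote directory one by one.
# Arguments: storage account, file system (container), remote dir, local dir.
download_parquet_parts() {
  local account="$1" filesystem="$2" remote_dir="$3" local_dir="$4"
  mkdir -p "$local_dir"
  # List the paths under the directory, leaving out sub-directory entries,
  # and print one path per line.
  az storage fs file list \
      --account-name "$account" -f "$filesystem" \
      --path "$remote_dir" --exclude-dir \
      --query "[].name" -o tsv |
  while IFS= read -r remote_path; do
    [ -n "$remote_path" ] || continue
    # Save each file under its base name inside the local directory.
    az storage fs file download \
        --account-name "$account" -f "$filesystem" \
        --path "$remote_path" \
        --destination "$local_dir/$(basename "$remote_path")"
  done
}

# Example call (placeholders, not real resources):
# download_parquet_parts "<ACCOUNT_NAME>" container dir1 ./dir1
```

This flattens the directory into a single local folder, which is usually enough for a parquet directory whose part files all sit at one level.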

However, to download at the directory level you must use az storage fs directory download. Below is the complete script that worked for me.

az storage fs directory download -f container -d folder1 -s dir1 --account-name <ACCOUNT_NAME> --account-key "<ACCOUNT_KEY>"
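As a rough illustration, the directory download can be wrapped in a small Bash function together with a check that counts the parquet part files that arrived locally. The function names fetch_parquet_dir and count_parquet_parts are hypothetical, credentials are assumed to be available to the CLI already, and the --recursive flag is added on the assumption that files in nested subdirectories should come along too.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the directory-level download.
# Arguments: storage account, file system (container), remote dir, local dir.
fetch_parquet_dir() {
  local account="$1" filesystem="$2" remote_dir="$3" local_dir="$4"
  mkdir -p "$local_dir"
  az storage fs directory download \
      --account-name "$account" -f "$filesystem" \
      -s "$remote_dir" -d "$local_dir" --recursive
}

# Count the *.parquet files found anywhere below a local directory.
count_parquet_parts() {
  find "$1" -type f -name '*.parquet' | wc -l | tr -d ' '
}

# Example call (placeholders, not real resources):
# fetch_parquet_dir "<ACCOUNT_NAME>" container dir1 folder1
# count_parquet_parts folder1
```

Comparing the local count with the number of part files listed in the storage account is a quick sanity check that nothing was skipped.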

Results:

[screenshot]

Below is the structure of my files:

[screenshot]


huangapple
  • Posted on May 11, 2023, 12:26:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76224157.html