No FileSystem for scheme "az" error when trying to read csv from ADLS Gen2 using PySpark
Question
import pandas as pd
import pyspark.pandas as ps
I am trying to use the pyspark pandas API to compare performance between two similar scripts (one using pandas and one using pyspark through the pandas interface). However, I am having trouble importing my data into pyspark from our ADLS Gen 2 storage.
When I run the following code it works as expected:
df_pandas = pd.read_csv(f"az://container/path/to/file.csv", sep=';', dtype=str)
However, when I run the same thing using the pyspark pandas API:
df_spark = ps.read_csv(f"az://container/path/to/file.csv", sep=';', dtype=str)
the following error gets thrown:
Py4JJavaError: An error occurred while calling o1840.load.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "az"
I have looked online and found others with similar problems using AWS, but I'm not sure how to solve it for Azure. I tried replacing "az" with "abfs", but I then get the error:
An error occurred while calling o1852.load.
: abfs://container/path/to/file.csv has invalid authority.
I'm running these from Azure Synapse notebooks btw.
Answer 1
Score: 1
I reproduced the same in my environment and got this output.
> Reading csv files from ADLS Gen2.
Code:
import pandas
# Read the CSV from ADLS Gen2 over abfss://, authenticating with the storage account key.
df = pandas.read_csv('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<file_path>', storage_options={'account_key': 'account_key_value'})
Output:
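
As a side note (not part of the original answer): the question was about the pyspark pandas API rather than plain pandas. Below is a minimal sketch of the same read with ps.read_csv, assuming the notebook's identity already has access to the storage account; the placeholder names are illustrative. The "invalid authority" error from the question typically means the URI lacks the <container>@<account> authority part, which the fully qualified abfss:// form below supplies.

# Sketch only: reading the same file through the pyspark.pandas API.
# The abfss:// URI must carry the full authority, i.e. <container>@<account>.dfs.core.windows.net.
# <container_name>, <storage_account_name> and <file_path> are placeholders.
import pyspark.pandas as ps

df_spark = ps.read_csv(
    "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<file_path>",
    sep=';',
    dtype=str,
)

If the Spark session cannot authenticate to the account on its own, the account key can usually be supplied to the Hadoop ABFS driver through a configuration entry named fs.azure.account.key.<storage_account_name>.dfs.core.windows.net; in a Synapse notebook, the workspace's linked service or managed identity often covers this already.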