January 9, 2023 19:34:53

No FileSystem for scheme "az" error when trying to read csv from ADLS Gen2 using PySpark

Question

import pandas as pd
import pyspark.pandas as ps

I am trying to use the pyspark pandas API to compare performance between two similar scripts (one using pandas and one using pyspark through the pandas interface). However, I have trouble importing my data into pyspark from our ADLS Gen2 storage.

When I run the following code it works as expected:

df_pandas = pd.read_csv("az://container/path/to/file.csv", sep=';', dtype=str)

However, when I run the same code using the pyspark pandas API:

df_spark = ps.read_csv("az://container/path/to/file.csv", sep=';', dtype=str)

the following error is thrown:

Py4JJavaError: An error occurred while calling o1840.load.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "az"

I have looked online and found others with similar problems on AWS, but I'm not sure how to solve it for Azure. I tried replacing az with abfs, but then I get this error:

An error occurred while calling o1852.load.
: abfs://container/path/to/file.csv has invalid authority.

I'm running these from Azure Synapse notebooks btw.
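The second error follows from the URI format that Hadoop's ABFS driver (which Spark uses for ADLS Gen2) expects: abfs[s]://<container>@<account>.dfs.core.windows.net/<path>. In abfs://container/path/to/file.csv the authority is only "container", with no "@<account>..." part. The sketch below is a hypothetical, simplified Python check that roughly imitates that rule; it is not the driver's actual (Java) code, and the real validation may differ in detail.

```python
from urllib.parse import urlparse


def has_valid_abfs_authority(uri: str) -> bool:
    """Hypothetical, simplified imitation of the ABFS authority rule.

    Not Hadoop's real implementation: it only checks that the scheme is
    abfs/abfss and that the authority has the <container>@<host> shape.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        return False
    # The authority (netloc) should look like <container>@<account>.dfs.core.windows.net
    container, sep, host = parsed.netloc.partition("@")
    return bool(container) and sep == "@" and bool(host)


# The URI from the question has no "@<account>..." part in its authority.
print(has_valid_abfs_authority("abfs://container/path/to/file.csv"))
print(has_valid_abfs_authority(
    "abfss://container@account.dfs.core.windows.net/path/to/file.csv"))
```

The first call prints False and the second prints True, matching the "invalid authority" error for the path that omits the storage account.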

Answer 1

Score: 1

I reproduced this in my environment and got the following output.

> Reading csv files from ADLS Gen2.

Code:

import pandas
# pandas hands abfss:// URLs to fsspec, which needs the adlfs package installed
df = pandas.read_csv('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<file_path>', storage_options={'account_key': 'account_key_value'})

Output:

[Screenshot of the output]

For more information, refer to link1 and link2.
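Since the question's code uses az://container/... paths, a small helper can rewrite them into the abfss:// form used in the answer before they are handed to Spark. This is a minimal sketch: az_to_abfss and the account name are hypothetical, and it assumes the storage account uses the default dfs.core.windows.net endpoint.

```python
from urllib.parse import urlparse


def az_to_abfss(az_path: str, account_name: str) -> str:
    """Hypothetical helper: rewrite az://<container>/<path> as
    abfss://<container>@<account>.dfs.core.windows.net/<path>.

    Assumes the default public-cloud DFS endpoint.
    """
    parsed = urlparse(az_path)
    if parsed.scheme != "az":
        raise ValueError(f"expected an az:// path, got {az_path!r}")
    container = parsed.netloc            # "container" in az://container/...
    path = parsed.path.lstrip("/")       # "path/to/file.csv"
    return f"abfss://{container}@{account_name}.dfs.core.windows.net/{path}"


print(az_to_abfss("az://container/path/to/file.csv", "mystorageaccount"))
```

For example, ps.read_csv(az_to_abfss("az://container/path/to/file.csv", "mystorageaccount"), sep=';', dtype=str) would give Spark a URI with a valid authority, where mystorageaccount is a placeholder. Spark still needs credentials for that account; this sketch does not configure them.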
