DuckDB Not implemented Error: Writing to HTTP files not implemented
Question
Using DuckDB, I am trying to write a data frame (from VS Code) to a Parquet file in an Azure storage account. I get the error: Not implemented Error: Writing to HTTP files not implemented.
Reading works, however: I build the data frame from a CSV file kept in the storage account's blob container, and that read succeeds from VS Code.
import duckdb
azure_storage_path = 'https://somename.blob.core.windows.net/the-container-name'
table_name = 'https://somename.blob.core.windows.net/the-container-name/the_csv.csv'
conn = duckdb.connect()
conn.execute('install httpfs')
conn.execute('load httpfs')
# Reading the CSV over HTTPS works
df = conn.execute("""
CREATE OR REPLACE TABLE some_table AS
SELECT *
FROM '""" + table_name + """'
LIMIT 10
""").df()
# The error occurs on the line below
conn.execute("COPY (FROM some_table) TO '" + azure_storage_path + "/ParquetFile.parquet' (FORMAT 'parquet')")
My goal is to convert the CSV file into a Parquet file in the Azure container.
Answer 1
Score: 1
The error is correct: HTTP doesn't really do a good job of abstracting filesystems (nor is it designed to).
Instead, you can use the fsspec support (which, full disclosure, I added):
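import duckdb
from fsspec import filesystem
# ACCOUNT_NAME and ACCOUNT_KEY are placeholders for your storage account credentials
# this line will throw an exception if the appropriate filesystem interface is not installed (for 'abfs', the adlfs package)
duckdb.register_filesystem(filesystem('abfs', account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY))
# if some_table lives on a separate connection (such as conn in the question), call conn.register_filesystem and conn.execute on it instead
duckdb.execute("COPY (FROM some_table) TO 'abfs://the-container-name/ParquetFile.parquet' (FORMAT 'parquet')")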
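For the question's stated goal (CSV in, Parquet out, same container), the whole flow might look roughly like the sketch below. It is not part of the original answer: the helper names https_to_abfs and csv_to_parquet are hypothetical, the account key is assumed to be supplied by the caller, and it assumes the adlfs package is installed so that fsspec knows the 'abfs' protocol. It also assumes the CSV can be read through the same registered filesystem, so httpfs is not needed at all.

```python
from urllib.parse import urlparse


def https_to_abfs(url: str) -> tuple[str, str]:
    """Split an Azure Blob HTTPS URL into (account_name, abfs:// path).

    Hypothetical helper for illustration; not part of DuckDB or fsspec.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(".blob.core.windows.net"):
        raise ValueError(f"not an Azure Blob HTTPS URL: {url!r}")
    account_name = host.split(".", 1)[0]
    # The URL path is /<container>/<blob>; abfs:// expects <container>/<blob>.
    return account_name, "abfs://" + parsed.path.lstrip("/")


def csv_to_parquet(csv_url: str, parquet_url: str, account_key: str) -> None:
    """Copy a CSV blob to a Parquet blob through DuckDB's fsspec support."""
    # Imported lazily so the URL helper above works without these packages.
    import duckdb
    from fsspec import filesystem  # the 'abfs' protocol is provided by adlfs

    account_name, src = https_to_abfs(csv_url)
    _, dst = https_to_abfs(parquet_url)

    conn = duckdb.connect()
    # Register on the same connection that runs the COPY.
    conn.register_filesystem(
        filesystem("abfs", account_name=account_name, account_key=account_key)
    )
    # Paths are interpolated directly; only pass trusted URLs here.
    conn.execute(
        f"COPY (SELECT * FROM read_csv_auto('{src}')) TO '{dst}' (FORMAT 'parquet')"
    )
```

Called with the question's URLs, https_to_abfs yields the account name somename and the path abfs://the-container-name/the_csv.csv, which matches the abfs:// form used in the answer above.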