DuckDB Not implemented Error: Writing to HTTP files not implemented

Question

Using DuckDB, I am trying to write a DataFrame (from VS Code) to a Parquet file in an Azure storage account. I get the error: Not implemented Error: Writing to HTTP files not implemented.

However, reading works fine: building the DataFrame from a CSV file stored in the storage account's blob container runs without problems in VS Code.

import duckdb

azure_storage_path = 'https://somename.blob.core.windows.net/the-container-name'
table_name = 'https://somename.blob.core.windows.net/the-container-name/the_csv.csv'

conn = duckdb.connect()
conn.execute('install httpfs')
conn.execute('load httpfs')

# Reading the CSV over HTTPS works
df = conn.execute("""
    CREATE OR REPLACE TABLE some_table AS
    SELECT *
    FROM '""" + table_name + """'
    LIMIT 10
    """).df()

# The error occurs on the line below
conn.execute("COPY (FROM some_table) TO '" + azure_storage_path + "/ParquetFile.parquet' (FORMAT 'parquet')")

My goal is to convert the CSV into a Parquet file in the Azure container.

Answer 1

Score: 1

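The error is correct: HTTP doesn't do a good job of abstracting filesystems (nor is it designed to), which is why reading the CSV works but writing to an https:// URL does not.

Instead, you can use DuckDB's fsspec support (which, full disclosure, I added):

import duckdb
from fsspec import filesystem

# This line will throw an exception if the appropriate filesystem interface
# is not installed (for 'abfs', that is the adlfs package)
duckdb.register_filesystem(filesystem('abfs', account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY))

# These module-level calls use DuckDB's default connection. If some_table was
# created on a separate connection (conn in the question), call
# conn.register_filesystem(...) and conn.execute(...) instead.
duckdb.execute("COPY (FROM some_table) TO 'abfs://the-container-name/ParquetFile.parquet' (FORMAT 'parquet')")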
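
The question starts from an HTTPS blob URL, while the COPY target above uses the abfs://container/path form. Here is a minimal sketch of deriving one from the other, assuming the usual <account>.blob.core.windows.net host pattern. The helper blob_url_to_abfs is hypothetical and is not part of DuckDB, fsspec, or adlfs.

```python
from typing import Tuple
from urllib.parse import urlparse

# Assumed host suffix for Azure Blob Storage endpoints in the public cloud.
BLOB_SUFFIX = ".blob.core.windows.net"


def blob_url_to_abfs(url: str) -> Tuple[str, str]:
    """Split an Azure Blob HTTPS URL into (account_name, abfs_path).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"not an Azure Blob HTTPS URL: {url!r}")

    account_name = host[: -len(BLOB_SUFFIX)]
    # The URL path is "/<container>/<blob path>"; abfs paths start at the container.
    container_and_blob = parsed.path.lstrip("/")
    if not account_name or not container_and_blob:
        raise ValueError(f"URL is missing an account or container: {url!r}")

    return account_name, f"abfs://{container_and_blob}"


account_name, target = blob_url_to_abfs(
    "https://somename.blob.core.windows.net/the-container-name/ParquetFile.parquet"
)
print(account_name)  # somename
print(target)        # abfs://the-container-name/ParquetFile.parquet
```

The returned account name could be passed as account_name to filesystem('abfs', ...), and the returned path used as the COPY target, so the storage location is written down only once.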

huangapple
  • Published on July 18, 2023, 14:48:27
  • When reposting, please keep this link: https://go.coder-hub.com/76710157.html