Create a parquet file from CSV represented as string using duckdb
Question
Given the following:
import io
buffer = io.BytesIO()
csv_data = 'col1,col2\n1,2\n3,4'
I want to know how to use DuckDB (https://duckdb.org/docs/data/parquet/overview.html) to write a Parquet file to the in-memory buffer, where the file contains the column/row data from the csv_data variable.
I'm using duckdb version 0.7.1 (I'm not fixed to this version, though).
Edit
It was suggested that I try the following:
import duckdb
from io import BytesIO
csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')
This failed with:
In [1]: import duckdb
In [2]: from io import BytesIO
...:
In [3]: csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
...:
In [4]: duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')
TypeError: read_csv(): incompatible function arguments. The following argument types are supported:
1. (name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) -> duckdb.DuckDBPyRelation
Invoked with: <_io.BytesIO object at 0x7f21ed64d620>; kwargs: header=True
Answer 1
Score: 1
You can read it with read_csv and write it to Parquet with write_parquet:
import duckdb
from io import BytesIO
csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')
Note: this does not work on version 0.7.1, but does work on 0.8.0.