Create a Parquet file from a CSV string using DuckDB

Question

Given the following:

import io
buffer = io.BytesIO()
csv_data = 'col1,col2\n1,2\n3,4'

I want to know how I can use duckdb (https://duckdb.org/docs/data/parquet/overview.html) to write a Parquet file to the in-memory buffer, so that the file contains the column/row data from the csv_data variable.

I'm using duckdb version 0.7.1 (though I'm not tied to this version).

Edit

It was suggested that I try the following:

import duckdb
from io import BytesIO
csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')

This failed with:

In [1]: import duckdb

In [2]: from io import BytesIO
   ...:

In [3]: csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
   ...:

In [4]: duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')

TypeError: read_csv(): incompatible function arguments. The following argument types are supported:
    1. (name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) -> duckdb.DuckDBPyRelation

Invoked with: <_io.BytesIO object at 0x7f21ed64d620>; kwargs: header=True

Answer 1

Score: 1

You can read it with read_csv and write it to Parquet with write_parquet:

import duckdb
from io import BytesIO
csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')

Note: this does not work on version 0.7.1, but it does work on 0.8.0.

Posted by huangapple on 2023-05-11 06:58:08
When reposting, please keep the link to this article: https://go.coder-hub.com/76223088.html