PostgreSQL copy command ERROR: Invalid exponent, Value 'e', Pos 187, Type: Decimal
Question
I have a Parquet file in AWS S3 and I am trying to copy its data into a Redshift table. I created the table by crawling the Parquet file in AWS Glue to generate the table DDL:
COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
NULL AS 'NULL'
EMPTYASNULL
delimiter ','
region 'region_name'
IGNOREHEADER 1;
I was getting the following error: ERROR: Invalid digit, Value '.', Pos 0, Type: Double
Then I changed the data type from double to numeric, and now I am getting this error: ERROR: Invalid exponent, Value 'e', Pos 187, Type: Decimal
The table has text, numeric and bigint column data types, but I do not know what column is causing this error. I don't understand the meaning of this error message. I would appreciate some guidance.
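For anyone narrowing down a failure like this: Redshift records every COPY failure in the stl_load_errors system table, including the column name and the raw value that failed conversion. A query along these lines (a sketch; run it on the same cluster shortly after the failed COPY) should identify the offending column:

```sql
-- Most recent load errors, newest first. colname and raw_field_value
-- show which column and which value Redshift could not convert.
SELECT starttime,
       filename,
       colname,
       type,
       raw_field_value,
       err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```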
Answer 1
Score: 1
When copying data from a Parquet file into a table, you need to specify the format with FORMAT AS PARQUET:
COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
FORMAT AS PARQUET;
Parquet files store data differently than .csv files do.
This is a field value I get when I use pandas pd.read_parquet('path/file.parquet', engine='fastparquet')
to read the parquet file: 439.0
And this is the same value I was trying to insert into the table: ?..7.????.17?.?..v???A????
I was obviously getting many errors when trying to insert this into a column with datatype double precision.
Answer 2
Score: 0
I suspect that the data in the parquet file is stored in an exponent format. For example, the number 123 can be represented as 1.23e2 (1.23 × 10^2). I expect Redshift does not understand this format.
If I am right, you can COPY the file into Redshift with this column as a varchar and then cast it to the desired data type.
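Concretely, that workaround could look like this (a sketch with hypothetical table and column names; suspect_col stands for the exponent-formatted values, and it assumes the Parquet column is a string type so it can land in a varchar). A direct varchar-to-DECIMAL cast rejects the 'e', so it goes through FLOAT8 first, which does accept exponent notation:

```sql
-- Stage the data with the problematic column as varchar.
CREATE TABLE staging_table (
    id          BIGINT,
    suspect_col VARCHAR(64)
);

COPY staging_table FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
FORMAT AS PARQUET;

-- Cast into the final table: FLOAT8 parses '1.23e2',
-- then DECIMAL fixes the scale.
INSERT INTO final_table (id, amount)
SELECT id,
       CAST(CAST(suspect_col AS FLOAT8) AS DECIMAL(18, 4))
FROM staging_table;
```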