PostgreSQL copy command ERROR: Invalid exponent, Value 'e', Pos 187, Type: Decimal

Question


I have a Parquet file in AWS S3 and I am trying to copy its data into a Redshift table. I created the table by crawling the Parquet file in AWS Glue to generate the table DDL:

COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
NULL AS 'NULL'
EMPTYASNULL
delimiter ','
region 'region_name'
IGNOREHEADER 1;

I was getting the following error: ERROR: Invalid digit, Value '.', Pos 0, Type: Double
Then I changed the data type from double to numeric, and now I am getting this error: ERROR: Invalid exponent, Value 'e', Pos 187, Type: Decimal

The table has text, numeric, and bigint column data types, but I do not know which column is causing this error. I don't understand the meaning of this error message. I would appreciate some guidance.

Answer 1

Score: 1


When copying data from a Parquet file into a table, you need to specify the format with FORMAT AS PARQUET:

COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
FORMAT AS PARQUET;

Parquet files store data in a different way than .csv files do.

This is a field value I get when I use pandas' pd.read_parquet('path/file.parquet', engine='fastparquet') to read the Parquet file: 439.0
And this is the same value as it appeared when I was trying to insert it into the table: ?..7.????.17?.?..v???A????

I was obviously getting many errors when trying to insert this into a column with data type double precision.
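The "garbage" value above can be illustrated with a short sketch: a Parquet DOUBLE is stored as 8 raw little-endian IEEE-754 bytes, which look like mojibake when interpreted as text, roughly what happens when COPY reads a Parquet file without FORMAT AS PARQUET. This is only an illustration of binary-vs-text encoding, not the actual bytes of the original file:

```python
import struct

# Encode 439.0 the way a Parquet DOUBLE column stores it:
# 8 raw little-endian IEEE-754 bytes.
value = 439.0
raw = struct.pack('<d', value)

print(raw)                          # mostly non-printable binary bytes
print(raw.decode('latin-1'))        # looks like garbage when read as text
print(struct.unpack('<d', raw)[0])  # round-trips back to 439.0
```

Interpreting those 8 bytes as characters produces unreadable output, while decoding them as a double recovers 439.0 exactly, which is why the COPY needs to be told the file is Parquet.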

Answer 2

Score: 0


I suspect that the data in the parquet file is stored in an exponent format. For example, the number 123 can be represented as 1.23e2 (1.23 × 10^2). I expect Redshift is not understanding this format.

If I am right, you can COPY the file into Redshift with this column as a varchar and then cast it to the desired data type.
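The varchar-then-cast idea can be sketched outside SQL (a minimal illustration, not the actual Redshift cast): exponent-format text such as '1.23e2' parses cleanly when it is treated as a string first and converted afterwards.

```python
from decimal import Decimal

# A sketch of the varchar-then-cast workaround: keep the value as text
# first, then convert it to a numeric type afterwards.
text_value = '1.23e2'             # exponent-format text from the file

as_float = float(text_value)      # analogous to casting to double precision
as_decimal = Decimal(text_value)  # analogous to casting to a decimal/numeric

print(as_float)                   # 123.0
print(as_decimal)                 # 1.23E+2, numerically equal to 123
```

In Redshift the equivalent would be loading the column as varchar and applying CAST(col AS numeric) or ::float8 in a follow-up query.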

huangapple
  • Posted on 2023-07-11 07:39:41
  • Please keep this link when reposting: https://go.coder-hub.com/76657920.html