PostgreSQL copy command ERROR: Invalid exponent, Value 'e', Pos 187, Type: Decimal
Question
I have a Parquet file in AWS S3 and I am trying to copy its data into a Redshift table. I created the table by crawling the Parquet file in AWS Glue to generate the table DDL:
COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
NULL AS 'NULL'
EMPTYASNULL
delimiter ','
region 'region_name'
IGNOREHEADER 1;
I was getting the following error: ERROR: Invalid digit, Value '.', Pos 0, Type: Double
Then I changed the data type from double to numeric, and now I am getting this error: ERROR: Invalid exponent, Value 'e', Pos 187, Type: Decimal
The table has text, numeric and bigint column data types, but I do not know what column is causing this error. I don't understand the meaning of this error message. I would appreciate some guidance.
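For anyone narrowing down a failure like this: Redshift records every COPY failure in the stl_load_errors system table, including the column name and the raw value that failed conversion. A query along these lines (a sketch; run it on the same cluster shortly after the failed COPY) should identify the offending column:

```sql
-- Most recent load errors, newest first. colname and raw_field_value
-- show which column and which value Redshift could not convert.
SELECT starttime,
       filename,
       colname,
       type,
       raw_field_value,
       err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```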
Answer 1
Score: 1
When copying data from a Parquet file into a table, you need to specify the format with FORMAT AS PARQUET:
COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
FORMAT AS PARQUET;
Parquet files store data differently than .csv files do.
This is a field value I get when I use pandas pd.read_parquet('path/file.parquet', engine='fastparquet')
to read the parquet file: 439.0
And this is the same value I was trying to insert into the table: ?..7.????.17?.?..v???A????
I was obviously getting many errors when trying to insert this into a column with datatype double precision.
Answer 2
Score: 0
I suspect that the data in the parquet file is stored in an exponent format. For example, the number 123 can be represented as 1.23e2 (1.23 × 10^2). I expect Redshift does not understand this format.
If I am right, you can COPY the file into Redshift with this column as a varchar and then cast it to the desired data type.
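Concretely, that workaround could look like this (a sketch with hypothetical table and column names; suspect_col stands for the exponent-formatted values, and it assumes the Parquet column is a string type so it can land in a varchar). A direct varchar-to-DECIMAL cast rejects the 'e', so it goes through FLOAT8 first, which does accept exponent notation:

```sql
-- Stage the data with the problematic column as varchar.
CREATE TABLE staging_table (
    id          BIGINT,
    suspect_col VARCHAR(64)
);

COPY staging_table FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
FORMAT AS PARQUET;

-- Cast into the final table: FLOAT8 parses '1.23e2',
-- then DECIMAL fixes the scale.
INSERT INTO final_table (id, amount)
SELECT id,
       CAST(CAST(suspect_col AS FLOAT8) AS DECIMAL(18, 4))
FROM staging_table;
```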