Azure Synapse pyspark将STRING数据类型翻译为外部表的varchar(8000)。

huangapple go评论79阅读模式
英文:

Azure Synapse pyspark translates STRING datatype into varchar(8000) for external table

问题

我尝试在Azure Synapse中使用PySpark笔记本加载外部表,但数据类型似乎不匹配。当我初始化表格时,我执行以下操作(简化示例):

CREATE OR REPLACE TABLE LANDING_DB.Opportunity (
 AccountId varchar(255),
 Id varchar(255),
 Name STRING,
 Description STRING
)
USING DELTA LOCATION 'abfss://XXX@XXX.dfs.core.windows.net/Landing/ABC/Opportunity/'

通过

spark.sql()

然而,外部表格使用'varchar(8000)'作为Name和Description列的数据类型。我想要使用varchar(max),特别是Description列可能包含大量文本。

我是不是漏掉了一些设置,还是这是一个错误?

数据库是Synapse实例上的无服务器池。

在此提前感谢您的帮助。

最好,Sven

英文:

I try to load an external table in Azure Synpase using a PySpark notebook but the datatypes seem to mismatch. When I initialize the table I execute (stripped down example):

CREATE OR REPLACE TABLE LANDING_DB.Opportunity (
 AccountId varchar(255),
 Id varchar(255),
 Name STRING,
 Description STRING
)
USING DELTA LOCATION 'abfss://XXX@XXX.dfs.core.windows.net/Landing/ABC/Opportunity/'

via

spark.sql()

Yet the external table uses 'varchar(8000)' as datatype for the Name and Description column. I would like to use varchar(max) as especially the Description column can have a lot of text.

Am I missing some settings or is this a bug?

The database is the serverless pool on the synapse instance.

Help would be appreciated, thank you in advance

Best, Sven

答案1

得分: 1

首先注意提示:尽量将VARCHAR()尽量缩短。如果可能的话,避免使用VARCHAR(MAX),因为它可能会影响性能。

似乎它会自动将STRING转换为varchar(8000),因为它会推断长度。请参阅https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md#check-inferred-data-types

您可以使用varchar(x)而不是使用STRING,然后设置一个更高的数字。也许您知道Description字段的最大可能长度是多少?varchar()接受一个整数作为参数,整数的最大值是2,147,483,647。我不认为您的描述会那么长?所以最好设置一个更低的数字,这也不会被达到。

英文:

First of all note the tip: Try to make VARCHAR() as short as possible. Avoid VARCHAR(MAX) if possible because it can impair performance.

It seems that it makes STRING automatically to varchar(8000) because it infers the length. See https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md#check-inferred-data-types

Instead of using STRING you can use varchar(x) and then put a higher number. Maybe you know what the maximum could be in the Description? varchar() takes an integer as the argument, and the max value of an integer is 2,147,483,647. I don't think your description can get that long? So better put a lower number which will also not be reached.

huangapple
  • 本文由 发表于 2023年6月19日 19:16:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/76506104.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定