Azure Synapse PySpark translates STRING data type into varchar(8000) for external table
Question
I am trying to load an external table in Azure Synapse using a PySpark notebook, but the data types seem to mismatch. When I initialize the table, I execute (stripped-down example):
CREATE OR REPLACE TABLE LANDING_DB.Opportunity (
AccountId varchar(255),
Id varchar(255),
Name STRING,
Description STRING
)
USING DELTA LOCATION 'abfss://XXX@XXX.dfs.core.windows.net/Landing/ABC/Opportunity/'
via spark.sql().
Yet the external table uses varchar(8000) as the data type for the Name and Description columns. I would like to use varchar(max), because the Description column in particular can contain a lot of text.
Am I missing some settings or is this a bug?
The database is the serverless SQL pool on the Synapse instance.
Any help would be appreciated. Thank you in advance.
Best, Sven
Answer 1
Score: 1
First, note the tip: keep VARCHAR() as short as possible. Avoid VARCHAR(MAX) where you can, because it can impair performance.
It seems that STRING is automatically turned into varchar(8000) because the length is inferred. See https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/sql/best-practices-serverless-sql-pool.md#check-inferred-data-types
Instead of STRING, you can use varchar(x) with a higher number. Perhaps you know the maximum length the Description can reach? varchar() takes an integer argument, and the maximum value of an integer is 2,147,483,647. I don't think your descriptions get that long, so it is better to pick a lower number that will still never be reached. (Note that in T-SQL itself, varchar(n) accepts n only from 1 to 8,000; longer values require varchar(max), which can hold up to 2^31-1 bytes, the same 2,147,483,647 figure.)
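To judge whether the inferred varchar(8000) is actually a problem for your data, you can check whether any value exceeds 8,000 bytes. Below is a minimal pure-Python sketch. It assumes the values are already collected into a plain list (in the notebook you would compute this with Spark instead). It also assumes a UTF-8 collation on the SQL side, where varchar(n) limits bytes rather than characters, so multi-byte characters count more than once. The names `utf8_len` and `values_over_limit` are hypothetical helpers, not part of any Synapse API.

```python
# Length the serverless pool showed for the STRING columns (in bytes).
VARCHAR_INFERRED_LIMIT = 8000


def utf8_len(value: str) -> int:
    """Byte length of a string when encoded as UTF-8 (assumed collation)."""
    return len(value.encode("utf-8"))


def values_over_limit(values, limit=VARCHAR_INFERRED_LIMIT):
    """Return the indexes of non-null values whose UTF-8 byte length exceeds `limit`."""
    return [i for i, v in enumerate(values) if v is not None and utf8_len(v) > limit]


descriptions = [
    "short text",
    "x" * 8000,   # exactly 8000 bytes: still fits
    "é" * 4001,   # only 4001 characters, but 8002 bytes in UTF-8
    None,
]
print(values_over_limit(descriptions))  # [2]
```

The third value shows why counting characters is not enough: it is far below 8,000 characters but still over the byte limit.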
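The advice to pick a bound that the data will never reach can be sketched as follows. The example measures the longest observed value, adds some headroom, and builds the `CREATE OR REPLACE TABLE ... USING DELTA` statement with explicit varchar lengths to pass to spark.sql(). The helper names, the 1.5 headroom factor and the length 255 for Name are hypothetical choices for illustration. Lengths are measured in UTF-8 bytes, which is never less than the character count, so the bound is conservative. Whether the serverless SQL pool exposes a Spark varchar(n) with n above 8,000 as varchar(max) is not confirmed here, so check the synced column type after creating the table.

```python
import math


def suggest_varchar_length(values, headroom=1.5, minimum=1):
    """Longest observed value (UTF-8 bytes) times a headroom factor, rounded up."""
    longest = max((len(v.encode("utf-8")) for v in values if v is not None), default=0)
    return max(minimum, math.ceil(longest * headroom))


def build_ddl(table, columns, location):
    """Build a Delta table DDL string with an explicit varchar(n) per column."""
    cols = ",\n".join(f"  {name} varchar({length})" for name, length in columns)
    return (
        f"CREATE OR REPLACE TABLE {table} (\n{cols}\n)\n"
        f"USING DELTA LOCATION '{location}'"
    )


# Hypothetical sample data; in practice, measure the real Description column.
descriptions = ["short", "a somewhat longer description"]
desc_len = suggest_varchar_length(descriptions)  # 29 bytes * 1.5 -> 44

ddl = build_ddl(
    "LANDING_DB.Opportunity",
    [("AccountId", 255), ("Id", 255), ("Name", 255), ("Description", desc_len)],
    "abfss://XXX@XXX.dfs.core.windows.net/Landing/ABC/Opportunity/",
)
print(ddl)
# In the Synapse notebook you would then run: spark.sql(ddl)
```

With real data, base the headroom on how much you expect the text to grow, keeping the bound as low as is safe, in line with the tip above.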