Databricks is treating NULL as string
Question
I am using Databricks Unity Catalog, and I have a requirement to upload a CSV file, process it, and load it into a final table. However, when uploading the file in Databricks, it converts NULL data to the string 'NULL', which is causing an issue. Do you have any ideas on how I can resolve this problem?
Answer 1
Score: 1
CSV files by definition don't have any way to specify null values - everything is treated as a string. If you have some placeholder value inside your CSV (here the string 'NULL'), then you can pass the nullValue parameter when reading the CSV data to specify which strings should be treated as nulls (see doc):

df = spark.read.csv(path, nullValue="NULL")

or specify it as an option:

df = spark.read.format("csv") \
    .option("nullValue", "NULL") \
    .load(path)
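The underlying point - that CSV has no native null marker, so a placeholder string must be mapped to null explicitly - can be demonstrated without Spark using Python's standard csv module. This is a minimal sketch of the same idea; the sample data and the NULL_TOKEN name are illustrative, not from the original post:

```python
import csv
import io

# A CSV where missing values were exported as the literal text "NULL"
raw = "id,name,city\n1,Alice,NULL\n2,NULL,Paris\n"

# The csv module returns every field as a string - "NULL" is just text
rows = list(csv.DictReader(io.StringIO(raw)))
assert rows[0]["city"] == "NULL"

# Mimic what Spark's nullValue option does: replace the placeholder
# string with a real null (None) while leaving other values intact
NULL_TOKEN = "NULL"
cleaned = [
    {k: (None if v == NULL_TOKEN else v) for k, v in row.items()}
    for row in rows
]
assert cleaned[0]["city"] is None
assert cleaned[1]["name"] is None
```

Spark's nullValue option performs this substitution at read time, so the resulting DataFrame columns contain proper nulls rather than the string 'NULL'.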