Databricks is treating NULL as string

Question

I am using Databricks Unity Catalog, and I have a requirement to upload a CSV file, process it, and load it into a final table. However, when uploading the file in Databricks, it converts NULL data to the string 'NULL', which is causing an issue. Do you have any ideas on how I can resolve this problem?

Answer 1

Score: 1

CSV files, by definition, have no way to represent null values: every field is just text. If your CSV uses a placeholder value for missing data, you can pass the nullValue parameter when reading the CSV to specify which string should be treated as null (see the documentation):

df = spark.read.csv(path, nullValue="null")

or specify it as option:

df = spark.read.format("csv") \
  .option("nullValue", "null") \
  .load(path)
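For readers without a Spark session at hand, the placeholder-to-null idea can be sketched in plain Python with the standard csv module. This is not Spark's implementation: read_csv_with_null_value is a hypothetical helper written for illustration. The sketch compares each cell to the placeholder exactly and case-sensitively, which is why the question's literal NULL needs null_value="NULL" here rather than "null"; check the Spark CSV documentation for the precise matching rules of nullValue.

```python
import csv
import io
from typing import Dict, List, Optional


def read_csv_with_null_value(
    text: str, null_value: str = "NULL"
) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text, turning cells that exactly equal null_value into None.

    Hypothetical helper: CSV has no native null, so a chosen placeholder
    string stands in for it. The comparison is exact and case-sensitive.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Every parsed cell is a string; only the exact placeholder becomes None.
        rows.append({k: (None if v == null_value else v) for k, v in row.items()})
    return rows


sample = "id,name,city\n1,Alice,NULL\n2,NULL,Paris\n3,Bob,null\n"
rows = read_csv_with_null_value(sample, null_value="NULL")
for r in rows:
    print(r)
```

Note that the lowercase null in the third row stays a string, because it does not match the placeholder exactly.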

huangapple
  • Published on July 3, 2023 at 21:14:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605117.html