Databricks is treating NULL as string
Question
I am using Databricks Unity Catalog, and I have a requirement to upload a CSV file, process it, and load it into a final table. However, when uploading the file in Databricks, it converts NULL data to the string 'NULL', which is causing an issue. Do you have any ideas on how I can resolve this problem?
Answer 1
Score: 1
CSV files by definition don't have any way to specify null values - everything is treated as a string. If you have some placeholder value inside your CSV (here the string 'NULL'), then you can pass the nullValue parameter when reading the CSV data to specify which strings should be treated as nulls (see doc):

df = spark.read.csv(path, nullValue="NULL")

or specify it as an option:

df = spark.read.format("csv") \
    .option("nullValue", "NULL") \
    .load(path)
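The underlying point - that CSV has no native null marker, so a placeholder string must be mapped to null explicitly - can be demonstrated without Spark using Python's standard csv module. This is a minimal sketch of the same idea; the sample data and the NULL_TOKEN name are illustrative, not from the original post:

```python
import csv
import io

# A CSV where missing values were exported as the literal text "NULL"
raw = "id,name,city\n1,Alice,NULL\n2,NULL,Paris\n"

# The csv module returns every field as a string - "NULL" is just text
rows = list(csv.DictReader(io.StringIO(raw)))
assert rows[0]["city"] == "NULL"

# Mimic what Spark's nullValue option does: replace the placeholder
# string with a real null (None) while leaving other values intact
NULL_TOKEN = "NULL"
cleaned = [
    {k: (None if v == NULL_TOKEN else v) for k, v in row.items()}
    for row in rows
]
assert cleaned[0]["city"] is None
assert cleaned[1]["name"] is None
```

Spark's nullValue option performs this substitution at read time, so the resulting DataFrame columns contain proper nulls rather than the string 'NULL'.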