Empty Column not being listed in S3 select in Databricks
Question
I'm querying a JSON file in S3 with multiple columns:
SELECT a, b, c FROM json.`s3://my-bucket/file.json.gz`
And the file looks like this:
{a: {}, b: 0, c: 1}
{a: {}, b: 1, c: 2}
{a: {}, b: 2, c: 3}
The query above fails and returns
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`b`, `c`]
And when I perform
SELECT * FROM json.`s3://my-bucket/file.json.gz`
I get only the columns b and c.
Is there a way I can also get column a, and see that it is an empty JSON object?
Answer 1
Score: 2
Can you use Python or Scala syntax?
You need to impose a schema on the JSON file when reading it, and as far as I know that's not possible through SQL queries alone.
The solution using Python syntax would look like this:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Not sure what the data type for column a is supposed to be, so apply the correct data type.
schema = StructType([
    StructField('a', StringType(), True),
    StructField('b', IntegerType(), True),
    StructField('c', IntegerType(), True),
])

df = spark.read.schema(schema).json('s3://my-bucket/file.json.gz')
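If you still want to query the data with SQL afterwards, a minimal follow-up sketch (assuming the df from the snippet above and a hypothetical view name my_json) is to register the DataFrame as a temporary view and run the original SELECT against it; with column a read as StringType, the empty object should come back as the raw text {}:

# Assumes df from the snippet above, with the explicit schema already applied.
df.createOrReplaceTempView('my_json')  # hypothetical view name

# The original query now resolves column a; as a string it should show the raw '{}' text.
spark.sql('SELECT a, b, c FROM my_json').show(truncate=False)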
Comments