No Data Returned From Delta Table Although Delta Files Exist
Question
I created a Delta table in Databricks using SQL:
%sql
create table nx_bronze_raw
(
`Device` string
)
USING DELTA LOCATION '/mnt/Databricks/bronze/devices/';
Then I ingest data (the Device column) into this table using:
bronze_path = '/mnt/Databricks/bronze/devices/'
df.select('Device').write.format("delta").mode("append").save(bronze_path)
The underlying storage is Azure Blob Storage, and the Databricks runtime is 12.1.
The problem is that when I query this table, it returns 0 records:
df_read = spark.read.format("delta").load("/mnt/Databricks/bronze/devices/")
display(df_read)
Query returned no results
However, when I look inside the storage account, the Delta files have been created with the expected size.
What went wrong here, especially given that no error is returned? And why can't I retrieve the data?
Answer 1
Score: 0
The following are possible reasons for getting empty results:
- The dataset you are writing is empty.
- The table was truncated between writing and reading it.
In this case, you can run DESCRIBE HISTORY on the table and read the data from a specific version. Here, we select the version from before the table was truncated:
# Time travel: read the table as it was at version 3, before the truncate
df_read = spark.read.format("delta").option("versionAsOf", 3).load("/mnt/Databricks/bronze/devices2/")
display(df_read)
- It is possible that the data has been written to the Delta files but has not yet been flushed to the table. To make sure all changes are visible, you can try running the OPTIMIZE command.
Code:
%sql
optimize raw;
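To illustrate why Parquet files can sit in the storage account while a query returns nothing, here is a minimal, stdlib-only sketch that replays a Delta table's transaction log. A truncate commits "remove" actions, so the data files stay on disk (until a VACUUM deletes them) but are no longer part of the current snapshot, whereas replaying only up to an earlier version, which is what versionAsOf does, still sees them. It assumes a local copy of the table directory (for example, downloaded from the storage account), assumes every JSON commit in _delta_log is still present, and ignores checkpoint files. read_commits, history and active_files are hypothetical helpers, not part of any Delta Lake or Databricks API, and the file names and operations in the demo are made up.

```python
import json
import os
import re

# Delta commit files are named with a zero-padded 20-digit version, e.g. 00000000000000000003.json
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def read_commits(table_path):
    """Yield (version, actions) for each JSON commit in _delta_log, oldest first.

    Assumption: all JSON commits are still present; checkpoint and .crc files are ignored.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    versions = sorted(
        int(m.group(1)) for name in os.listdir(log_dir) if (m := COMMIT_FILE.match(name))
    )
    for version in versions:
        with open(os.path.join(log_dir, f"{version:020d}.json"), encoding="utf-8") as f:
            actions = [json.loads(line) for line in f if line.strip()]
        yield version, actions


def history(table_path):
    """Rough analogue of DESCRIBE HISTORY: (version, operation) for each commit."""
    result = []
    for version, actions in read_commits(table_path):
        ops = [a["commitInfo"].get("operation") for a in actions if "commitInfo" in a]
        result.append((version, ops[0] if ops else None))
    return result


def active_files(table_path, as_of=None):
    """Replay add/remove actions up to version `as_of` (inclusive; None = latest)."""
    live = set()
    for version, actions in read_commits(table_path):
        if as_of is not None and version > as_of:
            break
        for action in actions:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


if __name__ == "__main__":
    import tempfile

    def write_commit(log_dir, version, actions):
        # Hypothetical helper used only to build a synthetic log for the demo
        path = os.path.join(log_dir, f"{version:020d}.json")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(json.dumps(a) for a in actions) + "\n")

    with tempfile.TemporaryDirectory() as table:
        log_dir = os.path.join(table, "_delta_log")
        os.makedirs(log_dir)
        write_commit(log_dir, 0, [{"commitInfo": {"operation": "CREATE TABLE"}}])
        write_commit(log_dir, 1, [{"commitInfo": {"operation": "WRITE"}},
                                  {"add": {"path": "part-00000.parquet", "dataChange": True}}])
        write_commit(log_dir, 2, [{"commitInfo": {"operation": "TRUNCATE"}},
                                  {"remove": {"path": "part-00000.parquet", "dataChange": True}}])
        print(history(table))
        print(active_files(table))          # latest snapshot: the file was removed
        print(active_files(table, as_of=1))  # as of version 1: the file is still live
```

In the demo, the Parquet file is still referenced at version 1 but not in the latest snapshot, which mirrors the situation in the question: the files are visible in storage, yet the current version of the table contains no rows.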