Impala: when to refresh tables?
Question
Using Impala, I noticed that performance deteriorates when I repeatedly run truncate and insert operations on internal tables.
The question is: can refreshing the tables avoid the problem?
So far I have used refresh only on external tables, each time I copied files to HDFS to be loaded into those tables.
Many thanks in advance!
Moreno
Answer 1
Score: 0
You can use COMPUTE STATS instead of REFRESH.
REFRESH is normally used when you add a data file or change something in the table metadata, for example adding or changing a column, or adding a partition. It quickly reloads the metadata. There is a related command, INVALIDATE METADATA, but it is more expensive than REFRESH: it forces Impala to reload the table's metadata the next time the table is referenced in a query.
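As a rough illustration of the two metadata commands, here is a minimal sketch. The table name sales_ext is hypothetical and stands for an external table whose files are copied into HDFS outside of Impala.

```sql
-- Hypothetical table: sales_ext (external table fed by files copied into HDFS).

-- After copying new data files into the table's HDFS directory,
-- reload the file metadata for just this table:
REFRESH sales_ext;

-- Heavier alternative: mark the table's metadata as stale so that
-- Impala reloads it the next time the table is referenced in a query.
INVALIDATE METADATA sales_ext;
```

Naming the table in both statements keeps the reload scoped to that one table rather than the whole catalog.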
COMPUTE STATS computes the statistics of the table and its columns, and is meant to be run when around 30% of the data has changed. It is an expensive operation, but effective when you frequently truncate and reload.
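A truncate-and-load cycle followed by a statistics refresh might look like the sketch below. The table names sales_int (internal target table) and staging_sales (source) are hypothetical and only serve the example.

```sql
-- Hypothetical tables: sales_int (internal target), staging_sales (source).

-- Empty the internal table, then reload it through Impala.
TRUNCATE TABLE sales_int;
INSERT INTO sales_int SELECT * FROM staging_sales;

-- Recompute table and column statistics for the query planner.
COMPUTE STATS sales_int;

-- Inspect what was gathered.
SHOW TABLE STATS sales_int;
SHOW COLUMN STATS sales_int;
```

Because the truncate and insert both run through Impala, the sketch assumes no REFRESH is needed between the steps; that command is for changes made outside Impala, such as files copied directly into HDFS.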