Impala: when to refresh tables?

Question

Using Impala, I noticed that performance deteriorates when I run truncate and insert operations several times on internal tables.
The question is: can refreshing the tables avoid the problem?
So far I have used REFRESH only for external tables, each time I copied files to HDFS to be loaded into those tables.

Many thanks in advance!
Moreno

Answer 1

Score: 0

You can use COMPUTE STATS instead of REFRESH.

REFRESH is normally used when you add a data file or change something in the table metadata, such as adding a column or partition or changing a column. It reloads the metadata quickly. A related command, INVALIDATE METADATA, is more expensive than REFRESH: it forces Impala to reload the table's metadata the next time the table is referenced in a query.
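As a rough illustration of the two commands, here is a minimal sketch in Impala SQL. The database, table, and partition column (`sales_db.events`, `event_date`) are hypothetical names chosen for the example, not something from the question.

```sql
-- Hypothetical table: sales_db.events, partitioned by event_date.

-- New data files were copied into the table's HDFS directory outside Impala:
-- reload the file metadata for the whole table.
REFRESH sales_db.events;

-- If only one partition received new files, refreshing just that
-- partition is cheaper (assumes a partitioned table).
REFRESH sales_db.events PARTITION (event_date='2023-02-06');

-- Heavier option: discard the cached metadata for the table so that
-- it is reloaded the next time a query references it.
INVALIDATE METADATA sales_db.events;
```

As far as I know, neither command is needed after statements issued through Impala itself (such as TRUNCATE TABLE or INSERT), because Impala updates its own metadata for those.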

COMPUTE STATS computes statistics for the table and its columns, and is worth running when around 30% of the data has changed. It is an expensive operation, but effective when you frequently truncate and reload.
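A truncate-and-load cycle that follows this advice might look like the sketch below. The table names (`sales_db.events`, `sales_db.events_staging`) are hypothetical and assumed to be unpartitioned tables with the same schema; adapt the INSERT to however the data is actually loaded.

```sql
-- Hypothetical internal table being reloaded from a staging table.

-- 1. Remove the existing rows.
TRUNCATE TABLE sales_db.events;

-- 2. Load the fresh data through Impala.
INSERT INTO sales_db.events
SELECT * FROM sales_db.events_staging;

-- 3. Recompute table and column statistics so the planner works
--    from up-to-date row counts and column information.
COMPUTE STATS sales_db.events;

-- 4. Optional check: inspect what the planner now sees.
SHOW TABLE STATS sales_db.events;
SHOW COLUMN STATS sales_db.events;
```

Running COMPUTE STATS once at the end of the load, rather than after every intermediate insert, keeps the cost of this expensive step down.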

Posted by huangapple on 2023-02-06 16:04:24
Link: https://go.coder-hub.com/75358712.html