"No enum constant org.apache.orc.CompressionKind.ZSTD" When Insert Data to ORC Compress ZSTD Table


Question


I created a table in Hive 3.1.3 as follows:

    CREATE EXTERNAL TABLE test_tez_orc_zstd
    (
      id BIGINT
    )
    STORED AS ORC
    LOCATION '...'
    TBLPROPERTIES ('orc.compress'='zstd');

The table was created successfully. I then tried to insert one row:

    INSERT INTO test_tez_orc_zstd
    SELECT 1;

This threw the following error:

    No enum constant org.apache.orc.CompressionKind.ZSTD

Hive is configured to use Tez.

If I do the same with a ZSTD-compressed Parquet table, it works.

How can I handle this?

Answer 1

Score: 1


ROOT CAUSE:

Apache Hive 3.1.3 uses ORC 1.5.8. ORC has supported zstd only since 1.6.0; see https://issues.apache.org/jira/browse/ORC-363.

Comparing the CompressionKind enum constants in 1.5.8 and 1.6.0 shows that ZSTD exists only in the latter. So, in this case, Hive 3.1.3 does not support TBLPROPERTIES ('orc.compress'='zstd').
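To illustrate where the exact wording of the error comes from, here is a minimal sketch. It assumes the table property is upper-cased and resolved through Java's `Enum.valueOf`, which is what the message format suggests. `LegacyCompressionKind` is a hypothetical stand-in for `org.apache.orc.CompressionKind` in ORC 1.5.x, and its list of constants is an assumption, not a copy of the real class.

```java
import java.util.Locale;

public class CompressionKindDemo {
    // Hypothetical stand-in for org.apache.orc.CompressionKind in ORC 1.5.x.
    // The constant list is an assumption; the point is only that ZSTD is absent.
    enum LegacyCompressionKind { NONE, ZLIB, SNAPPY, LZO, LZ4 }

    // Resolves a table-property value to an enum constant name.
    // Returns the exception message instead when no such constant exists.
    public static String resolve(String tableProperty) {
        try {
            String upper = tableProperty.toUpperCase(Locale.ROOT);
            return LegacyCompressionKind.valueOf(upper).name();
        } catch (IllegalArgumentException e) {
            // Enum.valueOf reports unknown names as "No enum constant <type>.<name>".
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("zlib")); // a known constant resolves normally
        System.out.println(resolve("zstd")); // prints a message starting with "No enum constant"
    }
}
```

With a library version whose enum declares ZSTD, the same lookup would succeed, which is why upgrading ORC (rather than changing the query) is the fix discussed here.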


POSSIBLE SOLUTION:

In Hive, the ORC version was upgraded past 1.6.0 in release 4.0.0-alpha-1; see https://issues.apache.org/jira/browse/HIVE-23553.

It might be challenging, but you can backport the related commits on top of release tag 3.1.3, then build the project and replace the affected jars in Hive's library directory.

Note that the ORC classes are present not only as standalone ORC jars in Hive's library directory; they are also bundled into some of the fat jars, such as hive-exec.

So, the steps should be as follows:

  1. Clone Hive and check out release tag 3.1.3.
  2. Backport the commits that upgrade ORC to the desired version.
  3. Build the project: mvn clean package -DskipTests
  4. In your Hive installation, search the library directory for ORC to see which ORC dependencies are directly on the classpath and which fat jars contain ORC classes.
  5. Replace the jars that you identified in the previous step.
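Step 4 can also be done programmatically instead of with grep. The following is a rough sketch that lists every jar in a directory that bundles the `CompressionKind` class file. The class name `OrcJarScanner` and the default path `/opt/hive/lib` are hypothetical; pass your actual Hive library directory as the first argument. It assumes the fat jars carry the ORC classes under their original package name (not relocated).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

public class OrcJarScanner {
    // Jar entry to look for; assumes ORC classes keep their original package.
    static final String ORC_ENTRY = "org/apache/orc/CompressionKind.class";

    // Returns the sorted file names of all jars in libDir that contain ORC_ENTRY.
    public static List<String> jarsContainingOrc(Path libDir) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(ORC_ENTRY) != null) {
                        hits.add(jar.getFileName().toString());
                    }
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; replace with the lib directory of your Hive installation.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "/opt/hive/lib");
        for (String name : jarsContainingOrc(libDir)) {
            System.out.println(name);
        }
    }
}
```

Every jar it prints is a candidate for replacement in step 5, whether it is a standalone ORC jar or a fat jar.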

The challenging part is that the ORC upgrade commits can be quite large, and there may be conflicts.
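After replacing the jars, it may be worth confirming that the classpath now resolves to an ORC version that declares ZSTD. Below is a small sketch of such a check using reflection. The class `EnumConstantCheck` and its method are hypothetical helpers, not part of Hive or ORC.

```java
public class EnumConstantCheck {
    // Returns true if className is an enum on the classpath that declares the given constant.
    public static boolean hasEnumConstant(String className, String constant) {
        try {
            Class<?> cls = Class.forName(className);
            Object[] constants = cls.getEnumConstants(); // null when cls is not an enum
            if (constants == null) {
                return false;
            }
            for (Object c : constants) {
                if (((Enum<?>) c).name().equals(constant)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException e) {
            // The class is not on the classpath at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints true only if the ORC version on the classpath declares ZSTD.
        System.out.println(hasEnumConstant("org.apache.orc.CompressionKind", "ZSTD"));
    }
}
```

Run it with the jars from Hive's library directory on the classpath. If several jars bundle the enum, the result reflects whichever one the class loader picks first, so a `false` after the upgrade suggests that an old copy is still present.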

huangapple
  • Posted on June 6, 2023, 01:21:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76408708.html