Reading Parquet files in FlinkSQL without Hadoop?

Question

I'm trying to read Parquet files in FlinkSQL. Here is what I did:

  1. Downloaded the jar file from https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/formats/parquet/, made sure it matches the Flink version I have, and put it in flink/lib/.
  2. Started the Flink cluster with ./flink/bin/start-cluster.sh and the SQL client with ./flink/bin/sql-client.sh.
  3. Loaded the jar file: add jar '/home/ubuntu/flink/lib/flink-sql-parquet-1.16.0.jar';
  4. Tried to create a table with the Parquet format: create TABLE test2 (order_time TIMESTAMP(3), product STRING, feature INT, WATERMARK FOR order_time AS order_time) WITH ('connector'='filesystem','path'='/home/ubuntu/test.parquet','format'='parquet');
  5. Ran select count(*) from test2;
  6. Got: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration

Can somebody please help me read Parquet files in FlinkSQL please?
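For reference, steps 3–5 above as a single SQL-client session, reformatted for readability (same statements, paths taken from the question):

```sql
-- Load the Parquet format jar into the SQL-client session
ADD JAR '/home/ubuntu/flink/lib/flink-sql-parquet-1.16.0.jar';

-- Register the Parquet file as a table via the filesystem connector
CREATE TABLE test2 (
  order_time TIMESTAMP(3),
  product STRING,
  feature INT,
  WATERMARK FOR order_time AS order_time
) WITH (
  'connector' = 'filesystem',
  'path' = '/home/ubuntu/test.parquet',
  'format' = 'parquet'
);

-- This query triggers the ClassNotFoundException described below
SELECT COUNT(*) FROM test2;
```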

Answer 1

Score: 0

As outlined in https://issues.apache.org/jira/browse/PARQUET-1126, Parquet still requires Hadoop. You will need to add the Hadoop dependencies to Flink as described in https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/configuration/advanced/#hadoop-dependencies.
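Concretely, the approach from that Flink doc is to expose an existing Hadoop installation to Flink via the HADOOP_CLASSPATH environment variable before starting the cluster. A sketch, assuming Hadoop is installed locally and the `hadoop` command is on the PATH:

```shell
# Make Hadoop's classes (including org.apache.hadoop.conf.Configuration)
# visible to Flink; assumes a local Hadoop install with `hadoop` on PATH.
export HADOOP_CLASSPATH=`hadoop classpath`

# Restart the cluster and SQL client so they pick up the classpath.
./flink/bin/stop-cluster.sh
./flink/bin/start-cluster.sh
./flink/bin/sql-client.sh
```

Note that HADOOP_CLASSPATH must be set in the environment of the shell that launches the cluster scripts, not just in the SQL-client session.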

huangapple
  • Published on 2023-02-19 11:35:55
  • Please retain this link when reposting: https://go.coder-hub.com/75497819.html