Reading Parquet files in FlinkSQL without Hadoop?

Question
Trying to read Parquet files in FlinkSQL.

- Downloaded the jar file from https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/formats/parquet/, made sure it matches the Flink version I have, and put it in flink/lib/.
- Started the Flink cluster with ./flink/bin/start-cluster.sh and the SQL client with ./flink/bin/sql-client.sh.
- Loaded the jar file: ADD JAR '/home/ubuntu/flink/lib/flink-sql-parquet-1.16.0.jar';
- Tried to create a table with the Parquet format: CREATE TABLE test2 (order_time TIMESTAMP(3), product STRING, feature INT, WATERMARK FOR order_time AS order_time) WITH ('connector' = 'filesystem', 'path' = '/home/ubuntu/test.parquet', 'format' = 'parquet');
- Ran select count(*) from test2;
- Got: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration

Can somebody please help me read Parquet files in FlinkSQL?
Answer 1

Score: 0

As outlined in https://issues.apache.org/jira/browse/PARQUET-1126, Parquet still requires Hadoop. You will need to add the Hadoop dependencies to Flink as outlined in https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/configuration/advanced/#hadoop-dependencies.
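In practice, the approach described on that Flink documentation page is to expose an existing Hadoop installation to Flink through the HADOOP_CLASSPATH environment variable before starting the cluster and SQL client. The sketch below assumes Hadoop is already installed and its bin/ directory is on PATH; the flink/ paths mirror the ones used in the question:

```shell
# Make the Hadoop classes (including org.apache.hadoop.conf.Configuration,
# the class missing in the reported error) visible to Flink.
# `hadoop classpath` prints the classpath of the local Hadoop installation.
export HADOOP_CLASSPATH=$(hadoop classpath)

# Restart the cluster and the SQL client so they pick up the new classpath.
./flink/bin/stop-cluster.sh
./flink/bin/start-cluster.sh
./flink/bin/sql-client.sh
```

Note that HADOOP_CLASSPATH must be set in the environment of the shell that launches the cluster; setting it only in the SQL client session is not enough, since the TaskManagers that execute the query also need the Hadoop classes.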