Data Persistence For Apache Flink SQL Streaming Queries
Question
I want to use Flink SQL to query streaming data. I have two questions:
- Can I apply SQL queries dynamically without having to restart Flink?
- If I create a table from a Kafka source, will Flink actually create the table and persist the incoming data in it forever, or will it just delete the rows once they are processed?
I am new to Flink, and any help would be highly appreciated.
I have already read several blog posts on Flink SQL, but none of them answered whether the data is persisted in the table.
Answer 1
Score: 1
> Can I apply SQL queries dynamically without having to restart flink?
Each query will create a new Flink job. The jobs for streaming queries will run indefinitely, unless they are applied to bounded streams, or are stopped.
You can have a Flink session cluster that is always running (restarting only if something fails) and use its resources to run those queries/jobs. New queries/jobs can come and go without restarting the session cluster.
> If I create a table from a kafka source, will flink actually create the table and persist the incoming data in that table forever OR it will just delete the rows once they are processed?
Flink's tables don't have any storage of their own; the data is only persisted in the backing store for the table.
If you create a table that is backed by a Kafka topic and then query that table, the query has no effect on the retention policy of the underlying Kafka topic. The Row objects that correspond to the events stored in that topic exist only while they are being processed.
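To make the distinction concrete, here is a small toy model in plain Python. It is not Flink or Kafka code: `Topic`, `Table`, and `run_query` are hypothetical names invented for this sketch, and the "topic" is just an in-memory log with a record-count retention limit. It only illustrates the idea that the table holds a pointer to its backing store, that rows are materialized while a query reads them, and that several independent queries can come and go without changing what the topic retains. Unlike a real streaming query, each query here reads a finite snapshot and then returns.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Topic:
    """Toy stand-in for a Kafka topic: an append-only log with its own retention."""
    max_records: int                      # retention belongs to the topic, not to any reader
    log: deque = field(default_factory=deque)

    def append(self, event: dict) -> None:
        self.log.append(event)
        while len(self.log) > self.max_records:
            self.log.popleft()            # only the topic's retention ever drops data


@dataclass
class Table:
    """Toy stand-in for a table: nothing but a pointer to the backing store."""
    topic: Topic

    def scan(self) -> Iterator[dict]:
        # Rows are produced one at a time and are never cached on the table.
        for event in list(self.topic.log):
            yield dict(event)


def run_query(table: Table, predicate: Callable[[dict], bool]) -> list:
    """Each call models one independent job reading the same table."""
    return [row for row in table.scan() if predicate(row)]


topic = Topic(max_records=3)
for i in range(5):
    topic.append({"id": i, "amount": i * 10})   # retention keeps only ids 2, 3, 4

orders = Table(topic)
big = run_query(orders, lambda r: r["amount"] >= 30)   # first "job"
all_rows = run_query(orders, lambda r: True)           # second "job", added later
```

After both queries finish, `topic.log` still holds exactly the records its own retention limit allows, and `orders` still contains nothing except its reference to the topic.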