May 24, 2023 22:07:25

Data Persistence For Apache Flink SQL Streaming Queries

Question

I want to use Flink SQL to query streaming data. I have two questions:

Can I apply SQL queries dynamically without having to restart Flink?
If I create a table from a Kafka source, will Flink actually create the table and persist the incoming data in that table forever, or will it just delete the rows once they are processed?

I am new to Flink, and any help on this is highly appreciated.

I have already read several blog posts on Flink SQL but did not find an answer to whether the data is persisted in the table.

Answer 1

Score: 1

> Can I apply SQL queries dynamically without having to restart Flink?

Each query will create a new Flink job. The jobs for streaming queries will run indefinitely, unless they are applied to bounded streams, or are stopped.

You can have a Flink session cluster that is always running (restarting only if something fails) and use its resources to run those queries/jobs. New queries/jobs can come and go without restarting that session cluster.
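
To make the job-per-query idea concrete, here is a toy model in plain Python. It is not Flink's API: `SessionCluster`, `Job`, `submit`, and `cancel` are hypothetical names invented for this sketch, and the `clicks` table in the queries is made up. It only illustrates that each query becomes its own job and that jobs start and stop while the cluster object stays up.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    query: str
    status: str = "RUNNING"


@dataclass
class SessionCluster:
    """Hypothetical stand-in for a long-running Flink session cluster."""
    restarts: int = 0
    jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def submit(self, query: str) -> Job:
        # Each submitted query becomes its own job; the cluster keeps running.
        job = Job(next(self._ids), query)
        self.jobs[job.job_id] = job
        return job

    def cancel(self, job_id: int) -> None:
        # Stopping one job leaves the cluster and the other jobs untouched.
        self.jobs[job_id].status = "CANCELED"

    def running(self) -> list:
        return [j.job_id for j in self.jobs.values() if j.status == "RUNNING"]


cluster = SessionCluster()
first = cluster.submit("SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id")
second = cluster.submit("SELECT * FROM clicks WHERE url LIKE '%/checkout'")
cluster.cancel(first.job_id)           # one query goes away...
third = cluster.submit("SELECT url FROM clicks")  # ...and a new one arrives
print(cluster.running(), cluster.restarts)
```

In a real deployment the submission would go through a Flink client such as the SQL client, and the cluster would schedule each job on its task managers; the sketch leaves all of that out.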

> If I create a table from a Kafka source, will Flink actually create the table and persist the incoming data in that table forever, or will it just delete the rows once they are processed?

Flink's tables don't have any storage of their own; the data is only persisted in the backing store for the table.
If you create a table that is backed by a Kafka topic and then query that table, the query has no effect on the retention policy of the underlying Kafka topic. The Row objects that correspond to the events stored in that topic only exist while they are being processed.
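
A toy model in plain Python may help picture this. It is not Flink's API: `Topic`, `Table`, `scan`, and `run_query` are hypothetical names invented for this sketch, and the topic is a small bounded list, whereas a real streaming query would keep reading as new events arrive. The point is only that the table holds a schema and a reference to the topic, and that rows are built on the fly while a query reads them.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """Hypothetical stand-in for a Kafka topic: the only place data is stored."""
    events: list


@dataclass
class Table:
    """Hypothetical stand-in for a Flink table: schema plus a reference to the topic."""
    name: str
    columns: tuple
    source: Topic

    def scan(self):
        # Rows are built one at a time as the query pulls them; nothing is
        # copied into the table and nothing is removed from the topic.
        for event in self.source.events:
            yield dict(zip(self.columns, event))


def run_query(table, predicate):
    # A query is just a computation over the rows produced by the scan.
    return [row for row in table.scan() if predicate(row)]


topic = Topic(events=[("alice", 3), ("bob", 7), ("alice", 5)])
clicks = Table("clicks", ("user_id", "amount"), topic)

result = run_query(clicks, lambda row: row["user_id"] == "alice")
print(result)
print(len(topic.events))  # the topic still holds every event after the query
```

Running a second query would read the topic again from the start; how long the events stay available is decided by the topic's own retention settings, not by the table.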
