What's the best way to push Kafka messages from my edge nodes?

Question

I have a worker in the primary region (US-East) that computes data on traffic at our edge locations. I want to push the data from each edge region to our primary Kafka region.

For example, the edge regions are Poland, Australia, and US-West, and I want to push all of their stats to US-East. I don't want to incur additional latency during the writes from the edge regions to the primary.

Another option is to create another Kafka cluster and worker that acts as a relay. That would require us to maintain individual clusters in each region and would add a lot more complexity to our deployments.

I've seen MirrorMaker, but I don't really want to mirror anything; I guess I'm looking more for a relay system. If this isn't the designed way to do this, how can I aggregate all of our application metrics to the primary region to be computed and sorted?

Thank you for your time.

Answer 1

Score: 1

As far as I know, here are your options:

  1. Set up a local Kafka cluster in each region and have your edge nodes write to their local Kafka cluster for low-latency writes. From there, you would set up a MirrorMaker instance that pulls data from your local Kafka cluster to your remote Kafka cluster for aggregation.
  2. If you're concerned about interrupting your application's request path with high-latency blocking requests, then you may want to configure your producers to write asynchronously (non-blocking) to your remote Kafka cluster. Depending on your choice of programming language, this could be a simple or a complex exercise.
  3. Run a per-host relay (or data buffer) service, which could be as simple as a log file and a daemon that pushes to your remote Kafka cluster (as mentioned above). Alternatively, run a single-instance Kafka / ZooKeeper container (there are Docker images that bundle both together) that buffers the data for downstream pulling.

Option 1 is definitely the most standard solution to this problem, albeit a bit heavy-handed. I suspect there will be more tooling coming out of the Confluent / Kafka folks to support option 3 in the future.
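
As a rough illustration of option 2, here is a minimal Python sketch of a non-blocking write path: the request handler only puts the message on a bounded in-memory queue, and a background thread does the slow cross-region send. The class name `AsyncForwarder`, the `send` callable, and the drop-when-full policy are all assumptions made for this example, not something from the question. Many Kafka client libraries already buffer and send asynchronously on their own, so in practice this wrapper may be unnecessary.

```python
import queue
import threading
from typing import Callable


class AsyncForwarder:
    """Hypothetical helper: buffer messages in memory, forward on a background thread."""

    def __init__(self, send: Callable[[bytes], object], max_buffered: int = 10_000):
        # `send` is whatever actually talks to the remote cluster (an assumption).
        self._send = send
        self._queue = queue.Queue(maxsize=max_buffered)
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, message: bytes) -> bool:
        """Called from the request path. Never waits on the network."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            # Assumed policy: losing a stat is better than stalling a request.
            self.dropped += 1
            return False

    def _run(self) -> None:
        while True:
            message = self._queue.get()
            if message is None:  # sentinel placed by close()
                break
            try:
                self._send(message)
            except Exception:
                # Sketch only: count the failure, no retry logic.
                self.dropped += 1

    def close(self) -> None:
        """Flush everything already queued, then stop the worker thread."""
        self._queue.put(None)
        self._thread.join()
```

With the kafka-python package, for example, `send` could be something like `lambda m: producer.send("edge-stats", m)`, where `producer` is a `KafkaProducer` pointed at the US-East brokers and `"edge-stats"` is a made-up topic name. The bounded queue is the important design choice: if the link to the primary region goes down, memory use stays capped and the edge application keeps serving requests.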

Answer 2

Score: 0

Write the messages to a local logfile on disk. Write a small daemon that reads the logfile and pushes the events to the main Kafka cluster.

To increase throughput and limit the effect of latency, you could also rotate the logfile every minute, then rsync the rotated logfile to your main Kafka region with a cron job that runs every minute. Let the import daemon run there.
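
The logfile-plus-daemon idea could look roughly like the Python sketch below. The application appends one event per line, and a relay pass forwards every complete line written since the last pass, remembering its position in a small offset file so a restart resumes where it left off. The function names, the offset-file scheme, and the `send` callable are assumptions for illustration only; log rotation and the rsync variant are not covered here.

```python
import os
from typing import Callable


def append_event(log_path: str, line: str) -> None:
    """Called by the application: a local append, no network round trip."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")


def relay_once(log_path: str, offset_path: str,
               send: Callable[[bytes], object]) -> int:
    """Forward lines added since the last pass; return how many were sent."""
    if not os.path.exists(log_path):
        return 0

    # Resume from the byte offset saved by the previous pass, if any.
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path, "r", encoding="utf-8") as f:
            offset = int(f.read().strip() or 0)

    forwarded = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            send(line.rstrip(b"\n"))  # `send` pushes to the main cluster (assumed)
            offset = f.tell()
            forwarded += 1

    # Save the new offset via a temp file so a crash cannot leave it half-written.
    tmp_path = offset_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(str(offset))
    os.replace(tmp_path, offset_path)
    return forwarded
```

A daemon would simply call `relay_once` in a loop with a short sleep. Because the offset is saved only after the lines are sent, a crash in the middle of a pass causes those lines to be sent again on restart, so the consumer in the primary region should tolerate duplicates.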

huangapple
  • Published on November 3, 2016, 01:53:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/40386597.html