Performance of NATS JetStream

Question


I'm trying to understand how Nats Jetstream scales and have a couple of questions.

  1. How efficient is subscribing by subject to historic messages? For example, let's say we have a stream foo that consists of 100 million messages with a subject of foo.bar, and then a single message with the subject foo.baz. If I then make a subscription to foo.baz from the start of the stream, will something on the server have to perform a linear scan of all the messages in foo, or will it be able to immediately seek to the foo.baz message?

  2. How well does the system horizontally scale? I ask because I'm having issues getting Jetstream to scale much above a few thousand messages per second, regardless of how many machines I throw at it. Test parameters are as follows:

    • NATS Server 2.6.3 running on 4-core, 8GB nodes
    • Single stream replicated 3 times (disk or in-memory appears to make no difference)
    • 500-byte message payloads
    • n publishers, each publishing 1k messages per second

    The bottleneck appears to be on the publishing side, as I can retrieve messages at least as fast as I can publish them.

Answer 1

Score: 8


Publishing in NATS JetStream is slightly different than publishing in Core NATS.
Yes, you can publish a Core NATS message to a subject that is recorded by a stream, and that message will indeed be captured in the stream. However, in the case of a Core NATS publication the publishing application does not expect an acknowledgement back from the nats-server, while in the case of the JetStream publish call the nats-server sends an acknowledgement back to the client indicating whether the message was successfully persisted and replicated.

So when you do js.Publish() you are actually making a synchronous, relatively high-latency request-reply (especially if your replication factor is 3 or 5, more so if your stream is persisted to file, and depending on the network latency between the client application and the nats-server), which means your throughput is going to be limited if you just make those synchronous publish calls back to back.
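As a back-of-envelope illustration of why back-to-back synchronous publishes cap throughput, consider this sketch (assuming an idealized fixed RTT and ignoring server-side processing time):

```go
package main

import "fmt"

// syncThroughput returns the ceiling on messages/second when each
// synchronous publish waits a full round trip before the next starts.
func syncThroughput(rttMs float64) float64 {
	return 1000.0 / rttMs
}

func main() {
	// With a 1 ms client-to-server RTT, synchronous publishing back
	// to back cannot exceed ~1000 msgs/s per connection, no matter
	// how many servers you add to the cluster.
	fmt.Printf("%.0f msgs/s at 1ms RTT\n", syncThroughput(1))
	fmt.Printf("%.0f msgs/s at 5ms RTT\n", syncThroughput(5))
}
```

This is why adding machines does not help in the scenario described in the question: the per-publisher rate is bounded by latency, not by server capacity.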

If you want throughput when publishing messages to a stream, you should use the asynchronous version of the JetStream publish call instead (i.e. js.PublishAsync(), which returns a PubAckFuture).

However, in that case you must also remember to introduce some amount of flow control by limiting the number of 'in-flight' asynchronous publications you allow at any given time (this is because you can always publish asynchronously much faster than the nats-server(s) can replicate and persist messages).

If you were to continuously publish asynchronously as fast as you can (e.g. when publishing the result of some kind of batch process) then you would eventually overwhelm your servers, which is something you really want to avoid.

You have two options to flow-control your JetStream async publications:

  • Specify a maximum number of 'in-flight' asynchronous publish requests as an option when obtaining the JetStream context: i.e. js = nc.JetStream(nats.PublishAsyncMaxPending(100))
  • Implement a simple batching mechanism that checks the PubAcks every so often, as nats bench does: https://github.com/nats-io/natscli/blob/e6b2b478dbc432a639fbf92c5c89570438c31ee7/cli/bench_command.go#L476

About the expected performance: using async publications allows you to really get the throughput that NATS and JetStream are capable of. A simple way to validate or measure performance is to use the nats CLI tool (https://github.com/nats-io/natscli) to run benchmarks.

For example, you can start with a simple test: nats bench foo --js --pub 4 --msgs 1000000 --replicas 3 (an in-memory stream with 3 replicas, 4 go-routines each with its own connection, publishing 128-byte messages in batches of 100), and you should get a lot more than a few thousand messages per second.

For more information and examples of how to use the nats bench command you can take a look at this video: https://youtu.be/HwwvFeUHAyo

Answer 2

Score: 4


For an R3 filestore you can expect ~250k small msgs per second. If you use synchronous publishes, throughput will be dominated by the RTT from the application to the system, and from the stream leader to the closest follower. You can use windowed, intelligent async publishing to get better performance.

You can get higher numbers with memory stores, but again will be dominated by RTT throughout the system.

If you give me a sense of how large your messages are, we can show you some results from nats bench against the demo servers (R1) and NGS (R1 & R3).

For the original question regarding filtered consumers: >= 2.8.x will not do a linear scan to retrieve foo.baz. We could also show an example of this if it would help.
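To illustrate why a filtered consumer does not need a linear scan, here is a toy per-subject index (purely illustrative — these are not the actual JetStream filestore data structures): if the stream keeps a map from subject to message sequence numbers, locating the lone foo.baz among millions of foo.bar messages is a single lookup whose cost does not depend on the stream size.

```go
package main

import "fmt"

// subjectIndex maps a subject to the stream sequence numbers of its
// messages, so a filtered consumer can seek directly to matches.
type subjectIndex map[string][]uint64

// store records that the message at sequence seq has this subject.
func (idx subjectIndex) store(seq uint64, subject string) {
	idx[subject] = append(idx[subject], seq)
}

// seqsFor returns matching sequences without scanning other subjects.
func (idx subjectIndex) seqsFor(subject string) []uint64 {
	return idx[subject]
}

func main() {
	idx := subjectIndex{}
	var seq uint64
	// Stand-in for the question's 100M foo.bar messages.
	for i := 0; i < 1_000_000; i++ {
		seq++
		idx.store(seq, "foo.bar")
	}
	seq++
	idx.store(seq, "foo.baz")

	// One lookup, independent of how many foo.bar messages exist.
	fmt.Println(idx.seqsFor("foo.baz"))
}
```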

Feel free to join the Slack channel (slack.nats.io), which is a pretty active community. You can even DM me directly; happy to help.

Answer 3

Score: 3


It would be good to get an opinion on this. I see similar behaviour, and the only way I can achieve higher throughput for publishers is to lower replication (from 3 to 1), but that isn't an acceptable solution.

I have tried adding more resources (cpu/ram) with no success on increasing the publishing rate.

Also, scaling horizontally did not make any difference.

In my situation, I am using the bench tool to publish to JetStream.

huangapple
  • Published on 2022-01-02 00:03:17
  • Please retain the original link when republishing: https://go.coder-hub.com/70550060.html