Logstash very low throughput with RabbitMQ
Question
I am using the following setup for my logging pipeline:

fluentbit -> logstash-frontend -> rmq -> logstash-backend -> opensearch
Now, the logstash-frontend is working fine and is able to queue messages into RabbitMQ. The problem is that I am getting very low throughput on logstash-backend, which causes the queue to pile up and eventually stalls the whole setup.
Here are my configurations:
logstash-frontend
output {
  rabbitmq {
    durable => true
    exchange => "logstash"
    exchange_type => "direct"
    persistent => true
    host => "opensearch-logging-cluster-rmq"
    user => "****"
    password => "****"
  }
}
logstash-backend
input {
  rabbitmq {
    ack => false
    durable => true
    exchange => "logstash"
    exchange_type => "direct"
    host => "opensearch-logging-cluster-rmq"
    user => "****"
    password => "****"
    threads => 4
  }
}
I have also set the following on the logstash-backend:

logstash.yaml

pipeline:
  batch:
    size: 2048

jvm.options

-Xms4g
-Xmx4g
11-13:-XX:+UseConcMarkSweepGC
11-13:-XX:CMSInitiatingOccupancyFraction=75
11-13:-XX:+UseCMSInitiatingOccupancyOnly
NOTE: I am running this whole setup in Google Kubernetes Engine
After starting the whole setup, I can see the exchange and queues as well as the connections, but the delivery rate is very slow - in the range of 300 messages/s.
Exchange: (screenshot)
Queue: (screenshot)
Also, I see that there are ~70 queues created. I am running 3 replicas each of logstash-frontend and logstash-backend.
Any idea what I am doing wrong here?
Answer 1
Score: 1
Firstly, I think there might be some misconfiguration of Logstash. In RabbitMQ you generally publish to an exchange and consume from a queue, so why does your logstash-backend specify an exchange and not a queue? I haven't used the Logstash RMQ input plugin, but I'm surprised this would even work, i.e. you don't consume from an exchange!
Also, I'm not sure how you're ending up with ~70 queues in RMQ, as you're not specifying a routing key in your logstash-frontend (using the key setting in the Logstash config), so I assume the routing key would default to logstash (based on the Logstash docs - see here) and there should only be 1 queue. It might be worth looking at the bindings (and binding keys) for your "logstash" exchange in RMQ to see what's going on...
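As a rough sketch of what I mean, naming the queue and routing key explicitly on both sides would look something like this (the queue name and routing key "logs" are placeholders I made up; if I read the plugin docs correctly, the input declares the named queue and binds it to the exchange for you, so all backend replicas then share one queue instead of each ending up with an auto-generated one):

# logstash-frontend: publish to the exchange with an explicit routing key
output {
  rabbitmq {
    exchange => "logstash"
    exchange_type => "direct"
    key => "logs"
    durable => true
    persistent => true
    host => "opensearch-logging-cluster-rmq"
    user => "****"
    password => "****"
  }
}

# logstash-backend: consume from a named queue bound to that exchange/key
input {
  rabbitmq {
    queue => "logs"
    exchange => "logstash"
    exchange_type => "direct"
    key => "logs"
    durable => true
    host => "opensearch-logging-cluster-rmq"
    user => "****"
    password => "****"
  }
}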
WRT performance, it's quite a complex topic and there are a number of things it could be. A good place to start would be this blog post on RMQ performance.
Here's a good list of RMQ performance optimizations... just to call a few of these out:
- Queues receiving more messages than your consumers can cope with could result in more CPU being used. You could try increasing the number of logstash-backend replicas...
- Queues getting so big that messages are written to disk to free up RAM. I can see in your diagram that some of the queues appear to have millions of messages, so this could be a possibility... ensure RMQ nodes have enough RAM and messages are getting consumed as quickly as they're being produced (and not just sitting there).
- Do you have a RMQ cluster or single instance and are you using durable storage? Clustering and message persistence can impact performance.
- Prefetch settings and acknowledgement batching. I can see you've already turned acknowledgements off, so this should already be optimized.
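Just for reference, in case you ever turn acknowledgements back on, both of those knobs live on the rabbitmq input itself; something along these lines (the numbers are illustrative only, and as far as I know the broker only applies the prefetch limit to consumers that acknowledge):

input {
  rabbitmq {
    queue => "logs"            # placeholder name from the sketch above
    ack => true                # prefetch_count has no effect with ack => false (auto-ack)
    prefetch_count => 512      # max un-acked messages held per consumer
    threads => 4
    host => "opensearch-logging-cluster-rmq"
    user => "****"
    password => "****"
  }
}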
Another thing not mentioned above: a queue is a single-threaded resource. If you design your routing topology so that messages are spread across multiple queues rather than hammering everything into a single queue, you can take advantage of additional CPU resources and minimize the CPU hit per message. For example, I'm not sure where all the logs are coming from, but you could specify different routing keys (in logstash-frontend) based on some criteria (e.g. source application, or some kind of timestamp algorithm) and configure multiple Logstash pipelines (in the logstash-backend) to consume from different queues.
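A rough sketch of what that could look like (the [service] field, queue names and routing keys are all made-up placeholders; if I remember rightly the output's key setting accepts %{field} interpolation):

# logstash-frontend: route by a hypothetical per-application field
output {
  rabbitmq {
    exchange => "logstash"
    exchange_type => "direct"
    key => "%{[service]}"      # e.g. "app-a" or "app-b"
    host => "opensearch-logging-cluster-rmq"
    user => "****"
    password => "****"
  }
}

# logstash-backend pipelines.yml: one pipeline per queue
- pipeline.id: app-a
  path.config: "/usr/share/logstash/pipeline/app-a.conf"
- pipeline.id: app-b
  path.config: "/usr/share/logstash/pipeline/app-b.conf"

where app-a.conf and app-b.conf each contain a rabbitmq input with their own queue and key (e.g. queue => "app-a", key => "app-a"). Each queue then gets its own consumers, so RabbitMQ can spread the per-queue work across cores.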
A couple of other misc suggestions:
- You could consolidate fluentbit and logstash-frontend to just use the OTEL (OpenTelemetry) agent with the RabbitMQ exporter.
- You could also explore Kafka instead of RMQ, but not sure this will be any easier to configure!
FYI, where I work we've implemented something similar with a queuing/streaming layer, but we use OTEL Collector agent -> AWS Kinesis -> Logstash -> Elasticsearch.