
Logstash very low throughput with RabbitMQ

Question

I am using the following setup for my logging pipeline

fluentbit -> logstash-frontend -> rmq -> logstash-backend -> opensearch

Now, the logstash-frontend is working fine and is able to queue messages into RabbitMQ without issue. The problem is that I am getting very low throughput on the logstash-backend.

This causes the Queue to pile up and eventually stall the whole setup.

Here are my configurations:

logstash-frontend

    output {
      rabbitmq {
        durable => true
        exchange => "logstash"
        exchange_type => "direct"
        persistent => true
        host => "opensearch-logging-cluster-rmq"
        user => "****"
        password => "****"
      }
    }

logstash-backend

    input {
      rabbitmq {
        ack => false
        durable => true
        exchange => "logstash"
        exchange_type => "direct"
        host => "opensearch-logging-cluster-rmq"
        user => "****"
        password => "****"
        threads => 4
      }
    }  

I have also set the following in the logstash-backend

logstash.yaml

    pipeline:
      batch:
        size: 2048

jvm.options

    -Xms4g
    -Xmx4g
    11-13:-XX:+UseConcMarkSweepGC
    11-13:-XX:CMSInitiatingOccupancyFraction=75
    11-13:-XX:+UseCMSInitiatingOccupancyOnly

NOTE: I am running this whole setup in Google Kubernetes Engine

After starting the whole setup, I can see the exchange and queues, as well as the connections, but the delivery rate is very slow, in the range of 300 messages/s.

Exchange: (screenshot)

Queue: (screenshot)

Also, I see that there are ~70 queues created. I am running 3 replicas of logstash-frontend and logstash-backend.

Any idea what I am doing wrong here?

Answer 1

Score: 1

Firstly, I think there might be some misconfiguration of Logstash. In RabbitMQ you generally publish to an exchange and consume from a queue. Why does your logstash-backend specify an exchange and not a queue? I haven't used the Logstash RMQ input plugin, but I'm surprised this even works, i.e. you don't consume from an exchange!
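
For instance, here's a minimal sketch of what consuming from a named queue could look like ("logstash-backend" is a placeholder queue name; queue, key and the other settings shown are documented options of the logstash-input-rabbitmq plugin):

    input {
      rabbitmq {
        host          => "opensearch-logging-cluster-rmq"
        user          => "****"
        password      => "****"
        # Declare and consume from an explicitly named queue instead of an
        # auto-generated one ("logstash-backend" is just a placeholder name).
        queue         => "logstash-backend"
        durable       => true
        # Bind that queue to the exchange the frontend publishes to, using
        # the same routing key the frontend uses.
        exchange      => "logstash"
        exchange_type => "direct"
        key           => "logstash"
        ack           => false
        threads       => 4
      }
    }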

Also, I'm not sure how you're ending up with ~70 queues in RMQ as you're not specifying a routing key in your logstash-frontend (using the key setting in Logstash config), so I assume the routing key would default to logstash (based on the Logstash docs - see here) and there should only be 1 queue. It might be worth looking at the bindings (and binding keys) for your "logstash" exchange in RMQ to see what's going on...
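
If a single queue is what you want, here's a sketch of the publishing side with the routing key made explicit (key is a documented option of the logstash-output-rabbitmq plugin and defaults to "logstash"):

    output {
      rabbitmq {
        host          => "opensearch-logging-cluster-rmq"
        user          => "****"
        password      => "****"
        exchange      => "logstash"
        exchange_type => "direct"
        # Make the routing key explicit so the consuming queue's binding key
        # obviously has to match it (this is also the plugin's default).
        key           => "logstash"
        durable       => true
        persistent    => true
      }
    }

Then you can check in the RMQ management UI that the backend queue is bound to the "logstash" exchange with that same key.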

WRT performance, it's quite a complex topic and there are a number of things it could be. A good place to start would be this blog post on RMQ performance.

Here's a good list of RMQ performance optimizations... just to call a few of these out:

  • Queues receiving more messages than your consumers can cope with could result in more CPU being used. You could try increasing the number of logstash-backend replicas (a related Logstash-side tuning sketch follows this list)...
  • Queues getting so big that messages are written to disk to free up RAM. I can see in your diagram that some of the queues appear to have millions of messages, so this could be a possibility... ensure RMQ nodes have enough RAM and messages are getting consumed as quickly as they're being produced (and not just sitting there).
  • Do you have a RMQ cluster or single instance and are you using durable storage? Clustering and message persistence can impact performance.
  • Prefetch settings and acknowledgement batching. I can see you've already turned acknowledgements off, so this should already be optimized.
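
On the first point, besides adding replicas, each replica's own worker settings in logstash.yml can also limit how fast events are pulled off the queue. A minimal sketch, assuming the filter/output stage (rather than RabbitMQ itself) is the bottleneck; the values are purely illustrative, not recommendations:

    # Settings in logstash.yml on the logstash-backend (illustrative values only)
    pipeline:
      workers: 8        # defaults to the number of CPU cores; only raise if there is CPU headroom
      batch:
        size: 2048      # events per worker per batch (already set in the question)
        delay: 50       # ms to wait for a batch to fill before flushing (50 is the default)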

Another thing not mentioned above:

A queue is a single-threaded resource. If you design your routing topology so that messages are spread across multiple queues rather than hammering everything into a single queue, you can take advantage of additional CPU resources and minimize the CPU hit per message. E.g. not sure where all the logs are coming from, but you could specify different routing keys (in logstash-frontend) based on some criteria (e.g. source application or some kind of timestamp algorithm) and configure multiple Logstash pipelines (in the logstash-backend) to consume from different queues.
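
Here's a rough sketch of that idea; the [service] field, queue names and routing keys are all placeholders, and the split criteria would need to match whatever is actually in your events:

    # logstash-frontend: route events to different routing keys based on a field
    # (user/password omitted for brevity)
    output {
      if [service] == "app-a" {
        rabbitmq {
          host          => "opensearch-logging-cluster-rmq"
          exchange      => "logstash"
          exchange_type => "direct"
          key           => "logs.app-a"     # lands in the queue bound with this key
          durable       => true
          persistent    => true
        }
      } else {
        rabbitmq {
          host          => "opensearch-logging-cluster-rmq"
          exchange      => "logstash"
          exchange_type => "direct"
          key           => "logs.default"
          durable       => true
          persistent    => true
        }
      }
    }

    # logstash-backend: one pipeline per queue, each draining its own
    # single-threaded RMQ queue on its own CPU
    input {
      rabbitmq {
        host          => "opensearch-logging-cluster-rmq"
        queue         => "logs-app-a"
        exchange      => "logstash"
        exchange_type => "direct"
        key           => "logs.app-a"
        durable       => true
      }
    }

Each backend pipeline would get its own entry in pipelines.yml (pipeline.id plus path.config), so the queues are consumed independently.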

A couple of other miscellaneous suggestions:

  • You could consolidate fluentbit and logstash-frontend to just use the OTEL (OpenTelemetry) agent with the RabbitMQ exporter.
  • You could also explore Kafka instead of RMQ, but not sure this will be any easier to configure!

FYI, where I work we've implemented something similar with a queuing/streaming layer, but we use OTEL Collector agent -> AWS Kinesis -> Logstash -> Elasticsearch.
