Correct way to test Logstash persisted queue performance

Hello,

I am trying to measure Logstash 6.6 performance to find the impact of enabling persisted queues.

  1. To eliminate any influence of the Elasticsearch environment, all tests output to /dev/null and no additional filters are configured in Logstash:

    output {
      file {
        path => "/dev/null"
      }
    }

  2. I have tried the following input configurations:

  • TCP input

    input {
      tcp {
        port => 5000
        codec => json
      }
    }

  • generator plugin

    input {
      generator {
        lines => [JSON log lines go here]
        count => 100000
        threads => 2
      }
    }

  • stdin input

    input {
      stdin {
      }
    }

  3. I have tested both memory and persisted queues for each of the mentioned input types (TCP, stdin, and generator plugin). The persisted queue is located on a tmpfs partition in memory to eliminate any influence of HDD performance.

  4. I use the monitoring API and Grafana to measure the number of events processed by Logstash, using :9600/_node/stats/pipelines/ and checking the events "in" and "out" stats (a minimal sketch of this measurement is included after the settings below).

  5. I have used the same corpus of log lines for the stdin and TCP tests (sent via the same Python script; a simplified sketch is included below the results) and the same lines for the generator plugin.

  6. Empirically, the best performance for my environment (for each queue type) appears to be achieved with the following settings:

For queue type memory:

  queue.type: memory
  pipeline.workers: 2
  pipeline.output.workers: 2
  pipeline.batch.size: 4000
  pipeline.batch.delay: 50

For queue type persisted:

  queue.type: persisted
  path.queue: "/var/log/logstash/data/memory/"
  queue.page_capacity: 10mb
  queue.drain: false
  queue.max_events: 0
  queue.max_bytes: 1.5gb
  queue.checkpoint.writes: 0
  queue.checkpoint.acks: 0
  pipeline.workers: 2
  pipeline.output.workers: 2
  pipeline.batch.size: 130
  pipeline.batch.delay: 50
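
For reference, the rate measurement via the monitoring API described above roughly amounts to the following (a minimal sketch; the actual dashboards are built in Grafana, and the pipeline name "main" and the default :9600 API port are assumptions about my setup):

    # Polls the Logstash monitoring API and derives an events/s rate from the
    # delta of the pipeline "out" counter between two samples.
    import json
    import time
    import urllib.request

    STATS_URL = "http://localhost:9600/_node/stats/pipelines"
    INTERVAL = 10  # seconds between samples

    def events_out():
        with urllib.request.urlopen(STATS_URL) as resp:
            return json.load(resp)["pipelines"]["main"]["events"]["out"]

    previous = events_out()
    while True:
        time.sleep(INTERVAL)
        current = events_out()
        print(f"{(current - previous) / INTERVAL:.1f} events/s")
        previous = current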

As can be seen from the results below, enabling the persisted queue significantly decreases pure Logstash performance:

  • stdin input + memory queue: 104.6 K events/s max

  • generator input + memory queue: 119.0 K events/s max

  • TCP input + memory queue: 36.2 K events/s max

  • stdin input + persisted queue: 35.9 K events/s max

  • generator input + persisted queue: 48.4 K events/s max

  • TCP input + persisted queue: 19.2 K events/s max
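
Computed from the peak figures above, the memory-to-persisted throughput ratio works out roughly as follows (a quick calculation, figures in K events/s):

    # Memory-queue vs. persisted-queue peak throughput per input type,
    # taken from the figures listed above (in K events/s).
    peaks = {
        "stdin":     {"memory": 104.6, "persisted": 35.9},
        "generator": {"memory": 119.0, "persisted": 48.4},
        "tcp":       {"memory": 36.2,  "persisted": 19.2},
    }
    for name, p in peaks.items():
        ratio = p["memory"] / p["persisted"]
        print(f"{name}: memory queue is {ratio:.1f}x faster")  # ~2.9x, ~2.5x, ~1.9x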

Please note that the log line corpus, the pipeline worker and batch size configurations, and the JVM settings remained the same between the "memory" and "persisted" queue tests for a given input.
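
The sender side of the stdin and TCP tests boils down to something like the following simplified sketch (the host, port, and corpus file name are placeholders for my test setup):

    # Streams the corpus of newline-delimited JSON log lines to the Logstash
    # tcp input on port 5000; the same file is piped to stdin for the stdin tests.
    import socket

    HOST, PORT = "localhost", 5000
    CORPUS = "corpus.jsonl"  # one JSON document per line

    with socket.create_connection((HOST, PORT)) as sock, open(CORPUS, "rb") as corpus:
        for line in corpus:
            sock.sendall(line)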

Is such a performance impact expected for the persisted queue, or is something off in my tests? What is the recommended way of comparing memory and persisted queue performance?

@alexpanas, your testing methodology looks pretty good to me and the throughput ratio that you are seeing between memory queues and persisted queues is similar to the ratio we've seen in our internal testing. In its current state of development, the persistent queues feature does have a performance cost.

@danhermann, thank you for confirmation.

Are there any recommendations (besides queue.checkpoint.writes: 0 and queue.checkpoint.acks: 0) to improve persisted queue performance, or is the ratio I was able to get pretty much the best it could be?

Am I also understanding correctly that the throughput achievable with queue.checkpoint.writes: 1 and queue.checkpoint.acks: 1 can be used as a marker for the worst-case performance of the persisted queue?

@alexpanas, those are the two main software settings that affect PQ performance, and yes, setting them both to 1 is definitely the worst-case scenario. Beyond those two, we've found that increasing queue.page_capacity beyond the default 64MB is generally counterproductive. And finally, of course, hardware matters: the PQ will perform better with its files on a fast SSD than on a slower HDD.

@danhermann , thanks for the info.

I have also noticed that pipeline.batch.size tends to behave differently for memory and persisted queues: increasing this parameter to 4000 improved the performance of the memory queue but significantly decreased the performance of the persisted queue (the generator input was able to push only 39.4 K events/s max, compared to 48.4 K with the value of 130 I ended up using in my tests).

Is this expected? Are there any guidelines on how to calculate the optimal value of this parameter without a trial-and-error process (which would require experimenting on the production environment, something I would rather avoid)?

@alexpanas, it is certainly true, as you've found, that higher batch sizes tend to work better for memory queues. With smaller batch sizes, we see significant lock contention on the memory queues, which reduces throughput.

It's harder to find optimal sizes with the PQ, though. Generally speaking, they are smaller, but the optimal size depends a lot on the distribution of input events and how quickly they can be processed on the output side. I would suggest experimenting in a QA or staging environment to find the optimal batch size there. I would then use that same batch size in production, and if your throughput in production is similar to what you observed in your QA or staging environment, I would assume that you are very close to the optimal batch size in your production environment as well.
