I have a logstash instance which is taking in quite a bit of traffic via the tcp input. Logstash is currently cpu-bound on this particular server, which I understand is not at all an unusual situation. By cpu-bound I mean that I have six CPUs on my server, 6 worker threads, and each worker thread is using 95-100% of CPU time (one worker thread per CPU). Memory utilization and I/O are on the low side.
I'd like to better understand what exactly is going on when CPU bound. I assume that, since the machine is running at capacity, some incoming events will either a) be cached somewhere or b) be refused/dropped. If they are cached, where is the cache? If they are dropped, can I detect this situation somehow (is anything logged, for example)? Or is it something else entirely?
When an output or the processing pipeline cannot keep up, e.g. because the CPU is saturated, Logstash applies back-pressure and simply stops reading from the inputs. For many inputs this is not a problem: Filebeat, for example, can simply stop reading files and wait until the blockage has cleared. For other types of inputs it may cause issues and/or data loss. With the UDP input it is likely to lead to data loss as the buffers fill up, and with TCP-based inputs it may cause problems for the application sending the data.
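To make the difference between the two kinds of sender concrete, here is a toy model in Python. It is only a sketch of the general idea and not Logstash's actual internals: the function name `simulate`, the buffer capacity, and the tick-based timing are all made up for illustration. A sender that honours back-pressure waits when the buffer is full, while a UDP-like sender that cannot wait loses events.

```python
from collections import deque


def simulate(num_events, capacity, ticks_per_event, lossy):
    """Toy model of an input feeding a bounded buffer (hypothetical, not Logstash code).

    The sender tries to hand over one event per tick. The worker finishes
    one event every `ticks_per_event` ticks, so it is slower than the sender.
    lossy=True  : UDP-like sender, an event arriving at a full buffer is dropped.
    lossy=False : sender honours back-pressure, it waits and retries next tick.
    Returns (processed, dropped, ticks_elapsed).
    """
    buffer = deque()
    next_event = 0
    processed = dropped = tick = 0
    while next_event < num_events or buffer:
        tick += 1
        if next_event < num_events:
            if len(buffer) < capacity:
                buffer.append(next_event)   # accepted by the pipeline
                next_event += 1
            elif lossy:
                dropped += 1                # nowhere to put it, event is lost
                next_event += 1
            # else: sender is blocked and keeps the event for the next tick
        if tick % ticks_per_event == 0 and buffer:
            buffer.popleft()                # worker finishes one event
            processed += 1
    return processed, dropped, tick


if __name__ == "__main__":
    print("back-pressure:", simulate(10, capacity=2, ticks_per_event=3, lossy=False))
    print("lossy sender: ", simulate(10, capacity=2, ticks_per_event=3, lossy=True))
```

In the back-pressure case nothing is lost, but the sender spends most of its time waiting. In the lossy case the sender never waits, and everything that arrives while the buffer is full is gone.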
Whenever there are inputs present that do not handle back-pressure well, it is quite common to introduce a message queue into the architecture, as this can act as a buffer. The Logstash instance collecting the data in this case tends to do relatively little processing in order to ensure good throughput, and most of the heavy processing can be left to the Logstash instance(s) that read and process data off the message queue.
I am not aware of any way to detect it. A new monitoring API has recently been introduced in Logstash 5.0, but as far as I am aware it does not yet include this type of information.
If this is a problem, I would recommend introducing a queueing mechanism into the pipeline or scaling out to reduce the load on the current node.
Thanks again. Yes, a queue or any type of buffer could definitely allow for detection on the client side. But I believe it's a critical piece of information to be able to detect this on the server (logstash) side as well. I plan to open a logstash issue on github to this end. I will update this thread when I do so.
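For anyone who wants the client-side view in the meantime, something along these lines might work for a plain TCP sender. This is only a rough sketch under my own assumptions: the helper names `send_or_report_stall` and `demo_stall` and the 0.2 second timeout are made up, and a local `socket.socketpair()` whose receiving end never reads stands in for a Logstash tcp input that has stopped reading.

```python
import socket


def send_or_report_stall(sock, payload, timeout_s=0.2):
    """Send payload; return True on success, False if the send timed out.

    A timeout here means the peer has stopped reading and the socket
    buffers are full. Note that part of the payload may already have been
    sent when the timeout fires, so a real client would need to handle that.
    """
    sock.settimeout(timeout_s)
    try:
        sock.sendall(payload)
        return True
    except socket.timeout:
        return False


def demo_stall():
    """Keep sending to a peer that never reads until the sender stalls."""
    sender, receiver = socket.socketpair()
    chunk = b"x" * 65536
    stalled = False
    try:
        for _ in range(10_000):          # upper bound so the loop always ends
            if not send_or_report_stall(sender, chunk):
                stalled = True           # buffers full, receiver is not reading
                break
    finally:
        sender.close()
        receiver.close()
    return stalled


if __name__ == "__main__":
    print("sender detected stall:", demo_stall())
```

A sender could log or count these timeouts to get at least an indirect signal that the receiving side is applying back-pressure.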