Hope you can help me.
I'm running Logstash 5.0.1 on a blade server with 40 cores and 256 GB of RAM.
I'm getting a LOT of logs into a single file (around 50,000 lines per minute) from a service we're running on the same server, and I'm using the metrics filter to get some throughput numbers:
I believe the rate numbers are the number of events per second averaged over the period specified, so it seems you are processing 1,500 events per second, which equals 90,000 events per minute. Logstash 5.0 has a new monitoring API that may give you more information about processing rates.
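If it helps, here is a rough example of querying it; this assumes the API is listening on its default port (9600) on the same host:

```sh
# Query the Logstash 5.x monitoring API for node stats (default port 9600).
# Depending on the exact 5.x version, the response includes JVM, process and
# pipeline event counters you can use to work out the real throughput.
curl -s 'http://localhost:9600/_node/stats?pretty'
```

Sampling the event counters twice, a minute apart, gives you the actual events-per-minute rate the pipeline is sustaining.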
Well, what I see after I start Logstash is that the date/time field in the log starts to fall behind the timestamp. After 10 minutes of running, I get a whole minute of delay.
Is this from looking at the output to stdout? If so, try writing to a file instead. As far as I recall, the stdout output is not very performant compared to many other outputs.
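As a quick test, something along these lines should do it; the input path, filters and output path here are just placeholders, so substitute your own config:

```sh
# Minimal sketch: run the same pipeline, but write to a file output instead
# of stdout, to rule out stdout as the bottleneck. Paths are placeholders.
bin/logstash -e '
  input  { file { path => "/var/log/myservice/service.log" } }
  output { file { path => "/tmp/logstash-throughput-test.log" } }
'
```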
You mention that CPU utilization is 5%, but is a single core at 100% if you look at per-core usage?
Also, how do the software and hardware interrupt counts look?
If you see those patterns, it may be worth checking RSS (Receive Side Scaling) and RFS (Receive Flow Steering).
One cause of load being concentrated on a single core is the TCP reordering problem.
If the NIC does not support multiple queues (MSI-X or RSS), its hardware interrupts are pinned to a single CPU.
Interrupt handling is pinned like this because, if interrupts were spread randomly across CPUs, packets would be processed in parallel and would then need to be re-ordered for protocols like TCP that guarantee packet order, which can degrade performance.
On the kernel side, RFS can be enabled as a software equivalent of RSS.
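A rough way to check, assuming the traffic arrives on eth0 (replace with your actual interface name); the table sizes at the end are illustrative, not tuned recommendations:

```sh
# How many hardware RX queues does the NIC expose? (RSS / MSI-X)
ethtool -l eth0

# Which CPUs are handling the NIC's interrupts? A single hot column here
# means all hardware interrupts land on one core.
grep eth0 /proc/interrupts

# Enable RFS (kernel-side flow steering) if the NIC only has one queue.
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 4096  > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
```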
It looks like you now have a lot of pipeline workers but only a single output worker for Elasticsearch. You need to tune the pipeline as a whole; with this setup it is possible that the Elasticsearch output is the bottleneck.
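Something like the sketch below is one way to experiment; the host, worker and batch values are assumptions to start from rather than recommendations, so re-check the throughput after every change:

```sh
# Illustrative only: give the elasticsearch output more workers and set the
# pipeline workers/batch size explicitly. Hosts and numbers are placeholders.
cat > /etc/logstash/conf.d/99-es-output.conf <<'EOF'
output {
  elasticsearch {
    hosts   => ["localhost:9200"]
    workers => 4        # more than the default single output worker
  }
}
EOF

# -w = pipeline workers, -b = batch size per worker (Logstash 5.x flags)
bin/logstash -w 8 -b 250 -f /etc/logstash/conf.d/
```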