Logstash doesn't use all CPU available

Hi guys,

Hope you can help me.
I'm running Logstash 5.0.1 on a blade server with 40 cores and 256 GB of RAM.

I'm getting a LOT of logs written into a single file (around 50,000 lines per minute) by a service running on the same server, and I'm using the metrics filter to measure throughput:

input {
  file {
    path => "/var/log/myfile.log"
    start_position => "end"
    sincedb_path => "/dev/null"
  }
}

filter {
  if [message] {
    metrics {
      meter => [ "%{host}" ]
      add_tag => "metric"
    }
  }
}

output {
  if "metric" in [tags] {
    stdout {
      codec => rubydebug
    }
  }
}

The problem is that I'm getting around 1500 events per minute, which I think is really slow.

{
    "@version" => "1",
    "@timestamp" => "2016-11-16T03:03:21.393Z",
    "message" => "bb-warrns-ats-3",
    "bb-warrns-ats-3" => {
        "count" => 23356,
        "rate_1m" => 1537.44797520174,
        "rate_5m" => 1514.5634994611603,
        "rate_15m" => 1510.598886839833
    },
    "tags" => [
        [0] "metric"
    ]
}

CPU usage while running Logstash doesn't exceed 5%.
Memory usage doesn't exceed 15%.

Does anybody have an idea what could be wrong?

Thanks in advance

How many pipeline workers?

I believe the rate numbers are the number of events per second averaged over the period specified, so it seems you are processing 1500 events per second, which equals 90000 events per minute. Logstash 5.0 has a new monitoring API, which may give you more information about processing rates.
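Assuming the Logstash HTTP API is listening on its default port 9600 (adjust if you changed http.port), something like this should dump JVM, process, and pipeline event statistics:

curl -s 'http://localhost:9600/_node/stats?pretty'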

Hi, I'm using "-w 40", which is the total number of processors available on the server.

Hi, yes, you're right, I realised that right after I posted the thread. Thank you. I'm investigating what I can get from that API.

90,000 records per minute is a very good number, but that makes the situation even stranger: what could be causing the slowness then?

How are you observing the slowness?

Well, what I see after I start Logstash is that the date/time field in the log starts to fall behind the timestamp. After 10 minutes of running, I get a whole minute of delay.

Is this by looking at the output to stdout? If this is the case, try writing to a file instead. As far as I recall, writing output to stdout is not very performant compared to many other outputs.

Yes, correct

I edited my post as you replied. As far as I recall, writing to stdout is not very fast and could very well be what is slowing down processing.
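If you just want to measure throughput, a minimal file output might be a fairer test than stdout; something along these lines (the path is only an example):

output {
  file {
    path => "/tmp/logstash-throughput-test.log"
  }
}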

Hi, I installed Elasticsearch and I'm now sending everything to it:

output {
  elasticsearch {
    hosts => "localhost"
    index => "stats"
  }
}

I'm still seeing an increasing delay between the timestamp and the log date.

Something tells me the problem is that the file isn't being read fast enough.

Any advice on how to improve the input and output performance?

Thanks

Hi Oliver_Hernadez.

You mention that overall CPU utilization is 5%, but is a single core at 100% when you look at per-core usage?
Also, are software and hardware interrupts occurring heavily?
If you see either of those patterns, it may be better to check RSS (Receive Side Scaling) and RFS (Receive Flow Steering).
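For example, per-core usage can be checked with something like this (assuming the sysstat package is installed; pressing 1 in top works too):

mpstat -P ALL 1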

One cause of load concentrating on a single core is the TCP reordering problem.
If the NIC does not support multiple queues (MSI-X or RSS), hardware interrupts from the NIC are pinned to a single CPU.
The reason interrupt handling is pinned is that if interrupts were spread across multiple CPUs at random, packets would be processed in parallel,
and a protocol with ordering guarantees like TCP would then have to reorder the packets, which can degrade performance.

On the kernel side, RPS/RFS provide a software equivalent of RSS, which can be enabled like this (replace ethX with your interface):

# echo "f" > /sys/class/net/ethX/queues/rx-0/rps_cpus
# echo 4096 > /sys/class/net/ethX/queues/rx-0/rps_flow_cnt
# echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
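
To confirm whether the hardware interrupts really are pinned to one CPU, the per-CPU counters in /proc/interrupts can be checked, for example:

grep ethX /proc/interrupts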

I hope this is helpful.

It looks like you now have a lot of pipeline workers but only a single output worker for Elasticsearch. You need to tune the pipeline as a whole. With this setup it is possible that the Elasticsearch output is the bottleneck.
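For example, in Logstash 5.x the elasticsearch output still accepts a workers option (it was removed in later versions), so something along these lines might be worth trying; the worker count here is only a guess to start from:

output {
  elasticsearch {
    hosts => "localhost"
    index => "stats"
    workers => 4
  }
}

Increasing the pipeline batch size (the -b flag) can also help the output send larger bulk requests.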
