Performance capped - low events received rate


I have a Logstash 5.6.9 instance whose performance seems capped at a very low events-received rate.
It runs on a virtual machine, so even throwing resources at it (doubling CPU, RAM) does not help.
I have also tried playing with the pipeline settings and configuring output to a file (instead of Elasticsearch).

The attached image shows the rate with 2 CPUs, 10 GB RAM, a 4 GB heap, file output, and:
pipeline.workers: 8
pipeline.batch.size: 500

The input is taken from NXLog - I have a script that dumps data into a log file.

Any help will be appreciated, especially regarding the proper method for testing performance (i.e. is there a "push data" script?) and how to troubleshoot this.

Many thanks
Steven

Well, you haven't given any indication of what sort of events you are processing, or what platform you are on, or anything else, but ...

At 16:36 the number of events processed drops to zero and latency is not even reported. This makes me wonder if you are having GC problems, although with a 4 GB heap that sounds unlikely unless you are processing very large events.
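If you want to rule that out, you could enable GC logging. A sketch for the Java 8 runtime that Logstash 5.x runs on (the log path is an assumption, adjust to your layout); add these lines to /etc/logstash/jvm.options and restart Logstash:

# log every GC event with a wall-clock timestamp so pauses can be
# correlated with the drops in the throughput graph (example path)
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/var/log/logstash/gc.log

Long or frequent pauses in that log around 16:36 would point at the heap.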

With 2 CPUs I would keep pipeline.workers at 2. Why are you increasing the pipeline.batch.size? That will increase resource demand. If anything I would reduce it from the default (125).
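In logstash.yml that would be something like the lines below (starting points to measure from, not definitive values):

# match workers to the CPU count and go back to the default batch size
pipeline.workers: 2
pipeline.batch.size: 125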

Investigating a performance problem like this is not really specific to logstash. The performance is constrained by some resource. Heap, CPU, physical memory (if the system is swapping), or IO are the most likely candidates, in roughly that order.
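As a first pass you can watch each of those from the shell, and Logstash 5.x also exposes a monitoring API on port 9600 by default. A rough sketch, assuming default host and port:

top -H -p $(pgrep -f logstash)    # per-thread CPU usage of the Logstash JVM
vmstat 1                          # swapping (si/so columns) and run queue
iostat -x 1                       # per-device IO utilisation
curl -s 'localhost:9600/_node/stats/pipeline?pretty'   # event counts and per-plugin timings
curl -s 'localhost:9600/_node/hot_threads?pretty'      # busiest Logstash threads

If one plugin dominates the pipeline stats, that is your bottleneck; if none do, the constraint is likely upstream of Logstash.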

Hi,

Thanks for taking the trouble to provide advice

My concern is that there seems to be a plateau / resource cap that prevents Logstash from performing.

Where should I look to find the culprit, i.e. is there a log / log level that will contain details about resource constraints?

As I mentioned, this happens IRRESPECTIVE of any pipeline settings OR the physical resources allocated to the VM running Logstash.

The logstash.conf is also simplified as much as possible (see below).

The gaps in the graph are due to me stopping/restarting Logstash after making changes.

Logstash has been configured with LimitNOFILE=65535.

The VM is running CentOS 7 with all limits increased as shown below.

Atop is not showing any overused resources.

cat /etc/sysctl.d/elk.conf

vm.swappiness=1          # turn off swapping
net.core.somaxconn=65535 # up the number of connections per port
vm.max_map_count=262144  # (default) http://www.redhat.com/magazine/001nov04/features/vm
fs.file-max=518144

[root@elk-test ~]# ulimit -Sa
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 39265
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 39265
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[root@elk-test ~]# ulimit -Ha
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 39265
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 39265
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

cat /etc/logstash/conf.d/Medavail.conf

input {
    http {
        port => 5044
    }
}
filter {
    json {
        source => "message"
    }
    ruby {
        init => "require 'atomic'; @sequence = Atomic.new(0)"
        code => "event.set('sequence', @sequence.update { |v| v + 1 })"
    }
}
output {
    stdout {}
    file {
        path => "/tmp/logstash.out"
    }
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "ingestmedlogs"
        pool_max => 1800
    }
}

Why are you using the http input plugin? If I recall correctly, I have mostly seen NXLog used together with a tcp input plugin in the past. If that is a configuration possibility, it might reduce the overhead and improve throughput.
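On the Logstash side that could be as simple as the sketch below (the port, and the assumption that NXLog sends newline-delimited JSON, are mine):

input {
    tcp {
        port => 5044
        codec => json_lines   # parse one JSON object per line as it arrives
    }
}

With the codec doing the parsing, the separate json filter would no longer be needed.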

Try something like

input { generator { count => 10 message => '{"text" : "Generated message"}' } }
filter {
    json { source => "message" }
    ruby {
        init => "require 'atomic'; @sequence = Atomic.new(0)"
        code => "event.set('sequence', @sequence.update { |v| v + 1 })"
    }
}
output { stdout {} }
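If you bump count up to something like a million and time the run (e.g. time bin/logstash -f test.conf, where test.conf is wherever you saved that snippet), dividing the count by the wall-clock time gives a rough events-per-second baseline for the filters, independent of any input.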

What's writing to the http input? I have seen tools (and I think it was NXLog) that add a 50 millisecond delay between messages when posting over http.
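If so, that alone would explain a plateau: a 50 ms pause per message caps a single connection at 1000 / 50 = 20 events per second, no matter how much CPU or RAM you give the VM.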

Thanks guys.
NXLog is writing to HTTP.
I used it so I can implement persistent queues
(which I disabled for the purpose of testing).

I'll try with TCP and see if it makes a difference.

Thanks

The type of input used should not impact whether you can use persistent queues or not.

Hi,
So switching from om_http to om_tcp solved the performance issue.

Christian, the documentation is very clear about persistent queues (see the excerpt below).
Is there something that I misread / do not understand?
"...
These are problems not solved by the persistent queue feature:

  • Input plugins that do not use a request-response protocol cannot be protected from data loss. For example: tcp, udp, zeromq push+pull, and many other inputs do not have a mechanism to acknowledge receipt to the sender. Plugins such as beats and http, which do have an acknowledgement capability, are well protected by this queue
    .."

The TCP input does not offer delivery guarantees. If that is a requirement I would recommend switching to Filebeat as this uses a protocol that does provide acknowledgement and retries on failure.
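A minimal sketch of that setup, assuming Filebeat 5.x (the log path and hosts are placeholders):

filebeat.yml:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/myapp/*.log      # wherever your script dumps the events
output.logstash:
  hosts: ["localhost:5044"]

and on the Logstash side:

input {
    beats {
        port => 5044
    }
}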
