Logstash InfluxDB Output Plugin performance


#1

Hello,

I am using Logstash 5.6.1-1 on CentOS 7. I have an HTTP input, some filters, and an output writing to InfluxDB, which runs on another Linux server. My output configuration is as follows:
output {
  influxdb {
    flush_size => 5000
    host => ["x.x.x.x"]
    db => "testinflux"
    port => "8086"
    measurement => "xyz"
    allow_time_override => true
    use_event_fields_for_data_points => true
    coerce_values => {
      "type " => "string"
      "unit" => "string"
    }
    send_as_tags => [
      "type "
      "unit"
    ]
  }
  stdout {
    codec => rubydebug
  }
}

When I send my data to Logstash over HTTP from a single process, only about 25 points seem to be written to InfluxDB.
When I send from 2 processes, InfluxDB gets 50 points on average.
When I send from 3 and 4 processes, it gets 75 and 100 points on average.
If I add more processes after that, it stays at 100 points written on average (it doesn't go up to 125, 150, etc.). The averages seem very consistent.

I changed flush_size from the default of 100 to 5000. The write averages did not change at all after that.
I changed the following in logstash.yml:
pipeline.workers: 4
# pipeline.workers: 2

pipeline.output.workers: 4
# pipeline.output.workers: 1

pipeline.batch.size: 2000
# pipeline.batch.size: 125

I have 4 CPU and 8GB RAM.

Is there another setting I need to tweak in order to write more than 100 points to InfluxDB via Logstash? I want to achieve 5000 points on average.


#2

Any ideas? It seems that the flush_size setting does not work. I can see it is picked up when running Logstash in debug mode, but it doesn't have the desired effect. I have also tried running Logstash with the -w 4 and -b 2000 options, but that doesn't change the result.


(Christian Dahlqvist) #3

In newer versions of Logstash, the output plugins are tied to each batch, so an output will never receive more events at once than the pipeline.batch.size value. If you want larger batches sent to the output, you will therefore need to increase this parameter as well as the output flush size. The changes you made, assuming bold means commented out, should therefore work.
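
As a rough illustration of the rule described above (taking it at face value, not as a statement about the plugin's internals), the largest batch an output can receive is the smaller of pipeline.batch.size and the output's flush_size. The helper name effective_batch is made up for this sketch.

```shell
# Sketch only: per the rule described above, the largest batch an output
# can see is the smaller of pipeline.batch.size and the output's flush_size.
effective_batch() {
  # $1 = pipeline.batch.size, $2 = flush_size of the output
  if [ "$1" -lt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

effective_batch 125 5000    # default batch size, flush_size raised: prints 125
effective_batch 2000 2000   # both raised together: prints 2000
```

The batch size can also be set for a single run with the -b option mentioned earlier in the thread, instead of editing logstash.yml.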

Is this the only output you have in your config? Do you have other types of data coming through the same Logstash instance that could reduce the effective batch size?


#4

Indeed the values in bold were commented out!

So in my logstash.yml I have:
pipeline.batch.size: 2000

In my config file I set the InfluxDB output's flush_size to 2000 as well.

Just in case, I ran Logstash with an explicit path to the settings directory containing the logstash.yml in which I changed pipeline.batch.size:
sudo /usr/share/logstash/bin/logstash --path.settings ./ -f /home/datamgr/dev/logstash/logstash-influx-data.conf

There is only one output in config, no other output specified. This is the only data coming in via Logstash at the moment.

I am monitoring the writes via Grafana; it is still showing 25 writes per 15 seconds.


#5

Is it possible to show, from the command line, how much data is coming in via the input and how much is being sent via the output to InfluxDB?


(Christian Dahlqvist) #6

Have a look in the docs for options around monitoring Logstash.
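
One option that works from the command line is the Logstash monitoring API. The sketch below assumes the API is reachable on localhost:9600 (the default) and uses the Logstash 5.x path _node/stats/pipeline; the helper names stats_url and events_per_second are made up for illustration.

```shell
# Build the URL of the Logstash node stats API.
# Assumptions: default host/port (localhost:9600) and the 5.x path.
stats_url() {
  printf 'http://%s:%s/_node/stats/pipeline?pretty\n' "${1:-localhost}" "${2:-9600}"
}

# Average events per second between two readings of a counter.
# $1 = first reading, $2 = second reading, $3 = seconds between them.
# Uses integer division, so the result is rounded down.
events_per_second() {
  echo $(( ($2 - $1) / $3 ))
}

# Typical use (needs a running Logstash, so left commented out):
#   curl -s "$(stats_url)"            # inspect the "events" in/out counters
#   events_per_second 1000 1750 15    # two "out" readings taken 15 s apart
```

Comparing the "in" and "out" counters over the same interval should show whether events are arriving slowly or leaving slowly.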


#7

Thanks for the link for monitoring, I will enable that on the side as well. Any other ideas for the performance issue, other than changing pipeline.batch.size and flush_size?
I have only 1 output in use, and the data comes in via the HTTP input. No other data is coming in via Logstash; I am testing with a single configuration file.


(Christian Dahlqvist) #8

Have you disabled output to stdout?

How are you receiving data into Logstash? Might it be a limitation on the input side? Have you tried pushing a good amount of data into Logstash by other means, e.g. through a stdin input, to see if this changes the throughput?
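
A minimal sketch of such a stdin test follows. The generator gen_events and the file stdin-test.conf are hypothetical; the field names mirror the config earlier in the thread and the values are made up.

```shell
# Print N newline-delimited JSON test events on stdout.
gen_events() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    printf '{"type":"test","unit":"count","value":%s}\n' "$i"
    i=$((i + 1))
  done
}

# Typical use (stdin-test.conf is a hypothetical copy of the pipeline config
# with the http input replaced by: input { stdin { codec => json_lines } }):
#   gen_events 100000 | /usr/share/logstash/bin/logstash -f stdin-test.conf
```

If throughput to InfluxDB rises sharply with this input, the bottleneck is more likely on the sending side than in Logstash.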


#9

Your suggestions proved very useful, and I found that it wasn't a limitation on the Logstash side. I was able to install X-Pack, which was great for monitoring.
I ran an external test with XML as input, and then again with the HTTP input, this time posting JSON with curl in a loop, and the performance is much better.
So it seems it's the sender that has the limitation in the end.
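
For anyone repeating the test, a curl loop of the kind described above might look like the following sketch. The URL is an assumption (8080 is the http input's default port; the thread does not say which port was used), and the payload fields and helper names are made up.

```shell
# Build one small JSON document; $1 becomes the "value" field.
make_payload() {
  printf '{"type":"test","unit":"count","value":%s}' "$1"
}

# POST N documents to a Logstash http input, one request per document.
# $1 = URL of the http input, $2 = number of documents to send
post_loop() {
  url=$1
  n=$2
  i=1
  while [ "$i" -le "$n" ]; do
    make_payload "$i" |
      curl -s -o /dev/null -H 'Content-Type: application/json' --data-binary @- "$url"
    i=$((i + 1))
  done
}

# Typical use (needs a running Logstash, so left commented out):
#   post_loop http://localhost:8080 5000
```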

Thank you for your help, much appreciated!

