How can I improve Logstash performance?

white_rabbit · July 20, 2018, 11:45am

Hello, I have logs in a file, and my logstash.conf is shown below. When I run sudo ./logstash -f /path/logstash.conf -b 100000 -w 1 and monitor the Elasticsearch node in Kibana, the document count graph shows that adding 123000 rows takes 20 seconds. I have no idea why performance is so poor. Furthermore, even though the file contains one million rows, only about one batch is added, in this case 123000 rows. Is it possible to add whole files to Elasticsearch in batches, in this case ten batches of 123000 rows?

What matters most to me is how efficiently rows are added to the database. I tried a higher number of worker threads, for instance -w 10, but I got the error java.lang.OutOfMemoryError: Java heap space, so I now run with sudo LS_JAVA_OPTS="-Xmx31g -Xms3g". Unfortunately, that doesn't improve performance. Any ideas how I can improve performance?

input {
  file {
    path => "/run/shm/elastic/logstash/file"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}" }
  }
}
output {
  elasticsearch {
    hosts => ["address"]
    index => "logs"
    document_type => "log"
  }
}

Christian_Dahlqvist · July 20, 2018, 11:49am

A larger batch size does not necessarily mean better performance. A batch size of 100000 looks quite high to me, so I would recommend reducing it to 1000.

Why are you limiting Logstash to a single worker thread? That is surely not going to help throughput. Remove this setting and go with the default value in combination with a smaller batch size.

Ultimately, Logstash cannot send data faster than Elasticsearch can accept it. What is the specification of your Elasticsearch cluster?

white_rabbit · July 20, 2018, 12:08pm

I also tried, for instance, -b 1000 -w 7, but the performance was even worse.

Christian_Dahlqvist · July 20, 2018, 12:15pm

Is Elasticsearch running on the same node as Logstash? What throughput are you seeing? What kind of storage do you have?

white_rabbit · July 20, 2018, 12:29pm

No, Elasticsearch is running on a different node, which has the same specification as the one running Logstash. Logstash is running on an SSD and Elasticsearch on an HDD. Where can I check the throughput?

Christian_Dahlqvist · July 20, 2018, 12:36pm

What is the specification of the node running Elasticsearch? What do the CPU and disk performance metrics look like there during indexing?

white_rabbit · July 20, 2018, 12:57pm

As I mentioned before, the two servers running Elasticsearch and Logstash are the same. While adding data, CPU usage has never exceeded 50%. Usually it is around 8%.

Christian_Dahlqvist · July 20, 2018, 12:59pm

Indexing into Elasticsearch can often be I/O intensive. What does iostat -x on the Elasticsearch node show during indexing? Have you followed these recommendations?

To check throughput, you can install X-Pack Monitoring.
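
Until monitoring is in place, a rough throughput figure can be derived from the numbers in the first post: 123000 documents appearing in about 20 seconds. A minimal Python sketch of that arithmetic (the function name is only illustrative, and the result is an average over the interval, not a peak rate):

```python
def events_per_second(doc_count: int, elapsed_seconds: float) -> float:
    """Average indexing rate over the measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return doc_count / elapsed_seconds


# Figures from the first post: 123000 rows indexed in about 20 seconds.
rate = events_per_second(123_000, 20)
print(f"{rate:.0f} events/s")  # 6150 events/s
```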

white_rabbit · July 22, 2018, 5:42pm

Yes, I followed these recommendations in the past, and the best performance I achieved was indexing 10000 logs per second using 30 separate processes. That figure is for adding logs to Elasticsearch directly, without sending them through Logstash. When I send logs through Logstash, the best performance is 5000 logs per second, including both sending and indexing in Elasticsearch. Do you have any idea how I can improve performance? I monitor the node using Kibana and Grafana. If any additional information is required, just let me know. Thanks in advance.

Christian_Dahlqvist · July 22, 2018, 5:52pm

Can you try removing the stdout output as well? As you have a lot of cores on the Logstash machine I would try something like this: -b 1000 -w 40
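
As a rough guide to why these two flags interact: according to the Logstash tuning documentation, the number of in-flight events is the product of the worker count and the batch size, and those events are held in the JVM heap. A small Python sketch comparing the combinations mentioned in this thread (the helper name is hypothetical; -w 10 is assumed to have been combined with -b 100000; actual memory use depends on event size, which was not given):

```python
def inflight_events(batch_size: int, workers: int) -> int:
    """Upper bound on events held in memory: batch size times worker count."""
    return batch_size * workers


# Flag combinations mentioned in this thread.
settings = [
    ("-b 100000 -w 1", 100_000, 1),
    ("-b 100000 -w 10", 100_000, 10),  # assumed pairing for the OutOfMemoryError run
    ("-b 1000 -w 7", 1_000, 7),
    ("-b 1000 -w 40", 1_000, 40),
]

for flags, batch_size, workers in settings:
    total = inflight_events(batch_size, workers)
    print(f"{flags:<16} -> up to {total:>9,} events in flight")
```

Under these assumptions, the run that hit the heap error could hold up to a million events at once, while -b 1000 -w 40 caps out at 40,000.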

Christian_Dahlqvist · July 22, 2018, 6:43pm

What does the indexing rate graph show?

white_rabbit · July 22, 2018, 7:33pm

[Grafana screenshot: Elasticsearch node]

[Grafana screenshot: Logstash node]

lenn4rd · July 25, 2018, 5:37pm

The CPUs aren't maxed out, which means they're not the bottleneck here. Using spinning hard drives rather than flash drives on database servers should usually raise a red flag. There are valid reasons to go with HDDs, e.g. better cost efficiency for large data volumes, but they come with a throughput penalty.

Did you monitor I/O stats on the Elasticsearch instance after you started sending data from Logstash? Are there any other processes running which Elasticsearch might compete with for I/O? If that's not the cause you'd need to dig deeper, e.g. if you're running a RAID array. The way the array is set up (things like block size and the like) can have an impact on read/write performance for some use cases.

white_rabbit · July 26, 2018, 11:41am

Now the performance is 45000k per second.

Christian_Dahlqvist · July 26, 2018, 11:49am

Has your Logstash config changed compared to what you posted earlier? Is all data going into a single index?

white_rabbit · July 26, 2018, 12:29pm

I updated the config. The current version is in the first post. Yes, all data is going into a single index.

Christian_Dahlqvist · July 26, 2018, 12:52pm

As you are writing into a single index and letting Elasticsearch assign document IDs, I cannot see any reason for this. Are all documents the same size and structure?

white_rabbit · July 26, 2018, 12:54pm

No, the length of logs can differ at least by a few characters.

white_rabbit · July 26, 2018, 1:34pm

How does Logstash add logs to Elasticsearch? Does it use something like bulk from the Elasticsearch helpers?

Christian_Dahlqvist · July 26, 2018, 1:37pm

Yes, Logstash uses bulk requests.
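
For illustration, a bulk request sends many documents in one HTTP call as newline-delimited JSON: an action line followed by the document source, repeated for each document. Below is a minimal Python sketch of building such a body with only the standard library. It is not Logstash's actual implementation, the sample messages are made up, and depending on the Elasticsearch version a _type may also be needed in the action line.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Serialize documents as newline-delimited JSON for the Bulk API."""
    lines = []
    for doc in docs:
        # Action line: index this document and let Elasticsearch assign the ID.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample events, standing in for one small batch.
docs = [
    {"message": "2018-07-20T11:45:00 first sample line"},
    {"message": "2018-07-20T11:45:01 second sample line"},
]
body = build_bulk_body("logs", docs)
print(body, end="")
```

A body like this would be sent with POST to the _bulk endpoint using the Content-Type application/x-ndjson, so one round trip carries the whole batch.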
