Logstash is very slow in sending the data to elasticsearch

anisen · January 19, 2017, 8:15am

Hi,

I am parsing an IIS log file in the following format:

2014-07-12:00:08:54 100.200.50.0 GET /mypath/mypage.asmx - 80 - 100.100.50.0 Mozilla/4.1+(compatible;+MSIE+4.01;+Windows+NT;+MS+Search+8.0+Robot) - 403 0 0 39

And my Logstash config file looks like this:

input {
  file {
    path => "D:/myLogFile.log"
    type => "iis-log"
    start_position => "beginning"
  }
}

filter {

  grok {
    match => {
      "message" => '%{DATA:timestamp} %{IPORHOST:clientip} %{NOTSPACE:method} %{NOTSPACE:uri} %{NOTSPACE:csuriquery} %{NOTSPACE:port} %{NOTSPACE:username} %{NOTSPACE:serverip} %{NOTSPACE:agent} %{NOTSPACE:referrer} %{NOTSPACE:status} %{NOTSPACE:sub_status} %{NOTSPACE:win_status} %{NOTSPACE:responsetime}'
    }
  }

  date {
    # MM is month of year; lowercase mm means minutes
    match => [ "timestamp", "yyyy-MM-dd:HH:mm:ss" ]
    locale => "en"
  }

}

output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "myindex"
  }
}

I am processing around 10 lakh data, which takes around 2 hours. That is much slower than I expected.

Kindly suggest how I can improve the performance or tune the above config file/filter.

Thanks in advance.

Christian_Dahlqvist · January 19, 2017, 8:27am

Depending on which version of Logstash you are using, the recommended tuning will differ, as the pipeline has gone through changes. As a first step, I would however recommend increasing the number of workers for the Elasticsearch output a bit. Start with a reasonably low number and increase it slowly until throughput no longer improves. A good starting point may be the number of worker threads you have or the number of cores available on the Logstash host.

Nitz · January 19, 2017, 8:38am

+1

We experienced it too. Increasing the number of workers didn't help much.

Would like to get further assistance on this one.

Christian_Dahlqvist · January 19, 2017, 8:43am

Which version of Logstash are you using?

anisen · January 19, 2017, 8:55am

Logstash and Elasticsearch are both version 5.1.1.

Moreover, it seems workers are deprecated in Logstash 5.1.1.
Is there any alternative?

Christian_Dahlqvist · January 19, 2017, 9:06am

The documentation provides some good guidance on troubleshooting performance. Increasing the number of workers in the Elasticsearch output is one of the things discussed there.

What is the hardware specification of your Logstash host and Elasticsearch cluster?

anisen · January 19, 2017, 9:13am

Could you please let me know how I can increase the number of workers?

I had this config:

output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "myindex"
    workers => 1
  }
}

But increasing the number of workers to 2 or more gives the error message below:
"You are using a plugin that doesn't support workers but have set the workers value explicitly! This plugin uses the shared and doesn't need this option"

Christian_Dahlqvist · January 19, 2017, 9:50am

The Elasticsearch output in 5.1.1 apparently no longer needs the workers parameter: the plugin is thread-safe, so this is handled automatically. This could however be better documented in my opinion, so I will open an issue.

guyboertje · January 19, 2017, 9:58am

What is 10 lakh data?

guyboertje · January 19, 2017, 9:59am

What is the spec of the machine you are running Logstash on?

guyboertje · January 19, 2017, 10:00am

Are you running Elasticsearch and Logstash on the same machine?

guyboertje · January 19, 2017, 10:03am

Try anchoring your grok pattern to the beginning of the string, e.g.

grok {
  match => { "message" => '^%{DATA:timestamp} %{IPORHOST:clientip} %{NOTSPACE:method} %{NOTSPACE:uri} %{NOTSPACE:csuriquery} %{NOTSPACE:port} %{NOTSPACE:username} %{NOTSPACE:serverip} %{NOTSPACE:agent} %{NOTSPACE:referrer} %{NOTSPACE:status} %{NOTSPACE:sub_status} %{NOTSPACE:win_status} %{NOTSPACE:responsetime}' }
}

That is, add a ^ to the beginning of the pattern.
See https://www.elastic.co/blog/do-you-grok-grok
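
A further variation in the same spirit, offered only as a sketch: the pattern can also be anchored at the end with $, and the lazy %{DATA} on the timestamp can be swapped for %{NOTSPACE}. The second change assumes the timestamp never contains a space, which holds for the sample line in the first post but is not confirmed for the whole file. Anchoring mostly pays off on lines that fail to match, so the gain here may be small.

```
filter {
  grok {
    # ^ and $ pin the match to the whole line so a failing line is rejected quickly.
    # NOTSPACE on the timestamp is an assumption based on the single sample line.
    match => { "message" => '^%{NOTSPACE:timestamp} %{IPORHOST:clientip} %{NOTSPACE:method} %{NOTSPACE:uri} %{NOTSPACE:csuriquery} %{NOTSPACE:port} %{NOTSPACE:username} %{NOTSPACE:serverip} %{NOTSPACE:agent} %{NOTSPACE:referrer} %{NOTSPACE:status} %{NOTSPACE:sub_status} %{NOTSPACE:win_status} %{NOTSPACE:responsetime}$' }
  }
}
```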

anisen · January 19, 2017, 10:06am

Yes. I'm running Elasticsearch and Logstash on the same machine, which has the following configuration:

Processor: AMD Athlon(tm) II X2 245 Processor @ 2.90 GHz
RAM : 8 GB

I tried anchoring as well, but saw no appreciable difference in performance.

anisen · January 19, 2017, 11:52am

1 million lines of logs.

guyboertje · January 19, 2017, 12:22pm

Are you getting events tagged with _grokparsefailure?

anisen · January 19, 2017, 12:25pm

I checked. No failures.

guyboertje · January 19, 2017, 12:33pm

1000000 / 2 / 60 / 60 is roughly 139 events per second. You should be seeing ~2000 to 4000 events per second.

Try excluding Elasticsearch by using the stdout output with the dots codec.

You should see the dots stop after about 4 minutes - if so then ES is the bottleneck.
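
For reference, a minimal sketch of that test, assuming the input and filter sections stay exactly as posted and only the output section is swapped out:

```
output {
  # Temporary stand-in for the elasticsearch output.
  # The dots codec prints one "." per event, so the output itself costs very little.
  stdout {
    codec => dots
  }
}
```

Note that the file input records how far it has read in a sincedb file, so a log file that was already processed may not be read again on the next run unless that state is reset.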

anisen · January 24, 2017, 11:14am

I tried removing Elasticsearch and using the dots codec; it takes 7 minutes.

But I need the data in Elasticsearch, so I am now sending it to ES on a different machine instead of localhost:

output {
  elasticsearch {
    action => "index"
    hosts => "100.100.0.10:9200"
    index => "myindex"
  }
}

It still takes a long time (around 90 minutes). Any suggestions on how to improve this?

system · February 21, 2017, 11:14am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
