Logstash is very slow in sending data to Elasticsearch

Hi,

I am parsing an IIS log file, which is in the below format:

2014-07-12:00:08:54 100.200.50.0 GET /mypath/mypage.asmx - 80 - 100.100.50.0 Mozilla/4.1+(compatible;+MSIE+4.01;+Windows+NT;+MS+Search+8.0+Robot) - 403 0 0 39

And my Logstash config file looks like this:

input {
  file {
    path => "D:/myLogFile.log"
    type => "iis-log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => {
      "message" => '%{DATA:timestamp} %{IPORHOST:clientip} %{NOTSPACE:method} %{NOTSPACE:uri} %{NOTSPACE:csuriquery} %{NOTSPACE:port} %{NOTSPACE:username} %{NOTSPACE:serverip} %{NOTSPACE:agent} %{NOTSPACE:referrer} %{NOTSPACE:status} %{NOTSPACE:sub_status} %{NOTSPACE:win_status} %{NOTSPACE:responsetime}'
    }
  }

  date {
    match => [ "timestamp", "YYYY-MM-dd:HH:mm:ss" ]
    locale => "en"
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "myindex"
  }
}

I am processing around 10 lakh log lines, which is taking around 2 hours, much slower than I expected.

Kindly suggest how I can improve performance and tune the above config file/filter.

Thanks in advance.

Depending on which version of Logstash you are using, the recommended tuning will differ, as the pipeline has gone through changes. As a first step, however, I would recommend increasing the number of workers for the Elasticsearch output. Start with a reasonably low number and increase it slowly until throughput no longer improves. A good starting point may be the number of worker threads you have, or the number of cores available on the Logstash host.
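On older Logstash versions (pre-5.x), this can be set directly on the output plugin; a minimal sketch (the value 2 is just an illustrative starting point, not a recommendation):

output {
  elasticsearch {
    hosts => "localhost"
    index => "myindex"
    workers => 2   # start low, then try the number of CPU cores
  }
}

As noted later in this thread, the workers option on the elasticsearch output is gone in 5.x, so this only applies to older releases.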

+1

We experienced this too. Increasing the number of workers didn't help much.

Would like to get further assistance on this one.

Which version of Logstash are you using?

Logstash and Elasticsearch are both version 5.1.1.

Moreover, it seems workers are deprecated in Logstash 5.1.1.
Is there an alternative?

The documentation provides some good guidance on troubleshooting performance. Increasing the number of workers in the Elasticsearch output is one of the things discussed there.

What is the hardware specification of your Logstash host and Elasticsearch cluster?

Could you please let me know how I can increase the number of workers?

I had this config:

output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "myindex"
    workers => 1
  }
}

But increasing the number of workers to 2 or more gives the below error message:
"You are using a plugin that doesn't support workers but have set the workers value explicitly! This plugin uses the shared and doesn't need this option"

The Elasticsearch output in 5.1.1 apparently no longer needs the workers parameter, as it is thread-safe and concurrency is handled automatically. In my opinion this could be better documented, so I will open an issue.
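In Logstash 5.x, concurrency is instead tuned at the pipeline level in logstash.yml. A sketch of the relevant settings (the values shown are illustrative assumptions, not recommendations; start from the defaults and adjust while measuring throughput):

# logstash.yml (Logstash 5.x)
pipeline.workers: 4        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events per worker batch; larger batches can improve throughput
pipeline.batch.delay: 5    # milliseconds to wait for a batch to fill

Larger batch sizes mean fewer, bigger bulk requests to Elasticsearch, at the cost of more memory per worker.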

What is 10 lakh data?

What is the spec of the machine you are running Logstash on?

Are you running Elasticsearch and Logstash on the same machine?

Try anchoring your grok pattern to the beginning of the string e.g.

grok {
  match => { "message" => '^%{DATA:timestamp} %{IPORHOST:clientip} %{NOTSPACE:method} %{NOTSPACE:uri} %{NOTSPACE:csuriquery} %{NOTSPACE:port} %{NOTSPACE:username} %{NOTSPACE:serverip} %{NOTSPACE:agent} %{NOTSPACE:referrer} %{NOTSPACE:status} %{NOTSPACE:sub_status} %{NOTSPACE:win_status} %{NOTSPACE:responsetime}' }
}

That is, add a ^ to the beginning of the pattern.
See https://www.elastic.co/blog/do-you-grok-grok


Yes. I'm running Elasticsearch and Logstash on the same machine, which has the below configuration:

Processor: AMD Athlon(tm) II X2 245 @ 2.90 GHz
RAM: 8 GB

I tried anchoring as well, but saw no appreciable difference in performance.

1 million lines of logs.

Are you getting events tagged with _grokparsefailure?

I checked. No failures.

1,000,000 events / 2 hours / 60 / 60 is about 139 events per second. You should be seeing roughly 2,000 to 4,000 events per second.

Try excluding Elasticsearch by using the stdout output with the dots codec.

You should see the dots stop after about 4 minutes - if so then ES is the bottleneck.
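The suggested test can be sketched as a temporary output section, swapped in for the elasticsearch output during the benchmark run:

output {
  # Temporarily replaces the elasticsearch output to measure raw
  # pipeline throughput; each dot printed is one processed event.
  stdout { codec => dots }
}

If the pipeline finishes quickly with this output but is slow with the elasticsearch output, the bottleneck is on the Elasticsearch side rather than in the filters.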

I tried removing Elasticsearch and using the dots codec; it takes 7 minutes.

But I need the data in Elasticsearch, so I tried sending it to ES on a different machine instead of localhost:

output {
  elasticsearch {
    action => "index"
    hosts => "100.100.0.10:9200"
    index => "myindex"
  }
}

It is still taking a long time (around 90 minutes). Any suggestions on how to improve this?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.