Optimizing process time

Hello everyone,

I'm pretty new to the Elastic ecosystem.
For work, I have to analyze our Logstash process, understand it, and improve it if possible.
So I have a lot of questions.

Firstly, a little bit of context:

Logstash is used to transfer data from ES index A to ES index B (each located on a different server).

So for the input, we use the elasticsearch plugin to query index A.
Then a lot of filters are applied to each record/event.
Finally, the output uses the elasticsearch plugin to write into index B.

For now, the ES query returns approximately 500k documents.
The complete processing time is 2 hours. That's too much for us, since we must run the process during the night.
So the goal is to reduce the time to 1 hour maximum.

So, to reduce this time, a few questions:

  • Is there a way to make a bulk query? => If I understand correctly, each event is processed individually: input -> filters -> output. I'm wondering if it would be possible to filter all the events and then send all the records to ES index B in one go.

  • We also use plugins-filters-translate to compare some properties against a dictionary. Are the dictionaries read for each event during the process, or does Logstash read them once and keep them in memory? => For now, I'm not sure this is a performance issue, because the dictionaries are very light, but over time they will grow.

  • About the configuration, I read the documentation about workers and batch.size. I guess we need to audit our infra while the process runs before increasing those properties?
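For the last question, the relevant knobs live in logstash.yml (or can be passed on the command line). A minimal sketch, assuming roughly default values as the starting point; the numbers below are placeholders to tune against your own CPU and heap measurements, not recommendations:

```yaml
# logstash.yml -- a sketch; tune against measurements, not guesses
pipeline.workers: 8        # defaults to the number of CPU cores; raise only if cores sit idle
pipeline.batch.size: 500   # events each worker collects before running filters/outputs
pipeline.batch.delay: 50   # ms to wait for a batch to fill before flushing it anyway
```

Note that the elasticsearch output already sends its documents to ES using the bulk API, with batches derived from pipeline.batch.size, which partly addresses the first question as well.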

Thanks,

It is worth looking at those filters and the order they're used in, and thinking about whether you can make them work more efficiently. I once made Logstash process some data 50x faster by looking at the data and altering the Logstash filters it went through based on what I found. E.g. the logs were going through a grok filter which was trying multiple patterns for every event. I realised that one pattern matched much more often than the other patterns did, so I made grok try that pattern first and Logstash got faster.

There were a bunch of conditional statements like

 if [field] == "one" {
   do_stuff
 } else if [field] == "two" {
   do_other_stuff
 } else if [field] == "three" {
   do_some_other_stuff
 } else if [field] == "four" {
   do_yet_other_stuff
 }
I realised that the =="four" condition would be matched a lot more often than =="two", so I moved the =="four" check ahead of =="two" and Logstash got faster.
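The grok case looks like this in config form (a sketch with hypothetical patterns and field names; grok tries the listed patterns top to bottom and stops at the first match, so putting the most frequent format first saves work on most events):

```
filter {
  grok {
    # Patterns are tried in order; list the one that matches most often first.
    match => {
      "message" => [
        "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}",  # common format, tried first
        "%{SYSLOGTIMESTAMP:ts} %{GREEDYDATA:msg}"                       # rarer format, fallback
      ]
    }
  }
}
```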

When I was experimenting I used the metrics filter and the stdout output plugin to compare how many events per second each variant of the filtering could process, as described at
https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html
While testing I had Logstash send all the processed data to

output {
    null {}
}

because it's not necessary to have Logstash output the data anywhere useful to find out if its filters are faster.
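Put together, a throughput-test pipeline along those lines might look like this (a sketch; the meter name, tag, and flush interval are arbitrary choices, and the input/filter sections stand in for your real ones):

```
input {
  elasticsearch {
    # ... same input as the real pipeline ...
  }
}
filter {
  # ... the filters being benchmarked ...
  metrics {
    meter => "events"        # counts events and derives 1m/5m/15m rates
    add_tag => "metric"      # lets the output tell metric events apart from data
    flush_interval => 10     # emit a metrics event every 10 seconds
  }
}
output {
  if "metric" in [tags] {
    stdout {
      # print only the throughput, e.g. the 1-minute rate
      codec => line { format => "rate_1m: %{[events][rate_1m]}" }
    }
  } else {
    null {}                  # discard the real data while testing
  }
}
```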
