Slow performance of Logstash elasticsearch filter plugin

msk_76 · September 30, 2019, 9:47am

I have around 4.5 million records in my Logstash input data, and for each one I look up an existing index in ES using the following elasticsearch filter plugin configuration. The lookup adds department information based on the user_name field.

elasticsearch {
  hosts  => ["http://10.129.212.45:9200"]
  index  => "sys_username_mapping"
  query  => "user_name:%{[user_name]}"
  fields => {
    "email"    => "email"
    "site"     => "site"
    "group"    => "group"
    "division" => "division"
    "cad_cc"   => "cad_cc"
    "ldap_cc"  => "ldap_cc"
  }
}

After this lookup, I index the complete data into a new index in ES.

If I comment out the ES filter plugin (i.e. without the department information), it takes about 5-6 minutes to load all the input data into Elasticsearch. With the filter plugin enabled, it does not even complete in 40 minutes.

Can the translate filter be an alternative to this? Will it perform better than the ES filter plugin if I look up against a text file rather than already-indexed data?

This user_name to department kind of lookup is important for me.

Please suggest

Badger · September 30, 2019, 1:25pm

An elasticsearch output makes one API call to elasticsearch for each batch of events. By default the batch size is 125. An elasticsearch filter makes one API call to elasticsearch for each event, so it is making 125 times as many calls. Thus it is not surprising to me that it would take more than 10 times as long.

I would expect a translate filter to be very much faster.

msk_76 · October 1, 2019, 4:26am

Thanks, Badger, for this explanation.

I was surprised to hear that a translate filter, whose lookup file is stored on local disk, would be faster than an ES query response from the elasticsearch filter plugin, because the disk I/O throughput of an ES cluster (due to parallelism) is several times higher than that of the local disk of the Logstash server. Maybe I am wrong here.

Christian_Dahlqvist · October 1, 2019, 5:12am

The file the translate filter uses is read into memory, so it might require a larger heap if the file is big, but it does not result in a lot of disk I/O.
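
As a rough illustration of the in-memory lookup described above, the elasticsearch filter from the first post could be swapped for a translate filter along the lines below. This is only a sketch under several assumptions: the dictionary path and the user_info destination field are made-up names, the mapping index is assumed to have been exported to a YAML file keyed by user_name, and the option names field/destination are those of the translate plugin versions current at the time of this thread (newer releases call them source/target).

```
filter {
  translate {
    # Event field whose value is used as the dictionary key
    field           => "user_name"
    # Hypothetical field that receives the matched dictionary value
    destination     => "user_info"
    # Hypothetical path; the file contents are held in the Logstash heap
    dictionary_path => "/etc/logstash/lookups/user_mapping.yml"
  }
}
```

If each YAML value is itself a map of email, site, group and so on, the whole map should end up under user_info, but that is worth verifying on a small sample before running all 4.5 million records.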

msk_76 · October 1, 2019, 6:30am

Thanks Christian,

Yes, I wasn't aware that the translate filter reads the file into memory. However, could you also tell me how to handle rollover of the lookup file? Overwriting or updating the lookup file on disk could cause issues while the file/inode is being updated. Is there a parameter for the translate filter that keeps the last in-memory copy of the file and refreshes it only on our command?

For example:

1. The previous lookup file is loaded in Logstash memory.
2. The lookup file is updated, replaced or overwritten on disk.
3. The new file is refreshed in Logstash memory on some schedule.
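
For the scheduled-refresh step, the translate filter documents two relevant options: refresh_interval (how often, in seconds, the dictionary file is checked for updates; 300 by default) and refresh_behaviour (merge or replace). A minimal sketch follows; the path and the user_info field are hypothetical, and the field/destination option names are those of older plugin versions (newer ones use source/target). This refreshes on a timer, not on command.

```
filter {
  translate {
    field             => "user_name"
    # Hypothetical destination field and dictionary path
    destination       => "user_info"
    dictionary_path   => "/etc/logstash/lookups/user_mapping.yml"
    # Check the dictionary file for changes every 10 minutes
    refresh_interval  => 600
    # Discard the old in-memory entries instead of merging the new file into them
    refresh_behaviour => "replace"
  }
}
```

To reduce the risk of a half-written file being picked up, a common approach is to write the new dictionary under a temporary name on the same filesystem and then rename it over the old file, since a rename within one filesystem is atomic on POSIX systems.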

