Slow performance of Logstash elasticsearch filter plugin

I have around 4.5 million records in my Logstash input data, and for each record I do a lookup against an existing index in ES using the following elasticsearch filter plugin. It is essentially adding department information based on the user_name field.

elasticsearch {
  hosts  => ["http://10.129.212.45:9200"]
  index  => "sys_username_mapping"
  query  => "user_name:%{[user_name]}"
  fields => {
    "email"    => "email"
    "site"     => "site"
    "group"    => "group"
    "division" => "division"
    "cad_cc"   => "cad_cc"
    "ldap_cc"  => "ldap_cc"
  }
}

After this lookup, I index the complete, enriched data set into a new index in ES.

If I comment out the elasticsearch filter plugin (i.e. without the department information), it takes about 5-6 minutes to load all the input data into Elasticsearch. With the filter plugin enabled, it does not even complete in 40 minutes.

Could the translate filter be an alternative to this? Will it perform better than the elasticsearch filter plugin if I do the lookup against a text file rather than against already-indexed data?

This user_name-to-department kind of lookup is important for me.

Please suggest.

An elasticsearch output makes one API call to elasticsearch for each batch of events. By default the batch size is 125. An elasticsearch filter makes one API call to elasticsearch for each event, so it is making 125 times as many calls. Thus it is not surprising to me that it would take more than 10 times as long.
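(For scale, assuming the default batch size: ~4.5 million events means roughly 4,500,000 / 125 ≈ 36,000 bulk calls from the output, versus 4,500,000 individual queries from the filter.)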

I would expect a translate filter to be very much faster.
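For reference, a minimal sketch of what that could look like, using the source/target option names from recent versions of the plugin (older versions call these field and destination). The dictionary path and the department_info field name are illustrative assumptions, not taken from the thread:

filter {
  translate {
    # Event field to look up and field to write the result to
    source          => "user_name"
    target          => "department_info"
    # Local dictionary file; the format is inferred from the extension
    dictionary_path => "/etc/logstash/sys_username_mapping.yml"
    # Used when user_name is not present in the dictionary
    fallback        => "{}"
  }
  # If each dictionary value is a JSON string (see the example file below),
  # expand it into individual event fields
  json {
    source => "department_info"
  }
}

The json filter step is just one way to carry several fields (email, site, group, and so on) per user through a single lookup.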


Thanks, Badger, for this explanation.

I am surprised that a translate filter with its lookup file on local disk would be faster than the ES query responses of the elasticsearch filter plugin, because the disk I/O throughput of an ES cluster (due to parallelism) is several times higher than that of the Logstash server's local disk. Maybe I am wrong here.

The file the translate filter uses is read into memory, so it might require a larger heap if the file is big, but it does not result in a lot of disk I/O.
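To make that concrete, here is a hypothetical sketch of such a dictionary file, assuming the mapping index is exported to YAML with one JSON string per user (all names and values are made up for illustration):

# /etc/logstash/sys_username_mapping.yml
"jdoe": '{"email": "jdoe@example.com", "site": "siteA", "group": "groupA", "division": "divA", "cad_cc": "100", "ldap_cc": "200"}'
"asmith": '{"email": "asmith@example.com", "site": "siteB", "group": "groupB", "division": "divB", "cad_cc": "101", "ldap_cc": "201"}'

The whole file is loaded into the Logstash heap, so with ~4.5 million users it is worth checking the resulting memory footprint.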

Thanks, Christian.

Yes, I wasn't aware of this in-memory read in the translate filter. However, could you also tell me how to work around lookup file rollover? Overwriting or updating the lookup file on disk could cause issues while the file/inode is being updated. Is there a translate filter parameter in Logstash that keeps the last in-memory read of the file and refreshes it in memory only on demand?

Just as example steps:

1. The previous lookup file is loaded into Logstash's memory.

2. The lookup file on disk is updated, replaced, or overwritten.

3. The new file is refreshed into Logstash's memory on some schedule.
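For what it's worth, the translate filter does expose a refresh_interval option (seconds between scheduled re-reads of the dictionary file, 300 by default) and a refresh_behaviour option (merge or replace). A sketch with illustrative values:

filter {
  translate {
    source            => "user_name"
    target            => "department_info"
    dictionary_path   => "/etc/logstash/sys_username_apping.yml"
    # Re-read the dictionary file every 15 minutes (value in seconds)
    refresh_interval  => 900
    # Swap in the newly read dictionary instead of merging it into the old one
    refresh_behaviour => "replace"
  }
}

There is no documented refresh-on-command trigger that I am aware of. To avoid a scheduled refresh catching a half-written file, a common pattern is to write the new dictionary to a temporary file and then mv it over the old path, since a rename within the same filesystem is atomic.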
