I'm pretty new to the Elastic ecosystem.
For work, I have to analyze a Logstash process, understand it, and improve it if possible.
So I have a lot of questions.
First, a little bit of context:
Logstash is used to transfer data from an ES index A to an ES index B (each located on a different server).
So in the input, we use the elasticsearch plugin to query index A.
Then, a lot of filters are applied to each record/event.
Finally, the output uses the elasticsearch plugin to write into index B.
For now, the ES query returns approximately 500k documents.
The complete processing time is 2 hours. That's too much for us, and we must run the process during the night.
So the goal is to reduce the time to 1 hour maximum.
So, to reduce this time, a few questions:
Is there a way to make a bulk query? => If I understand correctly, each event is processed individually: input -> filters -> output. I'm wondering if it would be possible to filter all the events and then send all the records to ES index B in one go.
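For what it's worth, my understanding from the docs is that the elasticsearch output plugin already sends events to ES using the Bulk API, batched according to `pipeline.batch.size`, so the bulk behavior may already be there. A minimal sketch of what I think our pipeline looks like (hosts, index names, and the scroll `size` are placeholders, not our real values):

```
input {
  elasticsearch {
    hosts => ["http://server-a:9200"]   # placeholder host
    index => "index-a"                  # placeholder index name
    query => '{ "query": { "match_all": {} } }'
    size  => 1000                       # documents fetched per scroll page
  }
}
output {
  elasticsearch {
    hosts => ["http://server-b:9200"]   # placeholder host
    index => "index-b"                  # placeholder index name
    # documents are flushed to ES in bulk requests automatically
  }
}
```

If that's right, then the question becomes whether tuning the input `size` and the pipeline batch settings is the lever, rather than restructuring the pipeline itself?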
We also use plugins-filters-translate to compare some properties against a dictionary. Is the dictionary read for each event during processing? Or is it read once and kept in memory by Logstash? => For now, I'm not sure it's a performance issue, because the dictionary is very light, but over time it will grow.
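From what I can tell in the docs, the translate filter loads the dictionary file into memory and only re-reads it periodically, controlled by `refresh_interval` (in seconds), not once per event. A hedged sketch, where the field names and path are made-up examples, not our actual config:

```
filter {
  translate {
    source           => "status_code"             # hypothetical source field
    target           => "status_label"            # hypothetical target field
    dictionary_path  => "/etc/logstash/dict.yml"  # hypothetical path
    refresh_interval => 300                       # seconds between re-reads of the file
  }
}
```

Can someone confirm the dictionary really stays in memory between refreshes?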
About the configuration: I read the documentation about `pipeline.workers` and `pipeline.batch.size`. I guess we need to audit our infrastructure (CPU, memory, JVM heap) during the process to be sure we can safely increase those properties?
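In case it helps the discussion, this is the kind of `logstash.yml` tuning I had in mind; the values are assumptions for an 8-core machine, and the right numbers presumably depend on what the audit shows:

```
# Sketch only: values are assumptions, to be validated against CPU/heap metrics.
pipeline.workers: 8        # defaults to the number of CPU cores
pipeline.batch.size: 500   # events per worker per batch (default is 125)
pipeline.batch.delay: 50   # ms to wait before flushing an undersized batch
```

Does increasing `pipeline.batch.size` also mean we should raise the JVM heap for Logstash, since more events are held in flight?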