Is there any way to configure the Logstash elasticsearch input so that, after a restart, Logstash does not reload the whole Elasticsearch index from the beginning?
Ideally it would resume from the last document it read.
As far as I can tell, it always starts reading from the beginning.
Is there any way to control this?
I'm not sure whether the persistent queue solves this problem.
It seems the elasticsearch input plugin in Logstash lacks a checkpointing feature that would let it remember the last document it queried. Something similar exists in the jdbc input plugin: see the tracking_column and last_run_metadata_path options at https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
You could try building something similar yourself. Keep a last_queried_doc metadata index in Elasticsearch, written with the elasticsearch output plugin, or store the value in any other persistent datastore such as DynamoDB or MongoDB. Then, when your Logstash process boots, look up that value and inject it as an environment variable that is referenced in the query setting of the elasticsearch input plugin.
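A minimal sketch of that boot-time lookup, in Python. Everything named here is an assumption made for illustration: the local checkpoint file (standing in for the metadata index or external datastore), the ES_INPUT_QUERY variable name, and the helper functions. It also assumes the pipeline config references the variable with Logstash's ${VAR} syntax, for example query => "${ES_INPUT_QUERY}".

```python
import json
import os
from pathlib import Path

# Hypothetical checkpoint location. A local file keeps the sketch
# self-contained; a metadata index or another datastore would work the same way.
CHECKPOINT_PATH = Path("last_queried_doc.json")

# Used on the very first run, when no checkpoint exists yet.
DEFAULT_START = "1970-01-01T00:00:00Z"


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved @timestamp, or DEFAULT_START if none is stored."""
    if not path.exists():
        return DEFAULT_START
    return json.loads(path.read_text())["last_timestamp"]


def save_checkpoint(timestamp, path=CHECKPOINT_PATH):
    """Persist the @timestamp of the last document that was processed."""
    path.write_text(json.dumps({"last_timestamp": timestamp}))


def build_query(last_timestamp):
    """Build a query body that matches only documents newer than the checkpoint."""
    query = {"query": {"range": {"@timestamp": {"gt": last_timestamp}}}}
    return json.dumps(query)


def logstash_env(last_timestamp):
    """Copy the current environment and add the (assumed) query variable."""
    env = dict(os.environ)
    env["ES_INPUT_QUERY"] = build_query(last_timestamp)
    return env


if __name__ == "__main__":
    # Print the query that a wrapper script would export before starting Logstash.
    print(logstash_env(load_checkpoint())["ES_INPUT_QUERY"])
```

A wrapper script could pass the result of logstash_env() as the environment when it starts Logstash, and something on the output side would still have to call save_checkpoint() (or update the metadata index) as documents are processed.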
Keep in mind that if you use the @timestamp field to track the last document queried, you may miss changes: depending on how you write to your index, documents that were already read can be updated later, and those updates will not be picked up in your output index.