We have a use case where the watcher should skip documents that it has already used for matching the watcher query. The next time the watcher runs, only new documents should be considered for matching.
We currently use a sliding timestamp window that selects only messages generated in the last X minutes.
However, there can be cases where some messages have older timestamps and were delivered to Elasticsearch after a delay.
So what we want to do is update / tag all documents that the watcher searched through, and then add a condition to the watcher so that it does not pick up these tagged documents. Is this possible in Watcher today?
You could store a document in an action that records the execution date of the current watch, and then access it the next time the watch runs. Combined with an ingest pipeline that adds the current date at index time, this might help.
You would, however, still need to keep the refresh in mind in this setup: a refresh of that index might need to be executed before the watch runs.
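To make the suggestion above a little more concrete, here is a minimal sketch that only builds the two JSON bodies involved: an ingest pipeline that stamps each document with its index-time date, and a range query that selects documents ingested after the previous watch run. The field name `ingested_at` and the helper names are assumptions for illustration, not something from this thread, and how the previous execution date is stored and read back is left out.

```python
import json

# Hypothetical field name for the index-time date; not taken from the thread.
INGEST_FIELD = "ingested_at"


def build_ingest_pipeline(field=INGEST_FIELD):
    """Ingest pipeline body that stamps each document with its ingest time."""
    return {
        "description": "Add the time at which the document was ingested",
        "processors": [
            # {{_ingest.timestamp}} is the ingest metadata timestamp.
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


def build_new_documents_query(last_run_iso, field=INGEST_FIELD):
    """Search body selecting documents ingested after the previous watch run."""
    return {"query": {"range": {field: {"gt": last_run_iso}}}}


if __name__ == "__main__":
    print(json.dumps(build_ingest_pipeline(), indent=2))
    print(json.dumps(build_new_documents_query("2019-01-01T10:00:00Z"), indent=2))
```

Filtering on the ingest time rather than the message timestamp is what would let late-arriving messages with old timestamps still count as new.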
Thanks for the reply Alexander!
We did give this solution some thought, but it wouldn't really address our use case of scanning messages that were missed due to some glitch in the logging pipeline that delivers messages to Elasticsearch. We create one index per day, and the watcher looks into the current day's index to match certain queries. The watcher will execute irrespective of whether it found an index or messages for the current date, so the sliding window will always be forward-looking if we were to use the watcher's last execution time. What we want is for the watcher to be smart enough to know that, if X is the timestamp of the last message it used for matching, it should start looking for messages from timestamp X onwards.
Maybe that's not really possible. As an alternative:
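The "start from the last message timestamp X" idea can be sketched as a small watermark helper. This is only an illustration under assumptions: the function names are hypothetical, timestamps are assumed to be ISO-8601 UTC strings in one consistent format (so that string comparison matches chronological order), and where the watermark is persisted between runs is not shown.

```python
def next_watermark(previous_iso, hit_timestamps):
    """Return the newest message timestamp seen so far.

    If a run finds no messages, the previous watermark is kept, so the
    window does not move forward just because the watch executed.
    """
    candidates = [previous_iso] + list(hit_timestamps)
    # Same-format ISO-8601 UTC strings sort chronologically.
    return max(candidates)


def build_watermark_query(watermark_iso, field="@timestamp"):
    """Search body that starts at the watermark instead of 'now - X minutes'.

    "gte" means documents stamped exactly at the watermark are seen again.
    """
    return {"query": {"range": {field: {"gte": watermark_iso}}}}
```

Note that a watermark on the message timestamp alone would still skip a delayed message whose timestamp is older than X, so it would probably need to be combined with an index-time field or with tagging.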
Is it possible to update all the matched documents to add an additional field in the same index in place using the index action of the watcher?
So Alexander, I found this post which seems to accomplish what I am trying to do.
However, since the update_by_query has its own query [in my case match_all], there may be a window in which documents that were not covered by the watcher's search query get tagged anyway.
Again, we could use a sliding time window for this, but is it possible that, due to some glitch, issue, or delay, documents won't get tagged and will be picked up again by the next run of the watcher?
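One way to narrow that window, sketched here under assumptions, is to give the search and the update_by_query the same query and the same fixed upper time bound instead of a match_all and a moving "now". The helper name, the index-time field `ingested_at`, and the tag field `watched` are hypothetical; the sketch only builds the two request bodies and does not deal with refresh timing or failed updates.

```python
import copy


def build_search_and_tag_bodies(watch_query, upper_bound_iso,
                                time_field="ingested_at", tag_field="watched"):
    """Build a search body and an update_by_query body over the same documents.

    Both bodies use the watch's query, skip documents that already carry the
    tag, and stop at the same fixed upper bound.
    """
    shared = {
        "bool": {
            "must": [copy.deepcopy(watch_query)],
            "filter": [{"range": {time_field: {"lte": upper_bound_iso}}}],
            "must_not": [{"exists": {"field": tag_field}}],
        }
    }
    search_body = {"query": copy.deepcopy(shared)}
    update_body = {
        "query": copy.deepcopy(shared),
        # Painless script that sets the tag on every document it touches.
        "script": {
            "lang": "painless",
            "source": "ctx._source[params.tag_field] = true",
            "params": {"tag_field": tag_field},
        },
    }
    return search_body, update_body
```

A document that the update fails to tag would simply not have the tag field, so the next run would pick it up again rather than lose it.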