Hello! I have an index with a lot of documents. I need to group these documents by a specific field and then check whether the two most recent documents (by timestamp) in each group have the same value in another specific field. Something like this:
Group 1 {
  doc1: {"status": "SUCCESS"},
  doc2: {"status": "SUCCESS"}
}
Group 2 {
  doc3: {"status": "FAILED"},
  doc4: {"status": "SUCCESS"}
}
Group 3 {
  doc5: {"status": "FAILED"},
  doc6: {"status": "FAILED"}
}
- doc1, doc3 and doc5 are the documents with the most recent timestamp in their group.
- doc2, doc4 and doc6 are the documents with the second most recent timestamp in their group.
As far as I know, the query can be built in one of two ways:
- two nested aggregations: a terms aggregation on the grouping field, with a top_hits sub-aggregation inside it that returns the two documents with the most recent timestamp.
- a collapse on the grouping field, with an inner_hits section that returns the two documents with the most recent timestamp.
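For reference, this is roughly how I would write the two request bodies, shown here as Python dicts. It is only a sketch: the field names `group_id` and `@timestamp`, the aggregation names `by_group` and `latest_two`, and the bucket count of 1000 are assumptions for illustration, not names from my real index.

```python
def build_terms_top_hits_body(group_field="group_id", time_field="@timestamp"):
    """Option 1: terms aggregation with a top_hits sub-aggregation.

    group_field and time_field are hypothetical field names.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the search hits
        "aggs": {
            "by_group": {
                # 1000 is an assumed upper bound on the number of groups
                "terms": {"field": group_field, "size": 1000},
                "aggs": {
                    "latest_two": {
                        "top_hits": {
                            "size": 2,
                            "sort": [{time_field: {"order": "desc"}}],
                            "_source": {"includes": ["status", time_field]},
                        }
                    }
                },
            }
        },
    }


def build_collapse_body(group_field="group_id", time_field="@timestamp"):
    """Option 2: collapse on the grouping field with inner_hits.

    group_field and time_field are hypothetical field names.
    """
    return {
        "collapse": {
            "field": group_field,
            "inner_hits": {
                "name": "latest_two",
                "size": 2,
                "sort": [{time_field: {"order": "desc"}}],
            },
        },
        # the outer sort decides which document represents each group
        "sort": [{time_field: {"order": "desc"}}],
    }
```

Both bodies return the two most recent documents per group, but neither compares them; that comparison is the part I am missing.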
My question is: once this is done (using aggs or collapse), can I compare, for each group, whether a specific field differs between the two documents? If it differs, Elasticsearch should return both documents or only the latest one; if it does not, Elasticsearch should skip the group.
In my specific case (following the example above), each document has a status field with the value SUCCESS or FAILED, and I need to know whether a status change happened, so the Elasticsearch query should do the following:
- If the two status values in a group differ (doc3 and doc4 in the example), return only the latest document for that group.
- If the two status values in a group are equal (doc1 and doc2, and doc5 and doc6 in the example), skip these documents so that they are omitted from the results.
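To make the expected result concrete, here is the same logic written as plain Python over documents that are already grouped. This is only an illustration of the behaviour I want from the query, not a query-side solution; the `id` and `timestamp` keys and the integer timestamps are made up for the example.

```python
def latest_docs_with_status_change(groups):
    """Return the latest document of every group whose two most recent
    documents have different status values."""
    changed = []
    for docs in groups.values():
        # newest first
        ordered = sorted(docs, key=lambda d: d["timestamp"], reverse=True)
        if len(ordered) < 2:
            continue  # nothing to compare against
        latest, previous = ordered[0], ordered[1]
        if latest["status"] != previous["status"]:
            changed.append(latest)
    return changed


# Hypothetical data mirroring the example above (higher timestamp = newer).
example_groups = {
    "Group 1": [
        {"id": "doc1", "timestamp": 2, "status": "SUCCESS"},
        {"id": "doc2", "timestamp": 1, "status": "SUCCESS"},
    ],
    "Group 2": [
        {"id": "doc3", "timestamp": 2, "status": "FAILED"},
        {"id": "doc4", "timestamp": 1, "status": "SUCCESS"},
    ],
    "Group 3": [
        {"id": "doc5", "timestamp": 2, "status": "FAILED"},
        {"id": "doc6", "timestamp": 1, "status": "FAILED"},
    ],
}

result = latest_docs_with_status_change(example_groups)
print([doc["id"] for doc in result])  # prints ['doc3']
```

Only doc3 comes back, because Group 2 is the only group where the status changed between the two most recent documents.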
The idea is to get only the latest documents that show a status change. The logic has to live in the Elasticsearch query itself, because it will be used in an Elastic rule for notification purposes (we should avoid watchers and transforms for this).
Can someone help me with this problem?
Many thanks.
Juan M.