Aggregation after dedup across indices

wmcdonald · May 15, 2018, 11:23pm

I have the following setup: Filebeat -> Logstash -> Elasticsearch, with this Logstash output:
output {
  elasticsearch {
    index => "idx-%{+YYYY-MM-dd}"
    document_id => "%{myDocID}"
  }
}
where myDocID is created in the filter section by combining the filename and an itemNum field from the event. An alias called idx-all is created that includes every idx-* index.

Suppose Filebeat is watching a file foo, which on Monday has 2 lines:
001 Status Fail
002 Status Success
Then ES creates 2 docs in idx-2018-05-14 with ids foo.001 and foo.002.

I have an aggregation that counts the number of docs per status, and it uses idx-all as the index. The agg should return:
Success = 1
Fail = 1

On Tuesday, the foo file gets another line:
001 Status Success
And ES creates 1 doc in the Tuesday index idx-2018-05-15 with id foo.001.

We would now like the agg run against idx-all to return:
Success = 2
Fail = 0
However, since the 001 doc from Monday still exists and the agg runs against the alias idx-all, it picks up 001 from both days and the counts will be:
Success = 2
Fail = 1
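
To make the two sets of numbers concrete, here is a minimal Python sketch of the intended "keep the newest doc per id, then count" logic, run over in-memory copies of the three example docs. The dict keys (index, doc_id, status) are made up for illustration and are not the real mapping; the sketch only shows the semantics being asked for, not how ES would compute them.

```python
from collections import Counter

# Hypothetical in-memory copies of the three docs visible through idx-all.
docs = [
    {"index": "idx-2018-05-14", "doc_id": "foo.001", "status": "Fail"},
    {"index": "idx-2018-05-14", "doc_id": "foo.002", "status": "Success"},
    {"index": "idx-2018-05-15", "doc_id": "foo.001", "status": "Success"},
]


def naive_counts(docs):
    """Count statuses over every doc, duplicates included."""
    return Counter(d["status"] for d in docs)


def deduped_counts(docs):
    """Keep only the newest doc per doc_id, then count statuses."""
    latest = {}
    for d in docs:
        current = latest.get(d["doc_id"])
        # Index names end in YYYY-MM-dd, so plain string comparison
        # orders them by date.
        if current is None or d["index"] > current["index"]:
            latest[d["doc_id"]] = d
    return Counter(d["status"] for d in latest.values())


naive = naive_counts(docs)      # Success = 2, Fail = 1 (what the alias returns today)
wanted = deduped_counts(docs)   # Success = 2, Fail = 0 (what we want)
print(naive["Success"], naive["Fail"])
print(wanted["Success"], wanted["Fail"])
```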

I have 10K records on average, so I'd prefer to do this in one ES query rather than pulling everything back into Java and processing it there. I have the counting agg, and a separate query that uses aggs to try to remove the older duplicates, but in that one each item (e.g. 001, 002) ends up in its own bucket. I don't know how to combine the two into a single query, if that is even possible.

What would the single query look like that would first de-dup across indices and then agg across that result set?
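
One possible direction, sketched in Python under several assumptions, is a terms aggregation on the id with a top_hits sub-aggregation that keeps only the newest hit per bucket, followed by a small tally over the returned buckets. This is not a confirmed answer and has not been run against a cluster. It assumes myDocID is also indexed as a keyword field in each doc, that the status is stored in a field called status, and that @timestamp reflects when the line was shipped; the aggregation names by_doc and latest and both helper functions are hypothetical. The final count still happens client side, but over one small hit per id rather than every duplicate.

```python
from collections import Counter


def build_dedup_query(id_field="myDocID", time_field="@timestamp", max_ids=10000):
    """Build a search body with one bucket per id, keeping the newest hit in each.

    The field names are assumptions about the mapping, not known facts.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the search hits
        "aggs": {
            "by_doc": {
                "terms": {"field": id_field, "size": max_ids},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{time_field: {"order": "desc"}}],
                            "_source": {"includes": ["status"]},
                        }
                    }
                },
            }
        },
    }


def count_latest_statuses(response):
    """Tally the status of the newest doc in each id bucket."""
    counts = Counter()
    for bucket in response["aggregations"]["by_doc"]["buckets"]:
        newest = bucket["latest"]["hits"]["hits"][0]
        counts[newest["_source"]["status"]] += 1
    return counts


# Hand-written response fragment in the shape the query above would return
# for the foo example (only the parts the tally reads are included).
sample_response = {
    "aggregations": {
        "by_doc": {
            "buckets": [
                {"key": "foo.001", "doc_count": 2,
                 "latest": {"hits": {"hits": [{"_source": {"status": "Success"}}]}}},
                {"key": "foo.002", "doc_count": 1,
                 "latest": {"hits": {"hits": [{"_source": {"status": "Success"}}]}}},
            ]
        }
    }
}

counts = count_latest_statuses(sample_response)
print(counts["Success"], counts["Fail"])
```

The body from build_dedup_query() would be sent as a search against idx-all. The max_ids value has to be at least the number of distinct ids, otherwise some ids are silently left out of the tally.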

system · June 12, 2018, 11:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
