I have an index 'analytics' that contains a list of events (e.g. CRUD) that occurred over a period of time. I am looking for the set of records that were both added and deleted, identified by primary key.
document structure:
id, key, event, timestamp
where key is the primary key of the record and event is one of 'create', 'read', 'delete', 'update'.
I want to find the list of primary keys that have both a 'create' and a 'delete' event. Basically, the intersection of two sets of primary keys: those with a 'create' event and those with a 'delete' event.
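A request body of roughly the following shape would do this. It is only a sketch, written here as a Python dict that serializes to the JSON body: the aggregation name `keys`, the helper names `build_query` and `keys_created_and_deleted`, and the assumption that `event` and `key` are indexed as exact (not analyzed) values are all mine, not something taken from your mapping.

```python
import json


def build_query(size=100):
    """Request body: keys that have both a create and a delete event."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                # keep only create and delete events
                "filter": {"terms": {"event": ["create", "delete"]}}
            }
        },
        "aggs": {
            "keys": {  # "keys" is an arbitrary aggregation name
                "terms": {"field": "key", "size": size, "min_doc_count": 2}
            }
        },
    }


def keys_created_and_deleted(events):
    """Local emulation of the same logic on (key, event) pairs."""
    counts = {}
    for key, event in events:
        if event in ("create", "delete"):
            counts[key] = counts.get(key, 0) + 1
    # same cut-off as min_doc_count: at least two matching documents
    return sorted(k for k, n in counts.items() if n >= 2)


if __name__ == "__main__":
    # JSON body that could be sent to the _search endpoint of the index
    print(json.dumps(build_query(), indent=2))
```

Note that this counts documents, not distinct event types: a key with two create events and no delete event would also reach a count of 2, so it relies on each key having at most one create and one delete.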
This query first filters the documents down to create and delete events only. Then it aggregates these documents by key. You want only those keys that have both events, hence the min_doc_count value of 2.
You may want to tweak the size in the terms aggregation (set to 100 above) per your needs.
BTW, the above syntax works for Elasticsearch 2.x. For older versions of Elasticsearch, you will need to use the filtered query instead of the bool query but everything else will remain the same.
Generally I use |A ∩ B| = |A| + |B| - |A ∪ B|, but you have to be careful if you are doing cardinality aggregations: their counts are approximate, so the result can come out negative.
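To make the identity concrete, here is a small sketch in Python. The key names and the approximate counts are made-up illustration values, and clamping at zero is just one possible way to handle the negative case, not something the cardinality aggregation does for you.

```python
def intersection_size(size_a, size_b, size_union):
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return size_a + size_b - size_union


def clamped_intersection_size(size_a, size_b, size_union):
    """Same identity, but never below zero (useful with approximate counts)."""
    return max(0, intersection_size(size_a, size_b, size_union))


# Exact counts: the identity agrees with a direct set intersection.
created = {"k1", "k2", "k3"}  # hypothetical keys with a create event
deleted = {"k2", "k3", "k4"}  # hypothetical keys with a delete event
exact = intersection_size(len(created), len(deleted), len(created | deleted))
assert exact == len(created & deleted)

# Approximate counts (hypothetical estimates for two disjoint sets):
# a small overestimate of the union is enough to push the result below zero.
approx = intersection_size(1000, 1000, 2010)
safe = clamped_intersection_size(1000, 1000, 2010)
```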