I have a log 'analytics', which contains a list of process events (e.g. CRUD) that occurred over a period of time. I am looking to find the set of records that were added but never deleted from the system.
Document structure:
id, process_id, event, timestamp
where process_id is the primary key of the record, and the process events are 'create', 'read', 'update', 'delete'.
This is not easily possible, because it requires correlating multiple documents. That is essentially a join, and Elasticsearch does not support general joins across documents.
A way around this that I've seen is to ingest the documents twice: once into an "event index" (the way you do it right now) and once into a "state index", using the process_id as the document id so that each record's document always contains its most recent event. Then you can use a regular filter on the state index to search by current state, for example every record whose latest event is not 'delete'.
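
To make the double-ingest idea concrete, here is a minimal sketch in Python. It only builds the request bodies (bulk API action/source pairs and a query) and does not talk to a cluster. The index names `analytics-events` and `analytics-state` and all function names are made up for illustration. It also assumes that events reach the state index in timestamp order, and that `event` is mapped as a keyword field so a `term` query matches it exactly. The small `simulate_state` helper just mimics the overwrite-by-id behaviour in memory so the logic can be checked locally.

```python
import json

# Hypothetical index names, not taken from the original post.
EVENT_INDEX = "analytics-events"
STATE_INDEX = "analytics-state"


def bulk_actions(events):
    """Yield bulk API action/source pairs that write each event to both indices."""
    for ev in events:
        # Event index: no _id given, so every event is stored as its own document.
        yield {"index": {"_index": EVENT_INDEX}}
        yield ev
        # State index: process_id is the _id, so a later event for the same
        # process_id overwrites the earlier one.
        yield {"index": {"_index": STATE_INDEX, "_id": str(ev["process_id"])}}
        yield ev


def to_ndjson(actions):
    """Serialize actions as newline-delimited JSON, the format the bulk API expects."""
    return "\n".join(json.dumps(a) for a in actions) + "\n"


# Query to run against the state index: records whose latest event is not 'delete'.
# Assumes 'event' is a keyword field.
NOT_DELETED_QUERY = {
    "query": {"bool": {"must_not": [{"term": {"event": "delete"}}]}}
}


def simulate_state(events):
    """In-memory stand-in for the state index: last event per process_id wins."""
    state = {}
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        state[ev["process_id"]] = ev
    return state


def not_deleted(state):
    """Return the process_ids whose most recent event is not 'delete'."""
    return sorted(pid for pid, ev in state.items() if ev["event"] != "delete")
```

One thing to keep in mind with this approach: because the state index keeps only the latest event per process_id, a late-arriving older event would overwrite a newer one unless ingestion is ordered or guarded in some way (external versioning is one option).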