I want to use Elasticsearch to index event data and, as events come in, append them to an "events" array in the document. Say it ends up looking like this:
To avoid "cross matching" of the dates for foo and bar you'll need to use the nested data type for the events.
I haven't yet figured out how to trim the results to documents that contain both event types, but this should at least avoid confusing matches on the dates of your "foo" events with those of your "bar" events:
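As a rough sketch only, a nested mapping and a query of this kind could be built as plain Python dicts like the ones below. The field names ("events.type", "events.date"), the helper names and the date windows are assumptions for illustration, not taken from the original post. Each "should" clause ties a type and a date range to the same nested event, which is what stops the cross matching.

```python
# Hypothetical field names: "events", "events.type", "events.date".
MAPPING = {
    "mappings": {
        "properties": {
            "events": {
                "type": "nested",  # each event is indexed as its own hidden doc
                "properties": {
                    "type": {"type": "keyword"},
                    "date": {"type": "date"},
                },
            }
        }
    }
}


def event_clause(event_type, gte, lte):
    """Match a single nested event of one type inside one date window."""
    return {
        "bool": {
            "filter": [
                {"term": {"events.type": event_type}},
                {"range": {"events.date": {"gte": gte, "lte": lte}}},
            ]
        }
    }


def build_query(foo_window, bar_window):
    """Match docs with a 'foo' event in foo_window OR a 'bar' event in bar_window."""
    return {
        "query": {
            "nested": {
                "path": "events",
                "query": {
                    "bool": {
                        "should": [
                            event_clause("foo", *foo_window),
                            event_clause("bar", *bar_window),
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # Return only the nested events that actually matched.
                "inner_hits": {},
            }
        }
    }


query = build_query(("2018-01-01", "2018-01-31"), ("2018-02-01", "2018-02-28"))
```

The resulting dicts can be sent as the request bodies for index creation and search with whichever client you use.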
Using inner_hits returns only the nested events that matched, so you won't see the non-matching ones. Your client would still need to discard results that matched only one event.
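The client-side trimming might look something like this sketch. It assumes the nested path is "events" (so the inner hits appear under that name by default) and that each event carries a "type" field; both are assumptions, as is the sample response.

```python
def matched_types(hit, path="events"):
    """Collect the event types present in a hit's inner_hits section."""
    inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
    return {h["_source"]["type"] for h in inner}


def keep_hits_with_all(hits, required=("foo", "bar")):
    """Keep only the hits whose matched events cover every required type."""
    required = set(required)
    return [h for h in hits if required <= matched_types(h)]


# Hand-written sample shaped like the "hits.hits" part of a search response.
sample_hits = [
    {"_id": "a", "inner_hits": {"events": {"hits": {"hits": [
        {"_source": {"type": "foo", "date": "2018-01-05"}},
        {"_source": {"type": "bar", "date": "2018-02-10"}},
    ]}}}},
    {"_id": "b", "inner_hits": {"events": {"hits": {"hits": [
        {"_source": {"type": "foo", "date": "2018-01-07"}},
    ]}}}},
]

kept_ids = [h["_id"] for h in keep_hits_with_all(sample_hits)]
```

Note that inner_hits returns only the top three nested matches by default, so a larger "size" in the inner_hits block may be needed for this check to be reliable.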
Having said that, what can be hard to match at query time with searches/aggregations can be trivial at index time using an update script. See entity-centric indexing.
Useful attributes, such as an "outage" range field, can be derived by an update script that recognises the two event types of interest and stores the derived value as those events arrive.
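To make the idea concrete, here is a plain-Python stand-in for the logic such an update script might carry. In Elasticsearch this would live in a scripted update rather than in client code. The event types "down" and "up" and the field names "last_down" and "outage" are hypothetical, chosen only for illustration.

```python
def apply_event(entity, event):
    """Fold one incoming event into the entity doc, deriving an outage range.

    Assumes a 'down' event opens an outage and the next 'up' event closes it.
    """
    entity.setdefault("events", []).append(event)
    if event["type"] == "down":
        # Remember when the outage started until the matching 'up' arrives.
        entity["last_down"] = event["date"]
    elif event["type"] == "up" and "last_down" in entity:
        # Same gte/lte shape that a date_range field accepts.
        entity["outage"] = {"gte": entity.pop("last_down"), "lte": event["date"]}
    return entity


entity = {}
apply_event(entity, {"type": "down", "date": "2018-01-01T10:00:00Z"})
apply_event(entity, {"type": "up", "date": "2018-01-01T10:30:00Z"})
```

Once the range is stored on the entity, a question like "which entities had an outage overlapping this window" becomes a simple range query instead of a join across events.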
Thanks @Mark_Harwood. I watched the video about entity centric indexing and it looks good and I will try that approach.
I think I can index individual events as they occur (the "doc" will be the event) into an "events" index. To find users that have done "foo exactly twice" (like in my original post), I can periodically update a secondary "user" index, where each user doc has a "didFooTwice" boolean. Then a simple bool query can get me what I want.
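A minimal sketch of that periodic roll-up, assuming each event doc has "user" and "type" fields (hypothetical names, as is "fooCount") and leaving out the actual reads and writes against Elasticsearch:

```python
from collections import Counter


def build_user_docs(events):
    """Derive one summary doc per user from a batch of event docs."""
    foo_counts = Counter(e["user"] for e in events if e["type"] == "foo")
    users = {e["user"] for e in events}
    return {
        u: {"fooCount": foo_counts[u], "didFooTwice": foo_counts[u] == 2}
        for u in users
    }


# Query body for the secondary "user" index.
DID_FOO_TWICE_QUERY = {
    "query": {"bool": {"filter": [{"term": {"didFooTwice": True}}]}}
}

events = [
    {"user": "alice", "type": "foo"},
    {"user": "alice", "type": "foo"},
    {"user": "bob", "type": "foo"},
    {"user": "bob", "type": "bar"},
    {"user": "carol", "type": "bar"},
]
user_docs = build_user_docs(events)
```

Keeping the count alongside the boolean is a design choice for the sketch: it lets the flag be recomputed if the definition changes without rescanning every event.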