Time/Order based query


#1

I am using Elasticsearch to ingest log events. Suppose an event X occurs, followed by an event Y. Is there a Query DSL query that will find all documents where an X event is followed by a Y event? Could aggregations be used, or is this kind of query not supported in Elasticsearch?

Thanks.


(Zachary Tong) #2

At the moment, these types of queries are tough for Elasticsearch. Event X may be located on one shard, while Event Y may be on a different shard. Matching/sorting based on sequential causality would mean that both shards (on potentially different nodes) would have to coordinate their actions and communicate, which could be very expensive.

You might be able to accomplish something similar with the new pipeline aggregations, but it's not likely. These aggs work on the results of other aggregations (i.e. they operate on buckets, not documents), so you'd only be able to calculate stats on the buckets themselves, not on the order of the events inside them.
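
To make the bucket limitation concrete, here is a toy sketch in plain Python. It is not Elasticsearch code: `bucket_counts` and `derivative` are hypothetical helpers that only mimic the idea of a histogram followed by a pipeline-style calculation over its buckets, and the `timestamp` and `type` field names are assumptions.

```python
from collections import Counter


def bucket_counts(events, interval):
    """Group events into fixed-width time buckets and count them (histogram-like)."""
    counts = Counter()
    for ev in events:
        bucket_start = ev["timestamp"] // interval * interval
        counts[bucket_start] += 1
    return dict(sorted(counts.items()))


def derivative(counts):
    """Difference between each bucket's count and the previous bucket's count."""
    keys = list(counts)
    return {k: counts[k] - counts[p] for p, k in zip(keys, keys[1:])}


# Two logs with the opposite X/Y order inside the first bucket.
x_then_y = [
    {"timestamp": 5, "type": "X"},
    {"timestamp": 20, "type": "Y"},
    {"timestamp": 70, "type": "X"},
]
y_then_x = [
    {"timestamp": 5, "type": "Y"},
    {"timestamp": 20, "type": "X"},
    {"timestamp": 70, "type": "X"},
]

# The bucket-level view is identical, so nothing computed from it can tell them apart.
same_buckets = bucket_counts(x_then_y, 60) == bucket_counts(y_then_x, 60)
```

Once the documents are reduced to per-bucket numbers, the ordering inside a bucket is gone, which is why a calculation over buckets can't answer "X followed by Y".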

You'll probably have better luck by designing some kind of "entity-centric indexing" scheme, where you save the sequential relationship in an "entity" and use that document to determine matches. Mark Harwood has a few presentations on the subject:
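
As a rough illustration of the entity-centric idea, here is a minimal Python sketch that folds raw events into one document per entity and records whether an X was followed by a Y. Everything named here is an assumption for the example: the `session_id` grouping key, the `timestamp` and `type` fields, the `x_then_y` flag, and the helper functions themselves. How the entities get built and indexed in practice depends on your data.

```python
from collections import defaultdict


def followed_by(types, first, second):
    """Return True if some `first` event occurs before a later `second` event."""
    seen_first = False
    for t in types:
        if t == second and seen_first:
            return True
        if t == first:
            seen_first = True
    return False


def build_entities(events, key="session_id"):
    """Fold raw events into one 'entity' document per key, preserving event order."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev[key]].append(ev)

    entities = []
    for entity_id in sorted(grouped):
        # Sort each entity's events by time so the sequence is meaningful.
        ordered = sorted(grouped[entity_id], key=lambda e: e["timestamp"])
        types = [e["type"] for e in ordered]
        entities.append({
            "entity_id": entity_id,
            "event_sequence": types,
            # Precomputed at build time so a plain filter can match on it later.
            "x_then_y": followed_by(types, "X", "Y"),
        })
    return entities


events = [
    {"session_id": "a", "timestamp": 2, "type": "Y"},
    {"session_id": "a", "timestamp": 1, "type": "X"},
    {"session_id": "b", "timestamp": 1, "type": "Y"},
    {"session_id": "b", "timestamp": 2, "type": "X"},
]
entities = build_entities(events)
```

The point of the design is that the ordering question is answered once, when the entity document is built, so searching for "X followed by Y" becomes an ordinary match on a single document's field instead of a cross-shard comparison.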


#3

Thanks for your reply. I had a feeling that it wouldn't be easy, if at all possible.

