You can't do that without building another view of your data, i.e. "entity-centric indexing".
To do this you'd build a document that represents all of an entity's actions (e.g. you'd convert an index where each document represents a single click into a new index where each document represents a whole user session, with each click stored in the document as a nested object).
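As a rough illustration of that conversion, here is a minimal Python sketch that groups click documents into one document per session. The field names (`session_id`, `page`, `timestamp`, `clicks`) are assumptions made up for the example, not taken from any real mapping, and the step that writes the result to a new index is left out.

```python
from collections import defaultdict

def build_session_docs(click_events):
    """Group event-centric click docs into one entity-centric doc per session."""
    sessions = defaultdict(list)
    for event in click_events:
        sessions[event["session_id"]].append(event)

    docs = []
    for session_id, events in sessions.items():
        # Keep clicks in the order they happened within the session.
        events.sort(key=lambda e: e["timestamp"])
        docs.append({
            "session_id": session_id,
            "click_count": len(events),
            # Each click becomes a nested object inside the session document.
            "clicks": [
                {"page": e["page"], "timestamp": e["timestamp"]} for e in events
            ],
        })
    return docs

# Hypothetical click-per-document input.
clicks = [
    {"session_id": "s1", "page": "A", "timestamp": 1},
    {"session_id": "s2", "page": "A", "timestamp": 2},
    {"session_id": "s1", "page": "B", "timestamp": 3},
]
session_docs = build_session_docs(clicks)
```

Each entry in `session_docs` would then be indexed as a single document in the session-centric index.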
Hey, thanks for the entity-centric indexing technique. It is almost what I was looking for.
However, if the query is really large and complex (like A->B->......->F->C->N, with a length of around 1500), would it be effective to create entity-centric indexes? Also, is it efficient/useful to create the entity-centric indexes while querying (i.e. not create them periodically, as Mark does in the video)?
Generally, the issue entity-centric indexing is tackling is joining related data, and it does so by shifting the costs involved from query time to index time.
If the key you join the data on has many unique values, or the business logic in any derived properties is complex [1], then generally you will need to look at doing this to avoid overly long or complex queries.
It would certainly be simpler to search for an indexed token that was ABFCN...
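A small sketch of that idea, in plain Python rather than against a real index: the ordered steps of each session are joined into one token at index time, so the query becomes a single exact-match lookup. The session data and function names are hypothetical, and joining without a separator only works while every step name is a single character.

```python
from collections import defaultdict

def path_token(pages):
    """Join the ordered steps of one session into a single token, e.g. 'ABFCN'."""
    return "".join(pages)

def index_sessions(session_paths):
    """Map each path token to the set of sessions that followed exactly that path."""
    token_index = defaultdict(set)
    for session_id, pages in session_paths.items():
        token_index[path_token(pages)].add(session_id)
    return token_index

# Hypothetical sessions, each with its ordered list of steps.
session_paths = {
    "s1": ["A", "B", "F", "C", "N"],
    "s2": ["A", "C", "N"],
    "s3": ["A", "B", "F", "C", "N"],
}
token_index = index_sessions(session_paths)

# The long sequence query collapses to one lookup on the precomputed token.
matches = token_index.get("ABFCN", set())
```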
When I said "periodically" I did not necessarily mean overnight consolidation jobs. The update job could be run every second to patch in just the latest events. Think about your browser loading this web page: it is a flurry of activity involving many individual requests to get HTML, CSS, JavaScript, images etc. I wouldn't rush to update your entity-centric web-session document upon receipt of every individual log record containing your session cookie. I could hang back just a second and perform only one update to your session doc with a batch of maybe 20 log file entries pertaining to your latest activity. This would save 19 Lucene updates. I think of it more as "micro-batching". Of course the sensible duration of a batch will depend on the nature of your system.
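To make the micro-batching idea concrete, here is a minimal Python sketch that buffers log entries per session and applies one update per session when the batch is flushed. The `MicroBatcher` class, the `apply_update` callback and the one-second default are all assumptions for illustration; a real job would replace the callback with whatever performs the document update.

```python
import time
from collections import defaultdict

class MicroBatcher:
    """Buffer log entries per session and apply them as one update per flush."""

    def __init__(self, apply_update, max_wait_seconds=1.0, clock=time.monotonic):
        self.apply_update = apply_update      # hypothetical: updates one session doc
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock
        self.pending = defaultdict(list)
        self.last_flush = clock()

    def add(self, session_id, log_entry):
        self.pending[session_id].append(log_entry)
        # Flush only once the batch window has elapsed, not on every entry.
        if self.clock() - self.last_flush >= self.max_wait_seconds:
            self.flush()

    def flush(self):
        for session_id, entries in self.pending.items():
            self.apply_update(session_id, entries)  # one update for the whole batch
        self.pending.clear()
        self.last_flush = self.clock()

# Demo with a fake clock so the example does not need to sleep.
fake_now = [0.0]
updates = []
batcher = MicroBatcher(
    lambda session_id, entries: updates.append((session_id, len(entries))),
    clock=lambda: fake_now[0],
)
for i in range(20):
    batcher.add("cookie-123", {"request": i})
fake_now[0] = 1.5
batcher.flush()
```

In this demo the 20 buffered entries for one session cookie result in a single call to `apply_update` instead of 20.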
Cheers
Mark
[1] For a car, the distance-driven-while-failed is a property derived by comparing the mileage reported on the car's first failed test result with the mileage on the test results that follow, up to and including a subsequent "pass" test result.