Compound Indexes for User Based Data

For user-based data, where every search query carries some user reference (e.g. user-id) plus additional query parameters, is there a way to maintain compound indexes in Elasticsearch to reduce the search space?

In our use case there are millions of records for every user. We have created userId-mod based indexes to bucket users, and added routing on user-id so that a user's search queries are restricted to a single shard. Typical queries always have user-id as a parameter, followed by 2 or more filters,
e.g. user_id = 12345, item_purchase = 'Sony Headphones', item_store = 'Chroma'
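To make the setup above concrete, here is a minimal sketch of how such a request could be assembled. The bucket count (16), the index prefix `purchases` and the helper names are assumptions for illustration only; the post does not state them. It also assumes the filter fields are mapped so that exact `term` matches work (e.g. as `keyword`).

```python
import json

NUM_BUCKETS = 16  # assumed bucket count; not stated in the post


def bucket_index(user_id: int, prefix: str = "purchases") -> str:
    """Return the (hypothetical) userId-mod index name for a user."""
    return f"{prefix}-{user_id % NUM_BUCKETS}"


def build_search(user_id: int, filters: dict) -> dict:
    """Describe a search: target index, routing value and a bool/filter body."""
    clauses = [{"term": {"user_id": user_id}}]
    clauses += [{"term": {field: value}} for field, value in filters.items()]
    return {
        "index": bucket_index(user_id),
        "routing": str(user_id),  # sent as the `routing` request parameter
        "body": {"query": {"bool": {"filter": clauses}}},
    }


request = build_search(
    12345, {"item_purchase": "Sony Headphones", "item_store": "Chroma"}
)
print(json.dumps(request, indent=2))
```

The routing value only narrows the request to one shard; within that shard the three term filters are still evaluated against every user's documents that hashed there, which is the concern raised next.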

AFAIK, the posting lists for each term, i.e. user_id, item_purchase and item_store, will be searched independently in an optimised order, followed by a skip-list merge.
The posting lists for fields such as item_purchase, which could be of low cardinality, span all user-ids, so they add many more docIds to the search space than could possibly match for a single user.
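The merge described above can be modelled with a toy "leapfrog" intersection over sorted docId lists. This is only a sketch of the general technique, not Lucene's actual code: binary search stands in for skip lists, and the docIds are made up. It shows that when the most selective list (here the user's) leads, the long low-cardinality lists are jumped through rather than scanned.

```python
from bisect import bisect_left


def leapfrog_intersect(posting_lists):
    """Intersect sorted doc-id lists; also return how many advance steps were taken."""
    lists = sorted(posting_lists, key=len)  # lead with the shortest list
    if not lists or any(len(p) == 0 for p in lists):
        return [], 0
    positions = [0] * len(lists)
    matches, advances = [], 0
    candidate = lists[0][0]
    while True:
        agreed = True
        for i, postings in enumerate(lists):
            # Jump to the first doc id >= candidate; binary search stands in
            # for the skip data a real posting list would use.
            positions[i] = bisect_left(postings, candidate, positions[i])
            advances += 1
            if positions[i] == len(postings):
                return matches, advances  # one list is exhausted
            if postings[positions[i]] != candidate:
                candidate = postings[positions[i]]  # leapfrog to the larger doc id
                agreed = False
                break
        if agreed:
            matches.append(candidate)
            candidate += 1


# Made-up doc ids: one user owns 10 docs, the other two terms span all users.
user_postings = list(range(1000, 1010))
purchase_postings = list(range(0, 10000, 2))
store_postings = list(range(0, 10000, 3))

matches, advances = leapfrog_intersect(
    [user_postings, purchase_postings, store_postings]
)
print(matches, advances)
```

In this model the number of advance steps is bounded by the size of the user's list rather than by the thousands of entries in the other two, although each step still has to seek within those longer lists.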

Is there a way this can be handled better in Elasticsearch? Possibly an option for compound indexes, where the key in the internal posting list also carries a reference to the user-id?

I looked into the documentation and found a page touching upon this problem, but couldn't find a solution for many users with many records and low-cardinality fields:
https://www.elastic.co/guide/en/elasticsearch/guide/current/user-based.html

You should check out the Lucene docs, as Lucene is already doing some optimizations around that (with more to come in future versions).

Also, check out index time sorting, see https://www.elastic.co/guide/en/elasticsearch/reference/6.0/index-modules-index-sorting.html
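As a rough illustration of what index-time sorting could look like for the data in the question, the sketch below builds an index-creation body that sorts each segment by `user_id`. The mapping type name `doc` and the field types are assumptions (6.x still requires a mapping type; later versions drop it), and the sort settings can only be set when the index is created.

```python
import json

# Index-creation body with index-time sorting on user_id.
# "doc" is a hypothetical mapping type name, and the field types
# are assumptions about the data described in the question.
create_index_body = {
    "settings": {
        "index": {
            "sort.field": ["user_id"],
            "sort.order": ["asc"],
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "user_id": {"type": "long"},
                "item_purchase": {"type": "keyword"},
                "item_store": {"type": "keyword"},
            }
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```

With segments sorted this way, one user's documents should sit in a contiguous docId range, which tends to let conjunctions skip over other users' documents more cheaply. This is not a compound index, and the gain is worth measuring on real data.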

Thanks @spinscale for sharing this info.

  1. Is there any documentation on the effort to add compound indexes in Lucene, and on its adoption in Elasticsearch? I will keep an eye on the progress.

  2. The sorted index feature is in v6.0. What is the timeline for releasing a stable GA version?

Just FYI: we keep this notice there mainly to give the feature some time to stabilize and especially to make sure that the API is what we need - which allows us to change the API more often.
