I have a forum and need to provide full-text search either across all topics a user has access to (global search) or within particular topic(s). Global search is the default option, and it is becoming a problem in the long run.
- the forum can have an unlimited number of topics
- a topic can have up to 5k users
- a user can have access to any number of topics
- read access is based on membership: a unique [userA, topicA] pair tells whether userA can search in topicA
All topic data is stored in one index, topics. Routing key = topicId. Index rollover strategy: a new index is spawned once the current one reaches 50M docs (this will soon become time-based, every 1 month or 2 weeks, depending on how forum activity grows).
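Assuming the rollover is managed by Elasticsearch ILM (the post doesn't say how it is triggered), the current and planned thresholds map onto the rollover action's `max_docs` and `max_age` conditions. This minimal sketch only builds the policy body; `rollover_policy` is a hypothetical helper name.

```python
import json


def rollover_policy(max_docs=None, max_age=None):
    """Build an ILM policy body whose hot phase rolls over on doc count and/or age."""
    conditions = {}
    if max_docs is not None:
        conditions["max_docs"] = max_docs  # current strategy: 50M docs per index
    if max_age is not None:
        conditions["max_age"] = max_age  # planned time-based strategy, e.g. "30d" or "14d"
    if not conditions:
        raise ValueError("at least one rollover condition is required")
    return {"policy": {"phases": {"hot": {"actions": {"rollover": conditions}}}}}


print(json.dumps(rollover_policy(max_docs=50_000_000), indent=2))
```

The resulting body would be sent with `PUT _ilm/policy/<policy-name>`; when several `max_*` conditions are set, rollover happens as soon as any one of them is met.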
Querying a few topics by topic ID is not a problem: the routing key works fine. Global search is what causes issues.
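For the per-topic case, the topic IDs can be passed as comma-separated routing values so only the shards holding those topics are searched; the terms filter is still needed because other topics share those shards. A sketch that only builds the request parameters; the `topics*` index pattern and the `content` field are assumptions.

```python
def topic_search_request(topic_ids, text, size=20):
    """Build params for a search limited to a few topics, routed by topicId."""
    ids = sorted({str(t) for t in topic_ids})  # routing values are strings
    return {
        "index": "topics*",  # hypothetical pattern covering all rolled-over indices
        "routing": ",".join(ids),  # only shards that hold these topics are queried
        "body": {
            "query": {
                "bool": {
                    "must": [{"match": {"content": text}}],
                    # Other topics live on the same shards, so filter explicitly.
                    "filter": [{"terms": {"topicId": ids}}],
                }
            },
            "size": size,
        },
    }
```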
Currently, global search is done as follows:
- query the user's topic list (a separate index, sharded by userId). No issues here.
- run a full-text search filtered by topicId IN (the user's topics), expressed as a terms query
- results are sorted by the creation date of the topic messages, and document scores aren't calculated
- users want to see the total count of matches
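For concreteness, the current global search roughly corresponds to a request body like the one below. The field names `content` and `createdAt` are assumptions, and `topicId` follows the routing key above.

```python
def global_search_body(user_topic_ids, text, size=20):
    """Current approach: one query filtering on every topic the user belongs to."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                # Every topic the user can read; this list is what hits the terms limit.
                "filter": [{"terms": {"topicId": list(user_topic_ids)}}],
            }
        },
        # Sorting on a field instead of _score means Elasticsearch skips scoring.
        "sort": [{"createdAt": "desc"}],
        # Exact total count, which gets more expensive as the corpus grows.
        "track_total_hits": True,
        "size": size,
    }
```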
Issues & ideas:
- it is getting slow
- the cluster limits a terms query to 2k terms, so I can't search across all of a user's topics at once. Split the query into multiple queries to the cluster and aggregate the results on the application side?
- the routing key can't be used; it's useless in this scenario
- the total count can't be calculated over the whole corpus going forward, because the data volume grows every day. This is tricky. Add heuristics to approximate the total count, similar to what Google Search does?
- perhaps introduce date ranges and allow searching only the last 2-3 years of data?
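The split-and-merge idea, the approximate count and the date range can be sketched together. Assumptions: the 2,000-term limit is taken from this post, the field names are hypothetical, and only the first page is merged (deeper pages would need each chunk to return more hits or use `search_after`). Setting `track_total_hits` to an integer (Elasticsearch 7.0+) counts exactly only up to that cap and otherwise reports a lower bound with relation `gte`, which gives a cheap "about N+" figure.

```python
import heapq
from itertools import islice

MAX_TERMS = 2000  # per-query terms limit reported for this cluster


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def chunk_bodies(topic_ids, text, size=20, count_cap=1000, since=None):
    """One search body per chunk of topic IDs, each small enough for the terms limit."""
    bodies = []
    for chunk in chunked(sorted(set(topic_ids)), MAX_TERMS):
        filters = [{"terms": {"topicId": chunk}}]
        if since:
            # Optional recency window, e.g. since="now-3y" for the last 3 years.
            filters.append({"range": {"createdAt": {"gte": since}}})
        bodies.append({
            "query": {"bool": {"must": [{"match": {"content": text}}], "filter": filters}},
            "sort": [{"createdAt": "desc"}],
            # Count exactly only up to count_cap; beyond that ES reports a lower bound.
            "track_total_hits": count_cap,
            "size": size,
        })
    return bodies


def merge_responses(responses, size=20):
    """Merge per-chunk responses: newest-first top hits plus an approximate total."""
    hit_lists = [r["hits"]["hits"] for r in responses]
    # Each list is already sorted by createdAt desc, so a k-way merge keeps the order.
    merged = heapq.merge(*hit_lists, key=lambda h: h["sort"][0], reverse=True)
    top = list(islice(merged, size))
    # Chunks cover disjoint topic sets, so per-chunk counts can simply be summed.
    total = sum(r["hits"]["total"]["value"] for r in responses)
    lower_bound = any(r["hits"]["total"]["relation"] == "gte" for r in responses)
    return top, {"value": total, "relation": "gte" if lower_bound else "eq"}
```

The per-chunk bodies could be sent in a single round trip through the `_msearch` endpoint. Because each chunk's count is capped, the merged total is best shown as "1000+" style text whenever its relation is `gte`.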
I hope this makes sense...