_routing keys limit?

(Yandooo) #1


The documentation doesn't say anything about a limit on the number of _routing keys that can be passed in a query. I suspect it's limited by URL length? If the number of _routing keys is too big, does that mean one should do application-side aggregation, issuing multiple queries to ES? Sounds complicated...

(Christian Dahlqvist) #2

Routing allows you to ensure that related documents are located on the same shard within an index. If you are only querying these related documents, you can supply a single routing key to hit only the appropriate shard within that index. If you are querying many groups of related documents, you issue the query without a routing parameter so that it hits all shards, rather than submitting multiple routing keys.
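To make the two cases concrete, here is a minimal sketch that only builds the request URL and does not contact a cluster. The host, the index name `topics`, the key `topic-42`, and the helper `search_url` are assumptions for illustration; `routing` is the search API's query-string parameter, which takes a comma-separated list of values.

```python
from urllib.parse import urlencode


def search_url(index, routing_keys=None, host="http://localhost:9200"):
    """Build a _search URL (hypothetical helper, host is an assumption)."""
    url = f"{host}/{index}/_search"
    if routing_keys:
        # Several routing values travel as one comma-separated parameter.
        url += "?" + urlencode({"routing": ",".join(routing_keys)})
    # With no routing parameter the search fans out to every shard.
    return url


print(search_url("topics", ["topic-42"]))  # one group of related documents
print(search_url("topics"))                # many groups: query all shards
```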

(Yandooo) #3

Hi @Christian_Dahlqvist

thanks for the quick answer.

Let's assume we have an unlimited number of groups. A user can be a member of any group (ACL). Data is sharded by groupId. Obviously, querying a single group while passing _routing is super fast. Is it still optimal to query 2 or 3 groups using the appropriate _routing keys? In theory it's still OK to keep passing routing keys as long as the number of keys is less than the number of ES cluster nodes/shards.
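The intuition behind that, as a rough sketch: a routing key is hashed and taken modulo the number of primary shards, so k keys touch at most min(k, shards) shards. The code below uses CRC32 purely as a stand-in hash (Elasticsearch uses its own hash function, so the real shard numbers will differ), and the names `shard_for` and `shards_hit` are made up for illustration.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    # Stand-in hash (assumption): only the "hash modulo shard count"
    # shape is meant to mirror how routing picks a shard.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def shards_hit(routing_keys, num_primary_shards):
    """Set of shard numbers a search with these routing keys would visit."""
    return {shard_for(key, num_primary_shards) for key in routing_keys}


keys = [f"group-{i}" for i in range(3)]
hit = shards_hit(keys, num_primary_shards=10)
print(f"{len(keys)} routing keys -> {len(hit)} of 10 shards searched")
```

Once the number of keys approaches the number of shards, most shards are searched anyway and the routing keys save very little.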

(Christian Dahlqvist) #4

I guess it depends on how many shards you have per index. What does your use case look like?

(Yandooo) #5

I have a forum and I need to provide full-text search across all topics a user has access to (global search) or in particular topic(s). The default option is global search, and it's a problem in the long run.

  • the forum can have an unlimited number of topics
  • a topic can have up to 5k users
  • a user can have access to any number of topics
  • read access is based on membership. A unique [userA, topicA] pair tells whether userA can search in topicA

All topic data is stored in one index, topics. Routing key = topicId. Index rollover strategy: a 50M-doc threshold to spawn a new index (will soon be time-based, 1m/2w, depending on forum activity growth).

Querying a few topics by topic ID is not a problem; the routing key works fine. Global search causes issues.

Currently global search is done as follows:

  • query the user's topic list (separate index, sharded by userId). No issues here
  • run a full-text search where topics IN [] (terms query)
  • sorting is done by the topic messages' created date, and doc scores aren't calculated
  • users want to see the total count of matches
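Roughly, the search request from the steps above could be sketched like this. The field names `text`, `topicId`, and `createdAt` are placeholders rather than the real mapping, and `track_total_hits` assumes Elasticsearch 7.x or later.

```python
import json


def global_search_body(query_text, topic_ids, size=20):
    """Request body for the global search (field names are assumptions)."""
    return {
        "size": size,
        # Ask for an exact total instead of a capped lower bound.
        "track_total_hits": True,
        "query": {
            "bool": {
                # Filter context: documents match or not, no relevance scoring.
                "filter": [
                    {"match": {"text": query_text}},
                    {"terms": {"topicId": topic_ids}},
                ]
            }
        },
        # Newest messages first.
        "sort": [{"createdAt": {"order": "desc"}}],
    }


body = global_search_body("routing keys", ["topic-1", "topic-2"])
print(json.dumps(body, indent=2))
```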

Issues & ideas:

  • it becomes slow
  • the cluster has a limit of 2k terms per terms query (can't search across all topics). Split the query into multiple queries to the cluster and do the aggregation application-side?
  • can't use the routing key. It's useless in such a scenario.
  • the total count can't be calculated over the whole corpus of docs going forward, because the data volume gets bigger every day. It's tricky. Add heuristics to approximate the total count? Similar to what Google search does
  • perhaps introduce date ranges and allow searching only in the last 2-3 years of data?
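The "split into multiple queries" idea could look something like the sketch below. It is only the application-side part: `chunks` splits the topic list to stay under the 2k limit, and `merge_pages` combines per-chunk results that are assumed to arrive as (total, hits) pairs with hits already sorted by a hypothetical `createdAt` field, newest first. Both helper names and the sample data are made up.

```python
import heapq
from itertools import islice


def chunks(items, max_terms=2000):
    """Yield lists of at most max_terms items, one per terms query."""
    it = iter(items)
    while batch := list(islice(it, max_terms)):
        yield batch


def merge_pages(pages, size=20):
    """Sum per-chunk totals and keep the newest `size` hits overall."""
    total = sum(chunk_total for chunk_total, _ in pages)
    merged = heapq.merge(
        *(hits for _, hits in pages),
        key=lambda hit: hit["createdAt"],
        reverse=True,  # inputs are sorted newest first
    )
    return total, list(islice(merged, size))


# Made-up results of two sub-queries, each sorted by createdAt descending.
pages = [
    (2, [{"createdAt": "2024-03-01"}, {"createdAt": "2024-01-01"}]),
    (1, [{"createdAt": "2024-02-01"}]),
]
total, top = merge_pages(pages, size=2)
print(total, [hit["createdAt"] for hit in top])
```

Summing the totals only works if every topic ID lands in exactly one chunk, so the list should be deduplicated first. Each sub-query also has to return its own top `size` hits for the merged first page to be correct, which gets expensive for deep pages.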

I hope it makes sense...

(Christian Dahlqvist) #6

I am not sure what the best way to model this is, but instinctively it does not sound like routing has any place here.
