Context Suggester across multiple indices with same schema

We're evaluating Elasticsearch for a use case where customers search over a data repository containing multiple topics. Sometimes the user will select a topic to filter the results, and sometimes they will want to query all topics.
Let's assume a topic is identified by a string which we can use as context.

We're going with a separate index per topic.
I see a lot of forum topics about multiple indices with different schemas, but didn't find any reference for indices that share the same schema.

  1. With multiple indices, does the context suggester natively support fetching suggestions and sorting the results correctly across all indices, or does the sorting need to be done at the application level?

  2. Within a single topic, we have other contexts to filter by, such as username. However, we'd also like to be able to query without any context at all.
    I see that querying a context-enabled suggester without contexts has been deprecated in ES due to latency considerations. One workaround I was thinking of is to add a dummy context (like topicName) that is the same for every record in the index.
    Is there a cleaner way to achieve this?
    Also, it forces me to use the _msearch API, because the suggest query differs per index: I have to provide a separate topicId in the query sent to each index.

  3. Does having multiple indices slow down queries considerably (due to routing concerns) compared to having a single large index?
    We're choosing multiple indices because the total size of the repository across all topics can be hundreds of terabytes.
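
To make question 1 concrete: if the sorting does end up at the application level, the merge step itself is small. Below is a rough Python sketch, assuming each per-index response has the usual completion suggester shape (`suggest.<name>[i].options`, with each option carrying `text` and `_score`). The suggester name `topic_suggest` and the helper `merge_suggestions` are made up for illustration, and the sketch assumes scores are comparable across indices, which as far as I understand means the indexed weights would need to use the same scale everywhere.

```python
import heapq


def merge_suggestions(responses, suggester_name="topic_suggest", size=10):
    """Merge completion options from several search responses, best score first.

    `responses` is a list of already-parsed search response bodies (dicts).
    Duplicate suggestion texts are collapsed, keeping the highest-scoring one.
    """
    best = {}  # suggestion text -> highest-scoring option seen so far
    for response in responses:
        entries = response.get("suggest", {}).get(suggester_name, [])
        for entry in entries:
            for option in entry.get("options", []):
                text = option["text"]
                if text not in best or option["_score"] > best[text]["_score"]:
                    best[text] = option
    # Keep only the top `size` options overall, ordered by descending score.
    return heapq.nlargest(size, best.values(), key=lambda o: o["_score"])
```

With an _msearch call, `responses` would be the per-index entries of the `responses` array in the reply.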

How many topics do you estimate you need to support?

Total number of topics in the cluster (worst case): 10,000 (same as the number of indices).

However, we will limit the number of topics we query or suggest for, even if the user chooses all topics. We wanted to set a limit of 25 to 50 topics, but ideally we would derive that limit from measured performance, since it's obviously better to query as many topics as possible.

By the way, regarding point #2 in the original post, I see 4 different ways of implementing it:

(a) Have a dummy context that is the same across all indices (like a hard-coded "ALL").

(b) Use topicName as the context. This implies that when I submit a _search to 25 indices (each with a different topic), I'll have to pass the list of 25 topics as contexts.
This is cleaner than (a), but I'm not sure whether multiple contexts slow down the query compared to a single context.

(b.1) Use topicName as the context. Use _msearch and pass 25 suggest queries as input.
This way, each suggest query has only one context (the one for that index).

(c) Have 2 suggesters for every index: one context-based and one regular completion suggester.
This duplicates the "input" that is indexed in the suggester, which is costly here because I am generating all the word n-grams and indexing them as input.
I definitely don't want to duplicate this :)
I want to check whether I can have a single common input field and reference it from both suggesters.
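
To make (b) and (b.1) concrete, here is a rough Python sketch of the request bodies I have in mind. It only builds the JSON and does not talk to a cluster. The completion field name `suggest`, the context name `topic`, the suggester name `topic_suggest`, and the index names are all placeholders I made up; the body shape follows the documented context suggester query (`prefix` plus `completion.contexts`), and the _msearch payload is newline-delimited JSON with a header line and a body line per search.

```python
import json


def suggest_body(prefix, topics, field="suggest", name="topic_suggest", size=10):
    """Suggest request body filtered by one or more topic contexts (option b)."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {
                    "field": field,  # hypothetical completion field name
                    "size": size,
                    "contexts": {"topic": list(topics)},  # hypothetical context name
                },
            }
        }
    }


def msearch_payload(prefix, topic_to_index):
    """NDJSON payload for _msearch with one single-context query per index (option b.1)."""
    lines = []
    for topic, index in topic_to_index.items():
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(suggest_body(prefix, [topic])))  # body line
    return "\n".join(lines) + "\n"  # the payload must end with a newline
```

For (b), `suggest_body("ela", all_25_topics)` would be sent once to a _search over the comma-separated index list. For (b.1), `msearch_payload("ela", {...})` would be sent to _msearch, and the 25 result sets would then have to be merged on the application side.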
