We have different document structures (schemas) that we onboard into separate indices. We have ~50 such indices, and one of our primary use cases is searching across all of these document types, i.e. across all 50 indices.
Data size within each index is ~10-20 GB, so each index easily fits into a single shard.
I am looking for ways to optimize search performance across these 50 indices. There is one field common to all of these indices that is always available in a user's search request, and it could be used for sharding within each index if we had more than one shard per index. I am not sure whether we could use it to optimize this multi-index search, or whether there are other options.
Are mapping conflicts what drives the use of separate indices per document type? If so, how does querying these fields across all indices work?
If this is a convention rather than a requirement, and most searches filter on the common field you mention, it might make sense to store all types in a single index with e.g. 30-50 primary shards and use routing so that each filtered query targets only one shard.
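A minimal sketch of what such a routed query could look like, built as a plain request path and body rather than through any particular client. The index name `all-docs`, the field name `tenant_id`, and the helper `build_routed_search` are hypothetical and only there for illustration. It assumes documents were indexed with the same routing value; the term filter is kept because several routing values can land on the same shard.

```python
import json
from urllib.parse import urlencode


def build_routed_search(index, common_field, common_value, text):
    """Return (path, body) for a search routed by the common field value.

    The `routing` query parameter restricts the search to the shard(s)
    that the given routing value maps to.
    """
    path = f"/{index}/_search?" + urlencode({"routing": common_value})
    body = {
        "query": {
            "bool": {
                # Full-text part of the request.
                "must": [{"multi_match": {"query": text}}],
                # Still filter on the common field: other routing values
                # may share the same shard.
                "filter": [{"term": {common_field: common_value}}],
            }
        }
    }
    return path, body


# Hypothetical names, for illustration only.
path, body = build_routed_search("all-docs", "tenant_id", "acme", "quarterly report")
print(path)
print(json.dumps(body, indent=2))
```

The same routing value would have to be supplied when each document is indexed, otherwise the routed search will look at the wrong shard.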
The multi-index search use case is mostly text search; any kind of field-specific filtering or sorting requires the user to narrow down to one schema type, i.e. one index in our case. Going forward, though, we will need to assign weights to different fields within an index for text search, and I think the _index field (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index-field.html) might be able to help there.
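One way this could be sketched, as an assumption rather than a recommendation: build one clause per index, each restricted with a term filter on `_index` and carrying its own field boosts. The index names, field names, weights, and the helper `per_index_weighted_query` below are all hypothetical.

```python
import json


def per_index_weighted_query(text, weights_by_index):
    """Build a query body with separate field boosts for each index.

    weights_by_index maps an index name to a {field: boost} dict.
    """
    should = []
    for index_name, field_weights in weights_by_index.items():
        # "field^boost" is the boost syntax accepted by multi_match.
        fields = [f"{field}^{boost}" for field, boost in field_weights.items()]
        should.append({
            "bool": {
                # Apply this clause only to documents from one index.
                "filter": [{"term": {"_index": index_name}}],
                "must": [{"multi_match": {"query": text, "fields": fields}}],
            }
        })
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Hypothetical indices and fields, for illustration only.
query = per_index_weighted_query(
    "quarterly report",
    {
        "invoices": {"title": 3, "notes": 1},
        "contracts": {"summary": 2, "body": 1},
    },
)
print(json.dumps(query, indent=2))
```

With ~50 indices the body grows to one clause per index, so it would be worth measuring whether this performs acceptably before relying on it.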
Searches within a schema, i.e. within an index, can have varied filters on various fields, in addition to the common field I mentioned that could be used for sharding.
Apart from conflicts, another primary reason for storing these different schemas in different indices is to avoid a mapping explosion from too many fields within a single index. We have 50 different schemas today, and that number will continue to grow.