Is it effective to divide data into multiple indexes instead of multiple shards/nodes?

I have ~100GB of data and would like to build a search application on top of it.
Each document has a category id field in addition to ordinary text fields.
Users usually filter the documents by specifying category ids.

In this case, are there any benefits (such as improved performance) in dividing the data into multiple indexes by category id (like index_category_1, index_category_2, ...)?

When users specify category ids, the application would search only the indexes for those ids.
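
To make the two options concrete, here is a minimal sketch of what I have in mind. The helper names and the field names (`category_id`, `text`) are made up for illustration; the only Elasticsearch behaviour assumed is that a search can target several indexes as a comma-separated list (or a wildcard pattern) in the URL path, and that a `terms` filter can restrict a query inside a single index.

```python
def index_name(category_id: int) -> str:
    """Hypothetical naming scheme from the question: index_category_<id>."""
    return f"index_category_{category_id}"


def multi_index_path(category_ids: list[int]) -> str:
    """Option A: one index per category; search only the selected indexes."""
    if not category_ids:
        # No category selected: fall back to a wildcard over all category indexes.
        return "/index_category_*/_search"
    # Deduplicate and sort so the same selection always yields the same path.
    names = [index_name(c) for c in sorted(set(category_ids))]
    return "/" + ",".join(names) + "/_search"


def single_index_body(category_ids: list[int], text: str) -> dict:
    """Option B: one big index; restrict by category with a terms filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"text": text}},  # assumed text field name
                "filter": {"terms": {"category_id": sorted(set(category_ids))}},
            }
        }
    }


print(multi_index_path([2, 1]))  # /index_category_1,index_category_2/_search
```

With option A the category filter is expressed by the request path, while with option B it is part of the query body sent to one index.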

As another question, does search performance get worse when the total data size of a single index gets too large?
(If so, we would have to consider dividing the data into multiple indexes anyway.)

I think you could see a performance boost if you split your data into, say, 5 per-category indices, because a search on one category would only need to go through roughly 100GB/5 = 20GB worth of data (assuming the categories are about the same size). So this could be a good idea.
As for your other question: index size isn't the most important factor when it comes to speed. Shard size is. You should set up your index so that each shard holds 20-50GB of data and not more.
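
To put numbers on that rule of thumb, here is a rough sketch of the arithmetic. The function name is made up, and the 20-50GB bounds are just the guideline above, not a hard limit enforced by Elasticsearch.

```python
import math


def shard_count_range(index_size_gb: float,
                      min_shard_gb: float = 20,
                      max_shard_gb: float = 50) -> tuple[int, int]:
    """Return (fewest, most) primary shards that keep each shard near 20-50GB."""
    # Fewest shards: fill each shard up to the upper bound.
    fewest = max(1, math.ceil(index_size_gb / max_shard_gb))
    # Most shards: do not let a shard drop below the lower bound.
    most = max(fewest, math.floor(index_size_gb / min_shard_gb))
    return fewest, most


print(shard_count_range(100))  # (2, 5): one 100GB index
print(shard_count_range(20))   # (1, 1): one 20GB per-category index
```

So a single 100GB index would land somewhere between 2 and 5 primary shards, while each 20GB per-category index would fit in a single shard.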


Generally, search speed is best when there are fewer shards.
Realistically it depends on many factors. If this is important to your use case, then you would be best off testing different approaches.
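
If you do test, a small timing harness is usually enough to compare layouts. The sketch below is only an illustration: `median_latency_ms` is a made-up helper, and `run_query` stands for whatever function sends one search request against the layout you are testing.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_query: Callable[[], object],
                      repeats: int = 20,
                      warmup: int = 3) -> float:
    """Run the query several times and return the median wall-clock latency in ms."""
    # Warm-up runs let caches settle and are not measured.
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload; replace the lambda with a real search call per layout.
print(median_latency_ms(lambda: sum(range(10_000))))
```

Run it with the same queries against each layout (single index, per-category indexes, different shard counts) and compare the medians.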
