Many indexes vs many shards

(Thiago Benvenuto) #1

Hi everyone,

I've been using Elasticsearch as a transactional data search engine and it's working well. Now I am designing the next iteration of our platform and could use some help from more experienced users.

Our searches are usually partitioned by country: 95% of them are done with a country filter. Some countries are searched more than others, and some countries have much more data than others.

My main question is: would I benefit more from adding shards to a single index and using routing (since my data is transactional, we don't use date-based indexes), or from having multiple indexes and querying several of them when necessary?

Here is a practical example:
I want to partition my data using the following logic: "us-ca", "us-fl", "us-others", "fr", "ca".
Then if I need to query the entire world I would search with "index-*", if I need to search US data I would use "index-us-*", and so on.
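To make the naming scheme concrete, here is a minimal sketch of which indexes each pattern would select. It only simulates simple `*` wildcard matching locally with Python's `fnmatch`; the `index-` prefix and the `resolve` helper are assumptions for illustration, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase

# Hypothetical index names built from the partitions listed above.
PARTITIONS = ["us-ca", "us-fl", "us-others", "fr", "ca"]
INDEXES = [f"index-{p}" for p in PARTITIONS]


def resolve(pattern, indexes=INDEXES):
    """Return the index names that a simple '*' wildcard pattern selects."""
    return [name for name in indexes if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for pattern in ("index-*", "index-us-*", "index-fr"):
        print(pattern, "->", resolve(pattern))
```

Note that "index-ca" (Canada) and "index-us-ca" (California) stay distinct because the country prefix is part of the name.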

Why I am considering multiple indexes: it gives me the option to use a hot-warm strategy for load balancing the cluster, and the option to break a big country into smaller chunks.

Why I am considering routing: it is easier to maintain a single index and the number of shards wouldn't grow that much.

Could some of you share your feedback?

Note: each "bucket" would have around 1M documents, with almost the entire document indexed. A document has around 30-40 properties and 4-5 big text fields.

(Junaid) #2


Splitting your indices by country is a natural fit for your transactional use case and I would recommend it. It gives you room to scale horizontally, and you can still add routing in the future, perhaps by city or some other parameter, as requirements dictate.

However, you don't necessarily have to create a separate index for every country. I would suggest creating a dedicated index (index-us) for each country with a large amount of data and pointing an alias (alias-us) to it. You should always use aliases for reading from and writing to your indices. For countries with less data, you can create a single shared index and point each country's alias to it. This way you save a lot of shard resources on the countries with less data.
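As a rough sketch of this alias layout, the snippet below builds a request body in the shape accepted by the `_aliases` endpoint. The index name `index-small`, the `country` field, and the use of a term filter on the shared index's aliases are assumptions added for illustration; the filter is one way to make a per-country alias return only that country's documents when several countries share an index.

```python
import json


def alias_actions(dedicated, shared_index, shared_countries, field="country"):
    """Build an `_aliases` request body for the dedicated/shared layout.

    dedicated:        countries that get their own index (index-<country>)
    shared_index:     hypothetical name of the index shared by small countries
    shared_countries: countries stored together in shared_index
    field:            assumed name of the document field holding the country
    """
    actions = []
    for country in dedicated:
        actions.append(
            {"add": {"index": f"index-{country}", "alias": f"alias-{country}"}}
        )
    for country in shared_countries:
        actions.append(
            {
                "add": {
                    "index": shared_index,
                    "alias": f"alias-{country}",
                    # Restrict searches through this alias to one country.
                    "filter": {"term": {field: country}},
                }
            }
        )
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_actions(["us"], "index-small", ["fr", "ca"])
    print(json.dumps(body, indent=2))
```

The application then only ever refers to alias-us, alias-fr, and so on, so a small country can later be moved to its own index by repointing its alias.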

(Thiago Benvenuto) #3

Thanks for your feedback. I would like to explore your suggestion. So in your opinion a good approach would be to split the data into one index per country (with the alias strategy to avoid having an index for each small country) and also to use routing inside each country, let's say per state?

With this approach I could split a country's data (index) into smaller chunks (shards).

(Junaid) #4

Yes, with the alias strategy you only create separate indices for countries with a sufficient number of documents and avoid creating many small indices.

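I would consider routing optional. I am not a big fan of it because it tends to introduce imbalance across the shards of an index: all documents sharing a routing value land on the same shard, so a few large routing values can make some shards much bigger than others.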
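A small simulation can illustrate the imbalance. This is only a sketch: Elasticsearch picks the shard from a hash of the routing value modulo the number of primary shards, but here `zlib.crc32` stands in for its real hash function, and the per-state document counts are made-up numbers.

```python
import zlib

# Made-up document counts per routing value (state), for illustration only.
DOCS_PER_STATE = {
    "ca": 400_000,
    "fl": 250_000,
    "tx": 200_000,
    "ny": 100_000,
    "vt": 5_000,
}


def shard_for(routing, num_shards):
    """Map a routing value to a shard number (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shard_sizes(docs_per_key, num_shards):
    """Return the document count per shard when routing by key."""
    sizes = [0] * num_shards
    for key, count in docs_per_key.items():
        sizes[shard_for(key, num_shards)] += count
    return sizes


if __name__ == "__main__":
    for shard, size in enumerate(shard_sizes(DOCS_PER_STATE, 3)):
        print(f"shard {shard}: {size} docs")
```

Whatever the hash, the shard holding the largest state can never be smaller than that state, so skewed routing values translate directly into skewed shards.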

BTW, I would recommend going through this link for shard sizing.
