Many indexes vs many shards

brazabr · June 19, 2018, 11:15pm

Hi everyone,

I've been using Elasticsearch as a search engine for transactional data and it's working well. Now I am designing the next iteration of our platform and could use some help from more experienced users.

Our searches are usually partitioned by country: 95% of them include a country filter. Some countries are searched more than others, and some countries hold much more data than others.

My main question is: would I benefit more from adding shards to a single index (our data is transactional, so we don't use date-based indexes) and using routing, or from having multiple indexes and querying several of them when necessary?

Here is a practical example:
I want to partition my data into the following buckets: "us-ca", "us-fl", "us-others", "fr", "ca".
Then, to query the entire world I would search "index-*", to search only US data I would use "index-us-*", and so on.
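The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the helper names `index_for` and `pattern_for` and the set `DEDICATED_US_STATES` are hypothetical, and the bucket list is the one from the example. It shows that "index-us-*" matches the three US indexes but not "index-ca" (Canada).

```python
from fnmatch import fnmatch

# Hypothetical: US states that get their own index.
# Every other US state falls into "index-us-others".
DEDICATED_US_STATES = {"ca", "fl"}


def index_for(country, state=None):
    """Return the index a document would be written to."""
    country = country.lower()
    if country == "us":
        state = (state or "").lower()
        suffix = state if state in DEDICATED_US_STATES else "others"
        return f"index-us-{suffix}"
    return f"index-{country}"


def pattern_for(country=None):
    """Return the index name or wildcard pattern a search would target."""
    if country is None:
        return "index-*"  # whole world
    country = country.lower()
    return "index-us-*" if country == "us" else f"index-{country}"


# The five buckets from the example above.
ALL_INDICES = [
    index_for("us", "ca"),
    index_for("us", "fl"),
    index_for("us", "ny"),  # lands in index-us-others
    index_for("fr"),
    index_for("ca"),        # Canada, not California
]

us_indices = [name for name in ALL_INDICES if fnmatch(name, pattern_for("us"))]
print(us_indices)
```

One design point worth noting: because the US indexes carry the extra "us-" segment, the Canadian index "index-ca" and the Californian index "index-us-ca" never collide.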

Why I am considering multiple indexes: the option to use a hot-warm strategy for load balancing the cluster, and the option to break a big country into smaller chunks.

Why I am considering routing: it is easier to maintain a single index and the number of shards wouldn't grow that much.

Can some of you share your feedback?

Note: each "bucket" would have around 1M documents, with almost the entire document indexed. A document has around 30-40 properties and 4-5 big text fields.

mjunaidmuzammil · June 20, 2018, 7:10am

Hi,

Splitting your indices by country is a natural fit for your transactional use case and I would recommend it. It gives you room to scale horizontally, and you can still add routing later, for example by city or some other parameter, as your requirements dictate.

However, you don't necessarily have to create a separate index for every country. I would suggest creating a dedicated index (index-us) for each country with a large amount of data and pointing an alias (alias-us) to it. You should always read from and write to your indices through aliases. For countries with less data, you can create a single shared index and point each country's alias to it. This way you save a lot of shard resources on the countries with less data.
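As a rough sketch of the alias layout described above, the following Python builds a request body for the Elasticsearch `_aliases` endpoint. Several things here are assumptions rather than part of the suggestion: the shared index name `index-small`, the document field `country`, and the term filter on the shared-index aliases, which is added so that a search through `alias-fr` only returns French documents.

```python
import json


def alias_actions(large_countries, small_countries,
                  shared_index="index-small", field="country"):
    """Build a body for POST /_aliases.

    shared_index and field are hypothetical names chosen for this sketch.
    """
    actions = []
    # Countries with a lot of data: one dedicated index, one plain alias.
    for country in large_countries:
        actions.append({
            "add": {"index": f"index-{country}", "alias": f"alias-{country}"}
        })
    # Countries with little data: all aliases point at the same shared
    # index, each with a filter so searches only see that country's docs.
    for country in small_countries:
        actions.append({
            "add": {
                "index": shared_index,
                "alias": f"alias-{country}",
                "filter": {"term": {field: country}},
            }
        })
    return {"actions": actions}


body = alias_actions(large_countries=["us"], small_countries=["fr", "ca"])
print(json.dumps(body, indent=2))
```

If a small country later outgrows the shared index, its alias can be repointed to a dedicated index without changing the application, which is the main reason to always go through aliases.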

brazabr · June 20, 2018, 2:16pm

Thanks for your feedback. I would like to explore your suggestion. So in your opinion a good approach would be to split the data into one index per country (with the alias strategy to avoid creating an index for each small country) and also to use routing inside each country, let's say per state?

With this approach I could split a country's data (index) into smaller chunks (shards).

mjunaidmuzammil · June 20, 2018, 5:42pm

Yes, with the alias strategy you only create separate indices for countries with a sufficient number of documents and avoid creating many small indices.

I would consider routing optional. I am not a big fan of it because it tends to introduce imbalance across the shards of an index.
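To illustrate the imbalance, here is a small simulation, not Elasticsearch itself. It assumes documents are routed by state, uses made-up document counts per state, and uses MD5 as a stand-in for the hash Elasticsearch applies to the routing value before taking it modulo the number of primary shards. All documents sharing a routing value land on one shard, so the fullest shard is always at least as large as the biggest state.

```python
import hashlib
from collections import Counter


def shard_for(routing_key, num_shards):
    """Pick a shard for a routing key.

    MD5 is only a stand-in here; Elasticsearch uses its own hash function,
    so the actual shard numbers will differ from this simulation.
    """
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def docs_per_shard(docs_per_key, num_shards):
    """Sum document counts per shard when routing by key."""
    counts = Counter({shard: 0 for shard in range(num_shards)})
    for key, doc_count in docs_per_key.items():
        counts[shard_for(key, num_shards)] += doc_count
    return counts


# Made-up document counts per state, for illustration only.
state_sizes = {"ca": 400_000, "fl": 250_000, "tx": 200_000,
               "ny": 100_000, "wy": 5_000}

counts = docs_per_shard(state_sizes, num_shards=5)
for shard in sorted(counts):
    print(f"shard {shard}: {counts[shard]} docs")
```

With only a handful of distinct routing values, some shards can also stay empty while others receive several states, which is the kind of skew that default routing by document ID avoids.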

BTW, I would recommend going through this link for shard sizing.

system · July 18, 2018, 5:42pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
