I've been using Elasticsearch as a search engine for transactional data, and it's working well. Now I am designing the next iteration of our platform, and I could use some help from more experienced users.
Our searches are usually partitioned by country: about 95% of them include a country filter. Some countries are searched far more often than others, and some countries have much more data than others.
My main question is: would I benefit more from a single index with more shards and custom routing (our data is transactional, so we don't use date-based indexes), or from multiple indexes, querying several of them at once when necessary?
Here is a practical example:
I want to partition my data using the following logic: "us-ca", "us-fl", "us-others", "fr", "ca".
Then, if I need to query the entire world, I would search "index-*"; if I need to search only US data, "index-us-*"; and so on.
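To make the multi-index option concrete, here is a rough Python sketch of the naming logic. Only the bucket names and patterns come from the example above; the function names and the idea that each document carries a country code and an optional state code are assumptions for illustration.

```python
from typing import Optional

# Assumption: US states that get a dedicated index; every other state
# falls into the "us-others" bucket.
US_STATE_BUCKETS = {"ca", "fl"}


def index_for(country: str, state: Optional[str] = None) -> str:
    """Return the index a document would be written to (hypothetical helper)."""
    country = country.lower()
    if country == "us":
        state = (state or "").lower()
        suffix = state if state in US_STATE_BUCKETS else "others"
        return f"index-us-{suffix}"
    return f"index-{country}"


def search_pattern(country: Optional[str] = None) -> str:
    """Return the index name or wildcard pattern to search (hypothetical helper)."""
    if country is None:
        return "index-*"      # whole world
    country = country.lower()
    if country == "us":
        return "index-us-*"   # every US bucket
    return f"index-{country}"
```

Note that "index-ca" (Canada) and "index-us-ca" (California) stay distinct, and a search on "index-ca" does not touch the California index.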
Why I am considering multiple indexes: I could use a hot-warm strategy to balance load across the cluster, and I could break a big country into smaller chunks.
Why I am considering routing: a single index is easier to maintain, and the number of shards wouldn't grow as much.
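For comparison, here is a rough Python sketch of what the routing option could look like. It only builds the request method, path, and body, so it makes no claim about any client library. The index name "index", the "country" field (assumed to be a keyword field holding lowercase codes), and the helper names are assumptions. The country filter stays in the query because several routing values can land on the same shard.

```python
from urllib.parse import urlencode

INDEX = "index"  # assumption: name of the single shared index


def index_request(doc_id: str, doc: dict):
    """Build the request that indexes a document, routed by its country."""
    routing = doc["country"].lower()
    path = f"/{INDEX}/_doc/{doc_id}?{urlencode({'routing': routing})}"
    return "PUT", path, doc


def search_request(country: str, query: dict):
    """Build a search request limited to the shard that holds one country."""
    country = country.lower()
    body = {
        "query": {
            "bool": {
                "must": [query],
                # Still required: a shard may hold other countries too.
                "filter": [{"term": {"country": country}}],
            }
        }
    }
    path = f"/{INDEX}/_search?{urlencode({'routing': country})}"
    return "POST", path, body
```

With this layout, a worldwide search simply omits the routing parameter and the filter, and fans out to every shard of the index.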
Could some of you share your feedback?
Note: each "bucket" would have around 1M documents, with almost the entire document indexed. A document has around 30-40 properties and 4-5 big text fields.