Elasticsearch index policy creation best practice/performance

Hi folks,

I am designing a search system based on Elasticsearch. After a lot of reading, I have seen that some systems, such as logging setups, spread the same kind of content across multiple indices: they create one index per day, with names like mylogs-12-02-2020, and then search across every index matching the mylogs-* pattern. Each of those indices has its own primary shards and replicas. My question is about search performance: which is faster, searching one index of 5 million documents with n shards, or searching 50 indices of 100,000 documents each? Does anyone have experience with the best practice here?

I expect my system to grow by roughly 200,000 documents per day.

What is the best practice: splitting the data across multiple indices, or keeping a single index with several primary shards on different nodes (so that they do not compete for the same resources when searching or indexing)?

When I search on mylogs-*, does Elasticsearch run the search in parallel across the indices, and across the shards within each index?

"Too many shards" is a common performance bottleneck, so be careful not to fall into that trap. There are some recommendations about sizing your shards correctly in this article:

Also consider using ILM, specifically rollover, instead of daily indices. If you are indexing just 200k documents per day then a correctly-sized single-shard index might contain many weeks of data.
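As a rough illustration of the rollover suggestion, here is a minimal Python sketch that builds the body of an ILM policy with a single hot-phase rollover action. The thresholds (30 days, 30gb) and the helper name are assumptions chosen for illustration, not recommendations from this thread; the right values depend on your documents and queries.

```python
import json


def build_rollover_policy(max_age="30d", max_size="30gb"):
    """Return an ILM policy body with one hot-phase rollover action.

    The default thresholds are illustrative assumptions only.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new index when either condition is met.
                        "rollover": {
                            "max_age": max_age,
                            "max_size": max_size,
                        }
                    }
                }
            }
        }
    }


policy = build_rollover_policy()
print(json.dumps(policy, indent=2))
```

A body like this would be sent to the `_ilm/policy/<policy-name>` endpoint with a PUT request. With a single primary shard per index, `max_size` effectively caps the size of that shard.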

It depends a little bit on exactly what your documents are and what kind of searches you're performing, but in general I would expect the performance sweet spot to be much closer to "1 shard of 5M documents" than "50 shards of 100k documents". Even 5M documents still sounds like a very small shard to me.
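To put numbers on this, the following sketch does the arithmetic for the growth rate given in the question. It assumes one primary and one replica per daily index, which is an assumption for illustration; the function names are hypothetical.

```python
DOCS_PER_DAY = 200_000  # growth rate stated in the question


def shard_count(days, primaries_per_index=1, replicas=1):
    """Total shards (primaries plus replica copies) with one index per day."""
    return days * primaries_per_index * (1 + replicas)


def days_to_reach(target_docs, docs_per_day=DOCS_PER_DAY):
    """Days of indexing needed before a single index holds target_docs."""
    return target_docs / docs_per_day


print(shard_count(365))          # 730 shards after one year of daily indices
print(days_to_reach(5_000_000))  # 25.0 days to reach 5M documents
```

Under these assumptions, daily indices produce hundreds of tiny shards per year, while a single shard takes 25 days just to reach the 5M documents described above as still very small.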
