How to handle lots of small indices

fillg1 · May 4, 2021, 11:19am

Hi folks
We are using Elasticsearch as a kind of backend for our application. For security and privacy reasons, we create a per-user index, which the application opens and closes as needed. Each index is small, ~1MB of data or even less. Our node setup should be quite standard: 1 master with 3 data nodes, 16GB RAM, etc.
If we create lots of these small indices (~5000), we see Elasticsearch getting into trouble, with constant garbage collections, org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, etc.
This happens even when our application is idle and all indices in the cluster are closed, so we assume it has something to do with the number of indices. When the number of indices is low (~100), everything works fine and fast. We are using Elasticsearch 7.6.2.
Any ideas?

dadoonet · May 4, 2021, 12:08pm

Welcome!

The first thing to do is to upgrade to the latest version (7.12.1).

Then, I'd recommend not creating so many indices, which means so many shards, because it will consume too many resources (heap).
Instead, I'd use one single index for all users, with a filtered alias per user.
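
To illustrate the filtered-alias suggestion, here is a minimal sketch that only builds the JSON body for the `_aliases` API (it does not contact a cluster). The index name `app-data`, the field name `user_id`, and the `user-<id>` alias naming are assumptions made for this example and do not come from the thread. The filter field would need to be mapped as `keyword` for the `term` filter to match exactly.

```python
import json


def filtered_alias_action(shared_index, user_id, user_field="user_id"):
    """Build one 'add' action for the _aliases API.

    The alias exposes only the documents whose `user_field` equals `user_id`.
    `user_field` is a hypothetical field name chosen for this sketch.
    """
    return {
        "add": {
            "index": shared_index,
            "alias": f"user-{user_id}",  # hypothetical naming scheme
            "filter": {"term": {user_field: user_id}},
            # Optional: route each user's documents to a single shard.
            "routing": user_id,
        }
    }


def aliases_request_body(shared_index, user_ids):
    """Combine one action per user into a single _aliases request body."""
    return {"actions": [filtered_alias_action(shared_index, u) for u in user_ids]}


# Body to send as: POST /_aliases
body = aliases_request_body("app-data", ["alice", "bob"])
print(json.dumps(body, indent=2))
```

The application would then search `user-alice` instead of a dedicated index, and the filter is applied to every query going through that alias.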

fillg1 · May 4, 2021, 1:31pm

Hi David
Thanks for your answer.
Having such a large number of indices is a kind of customer requirement or architectural restriction. We are aware that opening and closing an index takes some time, but the application (the Elasticsearch client) controls which index with user-specific data is needed.
Is there no way to reduce the heap memory usage of a closed index?
We see in the node data dir that the accumulated node data is usually below 1-2 GB for 1000 indices, and this already causes some trouble in our test systems.

Christian_Dahlqvist · May 4, 2021, 1:41pm

It is not all about heap usage. One issue with a very large number of indices is that the amount of data stored in the cluster state (e.g. mappings, shard locations, etc.) grows, which can slow down cluster state updates and eventually become the bottleneck. This can be especially problematic if you use dynamic mappings, which can result in a large number of cluster state updates.
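
One way to avoid mapping-driven cluster state updates is to declare the fields up front and turn dynamic mapping off. The sketch below builds an index-creation body with `"dynamic": "strict"`, which makes Elasticsearch reject documents that contain undeclared fields (`false` would instead ignore them without indexing). The field names and types are hypothetical, since the thread does not show the actual document format.

```python
import json


def strict_mapping(fields):
    """Build an index-creation body whose mapping never grows dynamically.

    `fields` maps field name -> Elasticsearch field type.
    """
    return {
        "mappings": {
            "dynamic": "strict",  # unknown fields cause the document to be rejected
            "properties": {name: {"type": ftype} for name, ftype in fields.items()},
        }
    }


# Hypothetical fields, for illustration only.
body = strict_mapping({"user_id": "keyword", "created_at": "date", "message": "text"})

# Body to send as: PUT /<index-name>
print(json.dumps(body, indent=2))
```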

Abhishek_Shinde · May 4, 2021, 4:29pm

Is there any limit on the number of indices that can be created on a single node?

Christian_Dahlqvist · May 4, 2021, 4:31pm

The default max number of shards per node is 1000. Are you using dynamic mappings or is the format for all users well understood and static?
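
As a rough back-of-the-envelope check against that limit (the `cluster.max_shards_per_node` setting), the sketch below compares the total shard count with the cluster-wide cap for the setup described in this thread: ~5000 indices and 3 data nodes. The thread does not state the shard or replica counts, so 1 primary and 1 replica per index (the 7.x defaults) are assumed here. As far as I know, shards of closed indices do not count toward this limit, so the numbers apply to the case where the indices are open.

```python
def total_shards(indices, primaries=1, replicas=1):
    """Total shard copies: every primary plus its replicas, for each index."""
    return indices * primaries * (1 + replicas)


def cluster_shard_limit(data_nodes, max_shards_per_node=1000):
    """The limit is enforced cluster-wide: per-node value times data nodes."""
    return data_nodes * max_shards_per_node


# Assumed: 1 primary + 1 replica per index (not stated in the thread).
shards = total_shards(5000)
limit = cluster_shard_limit(3)

print(shards, limit, shards > limit)  # 10000 3000 True

# With ~100 indices, the count stays well below the limit.
print(total_shards(100) <= limit)  # True
```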

fillg1 · May 4, 2021, 6:23pm

All our indices have the same format and it's static.

fillg1 · May 5, 2021, 12:10pm

Oh, sorry,
I just got informed by my colleagues that we are using dynamic mappings for some of our data. Are there any restrictions related to that?

nukarak · May 5, 2021, 12:13pm

We also did some additional tests and found that the index settings and the mappings of a closed index are still accessible via the API, which suggests that they are still loaded into memory.
Is this right? Is this really needed?
We were under the assumption that the only information about a closed index needed by the master was which shards it has and where they are located.

system · June 2, 2021, 12:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
