We are using Elasticsearch as a backend for our application. For security and privacy reasons, we create one index per user, which the application opens and closes as needed. Each index is small, about 1 MB of data or less. Our node setup is fairly standard: 1 master node and 3 data nodes with 16 GB RAM each.
When we create many of these small indices (around 5,000), Elasticsearch runs into trouble: constant garbage collection, org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, etc.
This happens even when our application is idle and all indices in the cluster are closed, so we assume it is related to the number of indices. When the number of indices is low (~100), everything works fine and fast. We are using Elasticsearch 7.6.2.
The first thing to do is to upgrade to the latest version (7.12.1).
Then, I'd recommend not creating so many indices (and therefore so many shards), because every shard consumes resources, especially heap.
Instead, I'd use a single index for all users, with one filtered alias per user.
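A minimal sketch of the filtered-alias approach, assuming every document in a shared index (hypothetically named users-shared) carries a keyword field user_id identifying its owner. It only builds the request body for the POST /_aliases API; sending it is left to whichever client you use. Setting routing to the user ID as well keeps each user's documents and searches on a single shard.

```python
import json


def user_alias_actions(index, user_ids):
    """Build a POST /_aliases body that adds one filtered alias per user.

    Assumes (hypothetically) that each document has a keyword field
    `user_id` holding the ID of the user who owns it.
    """
    actions = []
    for user_id in user_ids:
        actions.append({
            "add": {
                "index": index,
                "alias": f"user-{user_id}",
                # Searches through this alias only see this user's documents.
                "filter": {"term": {"user_id": user_id}},
                # Route this user's writes and searches to the same shard.
                "routing": user_id,
            }
        })
    return {"actions": actions}


body = user_alias_actions("users-shared", ["alice", "bob"])
print(json.dumps(body, indent=2))
```

A filtered alias narrows what a search sees, but it is not an access control by itself; the application still has to make sure each user only goes through their own alias.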
Thanks for your answer.
Having such a large number of indices is effectively a customer requirement or architectural constraint. We are aware that opening and closing an index takes some time, but the application (the Elasticsearch client) controls which user-specific index is needed at any given time.
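The open/use/close pattern described above can be sketched as a context manager. This assumes an elasticsearch-py Elasticsearch client (or any object exposing indices.open and indices.close with an index argument); the helper name opened_index is hypothetical. Keep in mind that each open and each close is itself a cluster state update handled by the master.

```python
from contextlib import contextmanager


@contextmanager
def opened_index(client, index):
    """Open `index` for the duration of the block, then close it again.

    `client` is assumed to be an elasticsearch-py `Elasticsearch` instance,
    or anything exposing `indices.open(index=...)` / `indices.close(index=...)`.
    """
    client.indices.open(index=index)
    try:
        yield index
    finally:
        # Close the index even if the work inside the block raised.
        client.indices.close(index=index)


# Hypothetical usage with a real client:
# with opened_index(es, "user-alice") as idx:
#     es.search(index=idx, body={"query": {"match_all": {}}})
```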
Is there no way to reduce the heap memory usage of a closed index?
In the node data directory, the cumulative data is typically under 1-2 GB for 1,000 indices, and this already causes trouble in our test systems.
It is not all about heap usage. One issue with a very large number of indices is that the amount of data stored in the cluster state (mappings, shard locations, etc.) grows, which can slow down cluster state updates until they eventually become the bottleneck. This can be especially problematic with dynamic mappings, which can result in a large number of cluster state updates.
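To get a feel for how mappings add up across indices, here is a small sketch that counts the fields in a mapping's properties (recursing into object fields and multi-fields) and multiplies by the number of indices. The example mapping is made up for illustration; since each index carries its own mapping in the cluster state, the total grows linearly with the index count.

```python
def count_mapped_fields(properties):
    """Count fields in a mapping's `properties`, recursing into sub-fields."""
    total = 0
    for field in properties.values():
        total += 1
        # Object and nested fields define their own sub-properties.
        total += count_mapped_fields(field.get("properties", {}))
        # Multi-fields (e.g. a keyword sub-field of a text field) count too.
        total += count_mapped_fields(field.get("fields", {}))
    return total


# Hypothetical per-user mapping, for illustration only.
mapping = {
    "properties": {
        "user_id": {"type": "keyword"},
        "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "address": {
            "properties": {
                "city": {"type": "keyword"},
                "zip": {"type": "keyword"},
            }
        },
    }
}

per_index = count_mapped_fields(mapping["properties"])
print(per_index, per_index * 5000)  # prints "6 30000"
```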
Is there a limit on the number of indices that can be created on a single node?
The default maximum number of shards per node (cluster.max_shards_per_node) is 1000. Are you using dynamic mappings, or is the format for all users well understood and static?
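A quick back-of-the-envelope check of that limit for the setup in the question. It assumes 1 primary and 1 replica per index (the 7.x defaults) and that the limit is applied cluster-wide as max_shards_per_node times the number of data nodes, as the 7.x documentation describes; it shows the case where all indices are open at once. The helper name shard_budget is hypothetical.

```python
def shard_budget(num_indices, primaries=1, replicas=1,
                 max_shards_per_node=1000, data_nodes=3):
    """Return (shards needed, cluster-wide shard limit)."""
    needed = num_indices * primaries * (1 + replicas)
    limit = max_shards_per_node * data_nodes
    return needed, limit


for n in (100, 1000, 5000):
    needed, limit = shard_budget(n)
    print(n, needed, limit, "over limit" if needed > limit else "ok")
```

With 5,000 indices this gives 10,000 shards against a limit of 3,000. The setting can be raised, but that does not remove the per-shard heap and cluster state cost behind it.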
All our indices have the same format, and it's static.
My colleagues just informed me that we are using dynamic mappings for some of our data. Are there any restrictions related to that?
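One way to keep per-user indices from producing mapping updates is to create them with a fixed mapping and dynamic mapping turned off. A sketch that builds a PUT /<index> body: "dynamic": "strict" rejects documents with unmapped fields, while "dynamic": false accepts them but neither indexes nor maps the new fields. The index.mapping.total_fields.limit setting (default 1000) caps the number of mapped fields per index, which matters when dynamic mapping is on. The helper name and the example field are hypothetical.

```python
import json


def user_index_body(properties, dynamic="strict"):
    """Build a PUT /<index> body with a fixed mapping.

    dynamic="strict": documents with unmapped fields are rejected.
    dynamic=False: unmapped fields are kept in _source but not mapped.
    Either way, indexing does not add new fields to the mapping.
    """
    return {
        "settings": {
            "number_of_shards": 1,
            # Default cap on mapped fields per index, shown explicitly.
            "index.mapping.total_fields.limit": 1000,
        },
        "mappings": {"dynamic": dynamic, "properties": properties},
    }


body = user_index_body({"user_id": {"type": "keyword"}})
print(json.dumps(body, indent=2))
```

The same mappings could also go into an index template whose index pattern matches the per-user index names, so every new index gets them automatically.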
We also ran some additional tests and found that the settings and mappings of a closed index are still accessible via the API, which suggests they are still loaded into memory.
Is this right? Is this really needed?
We assumed that the only information the master needs about a closed index is which shards it has and where they are located.
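To see what the cluster state keeps for a closed index, you can query the cluster state API, e.g. GET /_cluster/state/metadata/<index> (in elasticsearch-py, es.cluster.state(metric="metadata", index=...)). A sketch that summarizes such a response; the sample below is an assumed, trimmed version of the 7.x response shape for a hypothetical closed index, not real output.

```python
def summarize_index_metadata(cluster_state, index):
    """Report what the cluster state still holds for `index`.

    `cluster_state` is assumed to be the parsed JSON response of
    GET /_cluster/state/metadata/<index>.
    """
    meta = cluster_state["metadata"]["indices"][index]
    return {
        "state": meta.get("state"),  # "open" or "close"
        "has_settings": bool(meta.get("settings")),
        "has_mappings": bool(meta.get("mappings")),
    }


# Trimmed, illustrative response for a hypothetical closed index.
sample = {
    "metadata": {
        "indices": {
            "user-alice": {
                "state": "close",
                "settings": {"index": {"number_of_shards": "1"}},
                "mappings": {
                    "_doc": {"properties": {"user_id": {"type": "keyword"}}}
                },
            }
        }
    }
}

print(summarize_index_metadata(sample, "user-alice"))
# prints {'state': 'close', 'has_settings': True, 'has_mappings': True}
```

This shows that the metadata is part of the cluster state, which is published to every node; it does not by itself measure how much heap that metadata costs.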