Hi,
We use Elasticsearch (7.x) to store application/system logs, and our indices are time-based, with a new index created every day. We have a requirement where different indices are created for each log type and also for each different application per day. This results in too many indices being created every day, and with a retention period of 15 days we are exceeding the default cluster.max_shards_per_node value of 1000. Each index is not very big (a few GBs, some only MBs or KBs).
Can you please advise whether we can safely exceed this limit by increasing any environment parameters such as the JVM heap? Any guidance on memory usage based on shards would be very helpful.
We run this higher, like 2-3K, for the same daily-index reason without issues, but it does depend on having enough RAM (i.e. don't do it with a 1GB heap) and on watching for any issues or circuit breakers. Depends on your loads, queries, etc.
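If you do go that route, cluster.max_shards_per_node is a dynamic cluster setting, so it can be raised without a restart. A minimal sketch, assuming an unsecured cluster reachable at localhost:9200 (2000 is just an illustrative value, size it against your heap):

```python
import requests

# Raise the per-node shard limit cluster-wide; "persistent" survives restarts.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"cluster.max_shards_per_node": 2000}},
)
print(resp.json())
```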
Note you can freeze indexes to save RAM, and we also just close indexes to retain them without using them unless needed, letting us keep months of very rarely-used data (it just uses disk space).
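Both are one-line API calls in 7.x. A rough sketch against the same local cluster, with a made-up index name for illustration:

```python
import requests

BASE = "http://localhost:9200"
index = "logs-app1-2021.01.01"  # hypothetical daily index name

# Freeze: index stays searchable, but its in-memory structures are only
# loaded on demand, so it takes almost no heap while idle (7.x API).
requests.post(f"{BASE}/{index}/_freeze")

# Close: index is kept on disk but is not searchable until reopened,
# so it costs no heap at all.
requests.post(f"{BASE}/{index}/_close")

# Reopen later if the data is needed again:
# requests.post(f"{BASE}/{index}/_open")
```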
We use a 2GB JVM heap for data nodes and 1GB for master nodes.
Mark, we have only one shard per index. Because of the requirement to create indices every day, I'm afraid we won't be allowed to shrink the indices.
Can you please let us know if there is any high-level rule of thumb we can use relating the size and number of shards to the memory that will be required?
I did look at a blog post which suggests having 1GB of heap for around 20 shards (where each shard holds around 20 to 40GB of data).
In our case, however, the shard sizes are much lower.
Any help or guidance on the sizing would be very much appreciated.
Having lots of small shards is very inefficient and can cause a large cluster state as well as performance issues, in addition to heap pressure. If you have that little heap, I would recommend limiting the shard count as outlined in the blog post by revisiting your requirements, or adding resources to your cluster. These limitations and guidelines are there for a reason. I have numerous times seen users overshard clusters to the point where they can no longer operate or be fixed, resulting in data loss.
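To make the guideline you quoted concrete (roughly 20 shards or fewer per GB of heap), a back-of-the-envelope sketch under that assumption:

```python
# Rule of thumb from the blog post: aim for <= ~20 shards per GB of heap.
heap_gb = 2          # current data-node heap in this thread
shards_per_gb = 20   # guideline, not a hard limit

max_recommended_shards = heap_gb * shards_per_gb
print(f"~{max_recommended_shards} shards per node with a {heap_gb}GB heap")
# -> ~40 shards per node; even a 4x heap increase (8GB) only gets you to ~160,
#    which is why reducing the index count matters more than raising the limit.
```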
The initial allocation is 2GB per data node, but we are OK with increasing the memory 2x or 4x etc. We just want to know if there is any high-level mapping between the memory and the shards/data on a node.
Agreed - we run a big system this way with dozens of daily indexes and you can't scale it for any length of time beyond a couple of months. Much better to use ILM (which wasn't available to us), or at least move to monthly indexes, one per service/type, etc. And we run with 8-16GB heaps; on 2GB you can't really run any sizable shard counts. Our challenge was modifying all the clients to use an alias or the ILM machinery, but we bit the bullet and did it.
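For reference, a minimal sketch of what moving to rollover + ILM can look like; the policy, template, and alias names ("logs-policy", "logs-app1") are made up for illustration, and it assumes the same local, unsecured cluster. Adjust the rollover thresholds and retention to your own needs:

```python
import requests

BASE = "http://localhost:9200"

# 1. An ILM policy that rolls the write index over at 30 days or 40GB,
#    and deletes indices 15 days after rollover (your retention window).
requests.put(f"{BASE}/_ilm/policy/logs-policy", json={
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "30d", "max_size": "40gb"}}},
            "delete": {"min_age": "15d", "actions": {"delete": {}}},
        }
    }
})

# 2. An index template so new indices pick up the policy and the write alias.
requests.put(f"{BASE}/_template/logs-app1-template", json={
    "index_patterns": ["logs-app1-*"],
    "settings": {
        "number_of_shards": 1,
        "index.lifecycle.name": "logs-policy",
        "index.lifecycle.rollover_alias": "logs-app1",
    },
})

# 3. Bootstrap the first index behind the write alias; from then on clients
#    index into the alias "logs-app1" instead of date-suffixed index names.
requests.put(f"{BASE}/logs-app1-000001", json={
    "aliases": {"logs-app1": {"is_write_index": True}},
})
```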