Hi, is there any guideline for the above topic? I've read many posts and people are saying different things.
Some say each shard should not exceed the heap size, which is around 30 to 32GB, and
some say not to exceed 50GB per shard.
If that is the case, since my daily ingestion for index A is 600GB (currently configured to use 4 shards), the above guideline would mean 12 to 20 shards per index!! But I only have 4 nodes in my cluster. Will that be too many for a 4-node cluster?
How can I achieve a good balance among the number of nodes, the shard count and the shard size to maximize my ELK cluster's performance?
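For context, this is roughly how the daily index is created at the moment with 4 primary shards (just a minimal sketch using the Python requests library; the cluster URL and index name are placeholders, not my real setup):

```python
import requests

ES_URL = "http://localhost:9200"       # placeholder cluster address
index_name = "index-a-2018.06.01"      # placeholder daily index name

# Create the daily index with an explicit primary shard count.
# At ~600GB/day, 4 primaries works out to roughly 150GB per shard,
# well above the 30-50GB per shard figures people quote.
settings = {
    "settings": {
        "index": {
            "number_of_shards": 4,
            "number_of_replicas": 1
        }
    }
}

resp = requests.put(f"{ES_URL}/{index_name}", json=settings)
resp.raise_for_status()
print(resp.json())
```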
You do need to make sure you don't have lots of shards.
Hmm.. how many is considered a lot? What is the limit?
The article below says 1.5 to 3 times the number of nodes; in my case, that means I can have only 12 shards (3*4)...
Yeah, which doesn't really make a lot of sense, does it?
Honestly, with time series data, we see anywhere from a few shards to hundreds (or even thousands). Once you get past about 500 shards per ~32GB heap, you start running into major issues: the resources required simply to maintain those shards exceed what is spent on the actual querying and analysis of the data inside them.
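To see where you stand, the `_cat/allocation` API gives a per-node shard count and disk usage in one call (a minimal sketch with the Python requests library; the cluster URL is a placeholder):

```python
import requests

ES_URL = "http://localhost:9200"  # placeholder cluster address

# _cat/allocation returns one row per node with its shard count and the
# disk space taken by index data on that node.
resp = requests.get(f"{ES_URL}/_cat/allocation", params={"format": "json"})
resp.raise_for_status()

for row in resp.json():
    node = row.get("node", "UNASSIGNED")
    print(f"{node}: {row['shards']} shards, {row.get('disk.indices')} of index data")
```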
The optimal number and size of shards depend a lot on the use case, data, queries and query patterns. Elasticsearch can be used for a lot of different types of use cases, and they all require different types of optimisations. The recommendations you quote are not generally applicable, and I would guess they might be more applicable to search use cases. For logging and analytics use cases where time-based indices are used, we generally see a considerably larger number of shards per node, as larger data volumes are involved. To find the ideal size and number of shards for a cluster, we generally recommend benchmarking with real data and queries.
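One practical way to check this for time-based indices is to look at how big the primary shards of your daily indices actually end up, and adjust the shard count of future indices until they land in the range your own benchmarks show works well (again just a sketch with the Python requests library; the URL and index pattern are placeholders):

```python
import requests

ES_URL = "http://localhost:9200"   # placeholder cluster address
pattern = "index-a-*"              # placeholder daily index pattern

# _cat/shards lists every shard with its store size; bytes=gb reports sizes in GB.
resp = requests.get(
    f"{ES_URL}/_cat/shards/{pattern}",
    params={"format": "json", "bytes": "gb"},
)
resp.raise_for_status()

for shard in resp.json():
    # Only look at assigned primary shards.
    if shard["prirep"] == "p" and shard.get("store") is not None:
        print(f"{shard['index']} shard {shard['shard']}: "
              f"{shard['store']} GB on {shard.get('node')}")
```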
Hi Christian, thanks for the reply.
Our use case is that we ingest all kinds of application log files using ELK and build dashboards to show trends (time series data), and we have our in-house alerting system integrated with Elasticsearch for real-time queries and alerting. To conclude, we need the data to be available in as close to real time as possible.