What should be the number of shards?

Let's include the separate thread where you provided the cluster details here: Configuration of elasticsearch in production environment

Based on this, it looks like you are spending a lot of powerful hardware on dedicated master and client nodes that would be better used as additional data nodes. The specification for the master nodes is overkill, as dedicated masters are expected to do very little work; 2 CPU cores and 4GB RAM (3GB heap) are sufficient even for reasonably large clusters. As Nik pointed out, dedicated client nodes are also not necessarily required in many log analytics use cases. I suspect you would benefit from having three smaller dedicated master nodes and more data nodes.
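To see how the hardware is currently split across roles, you can ask the `_cat/nodes` API for the role, heap and CPU of each node. Below is a minimal sketch using the elasticsearch-py client; the `localhost:9200` endpoint is just an assumption, point it at one of your own nodes.

```python
# Minimal sketch (not from the original thread): list each node's role, heap
# and CPU so you can judge whether the dedicated master/client nodes are
# oversized. Assumes the elasticsearch-py client and a reachable cluster.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

# v=True adds a header row; h= selects the columns relevant to sizing.
print(es.cat.nodes(
    v=True,
    h="name,node.role,heap.max,ram.max,cpu,load_1m,disk.used_percent",
))
```

If the masters show the same heap and RAM as the data nodes, that is hardware you could reclaim for indexing.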

When doing heavy indexing in Elasticsearch, CPU and RAM are important, but it is often the latency and throughput of the storage that limit performance. Based on your description it is difficult to tell whether the bottleneck is in the indexing layer or in Elasticsearch itself. Watch the cluster while it is indexing and try to identify what is limiting performance, e.g. CPU saturation or long IO wait. If nothing stands out, try measuring the throughput of your indexing pipeline without writing to Elasticsearch; if throughput increases once Elasticsearch is removed from the equation, Elasticsearch is the bottleneck.
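Something like the following is usually enough to tell the two apart. It is only a rough sketch, assuming the elasticsearch-py client; `generate_docs()`, the index name and the endpoint are placeholders standing in for your real pipeline.

```python
# Rough sketch of the "remove Elasticsearch from the equation" test: run the
# same document stream once into a real bulk sink and once into a no-op sink,
# then compare docs/second. All names below are placeholders.
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed endpoint
INDEX = "logs-benchmark"                     # hypothetical index name

def generate_docs(n):
    # Stand-in for the output of your real indexing pipeline.
    for i in range(n):
        yield {"_index": INDEX, "_source": {"message": f"log line {i}"}}

def run(sink, n=100_000):
    start = time.time()
    sink(generate_docs(n))
    return n / (time.time() - start)

def es_sink(actions):
    helpers.bulk(es, actions)   # real bulk indexing into Elasticsearch

def null_sink(actions):
    for _ in actions:           # consume the pipeline, write nothing
        pass

print(f"with Elasticsearch:    {run(es_sink):,.0f} docs/s")
print(f"without Elasticsearch: {run(null_sink):,.0f} docs/s")
```

If the rate barely changes with the null sink, the pipeline itself (parsing, enrichment, network) is the limit; if it jumps, Elasticsearch is where to focus.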

Have a look at the following video for a discussion of sizing and benchmarking.