We are starting to design a cluster and have come up with the following configuration. Please suggest whether there is any scope for improvement, or whether we can save some budget if it is over-provisioned.
Our sizing assumptions:

- 100 fields at roughly 1 MB per field (including the inverted index), so about 125 MB per document to be on the safe side.
- 4M documents in total, corresponding to 500 GB of data.
- At 25 GB per shard that gives 20 primary shards, i.e. 40 shards in total with one replica (r = 1). We had seen somewhere that a shard size of around 25 GB works well in most scenarios (sketched below).
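For reference, here is that shard arithmetic as a minimal Python sketch (the 500 GB total, the 25 GB target shard size, and r = 1 are the figures quoted above, not measured values):

```python
# Shard-count check based on the figures quoted above.
total_data_gb = 500            # estimated primary data volume
target_shard_size_gb = 25      # commonly cited shard size that "works well"
replicas = 1                   # r = 1

primary_shards = total_data_gb // target_shard_size_gb    # 20
total_shards = primary_shards * (1 + replicas)             # 40 including replicas

print(f"primary shards: {primary_shards}, total shards with replicas: {total_shards}")
```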
It also seems that a maximum heap of 32 GB (so the JVM can use compressed object pointers) works well, which translates to a budget of 64 GB of RAM per shard (the other 50% left for the filesystem cache). On machines with 256 GB of RAM that means 2 shards per machine (128 GB), which translates to 20 data nodes for the cluster (2 shards per data node), plus 3 master nodes (for HA) and 1 coordinating node, as sketched below.
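Again as a minimal sketch of that node arithmetic (the 64 GB-per-shard RAM budget, the 256 GB machines, and the dedicated master/coordinating counts are this post's own assumptions, not recommendations):

```python
# Node-count sketch based on the RAM budgeting described above.
heap_gb = 32                      # max heap so the JVM keeps compressed object pointers
fs_cache_gb = 32                  # matching amount left for the filesystem cache
ram_per_shard_gb = heap_gb + fs_cache_gb        # 64 GB budgeted per shard

machine_ram_gb = 256
shards_per_data_node = 2          # 2 * 64 GB = 128 GB used per machine
headroom_gb = machine_ram_gb - shards_per_data_node * ram_per_shard_gb   # 128 GB spare

total_shards = 40                 # primaries + replicas from the shard math above
data_nodes = total_shards // shards_per_data_node    # 20
master_nodes = 3                  # dedicated masters for HA
coordinating_nodes = 1

print(f"data nodes: {data_nodes}, "
      f"total nodes: {data_nodes + master_nodes + coordinating_nodes}, "
      f"unused RAM per machine: {headroom_gb} GB")
```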
Elasticsearch is generally not optimized for handling documents that large. Why are your documents that large? What do they contain? What is the use case?
I would consider documents over a few MB in size to be quite large, and you mention them potentially being tens of MB in size, which I suspect could be problematic and difficult to work with.
This does sound like quite an unusual use case, so I would recommend you test and benchmark it to find out how well it works.
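As a starting point for such a test, a minimal sketch along the following lines could show how a cluster copes with documents of this size. It assumes the official elasticsearch-py client (8.x-style API) against a local test cluster; the index name, field count, and field size are placeholders based on the numbers in the original post:

```python
import random
import string
import time

from elasticsearch import Elasticsearch  # official Python client, 8.x-style API assumed

es = Elasticsearch("http://localhost:9200")   # placeholder endpoint for a test cluster

INDEX = "large-doc-test"              # hypothetical index name
FIELDS = 100                          # field count from the original post
FIELD_SIZE_BYTES = 1 * 1024 * 1024    # ~1 MB of text per field, per the post's estimate


def random_text(size: int) -> str:
    """Return `size` characters of random ASCII text to stand in for real field content."""
    return "".join(random.choices(string.ascii_lowercase + " ", k=size))


# One synthetic document with 100 ~1 MB text fields (~100 MB in total).
# Note: documents this size can exceed Elasticsearch's default http.max_content_length
# (100 MB) and be rejected outright, which is itself a useful data point for this test.
doc = {f"field_{i}": random_text(FIELD_SIZE_BYTES) for i in range(FIELDS)}

# Fresh single-shard index so timings are not skewed by replication.
if es.indices.exists(index=INDEX):
    es.indices.delete(index=INDEX)
es.indices.create(index=INDEX, settings={"number_of_shards": 1, "number_of_replicas": 0})

start = time.perf_counter()
es.index(index=INDEX, document=doc)
doc_mb = FIELDS * FIELD_SIZE_BYTES // (1024 * 1024)
print(f"indexed one ~{doc_mb} MB document in {time.perf_counter() - start:.2f}s")
```

Extending this to bulk indexing and a few representative queries, or driving the test with a benchmarking tool such as Rally, would give a better picture of sustained indexing and search behaviour.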