How much minimum RAM is required for these indices?

I'm having trouble figuring out how much RAM I need per data node to accommodate my indexes.
I search all of these indexes at once and I use highlighting when searching, so that probably adds to the memory footprint.

At the moment I have 2 indexes:

Index 1: Holds about 230 GB on the primary shards (50.7 million documents) and 661.3 GB in total (primaries + replicas), spread across 5 primary shards with 2 replicas each (15 shards total)
Index 2: Holds about 17.1 GB on the primary shards (323.8k documents) and 51.2 GB in total (primaries + replicas), spread across 5 primary shards with 2 replicas each (15 shards total)
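For reference, the shard arithmetic behind these figures can be sketched in a few lines of Python. This is only a rough estimate: it assumes every replica is a full copy of its primary and that shards are spread evenly over the 3 data nodes mentioned below, and the names `shard_layout` and `DATA_NODES` are made up for illustration. The three-copy estimate for Index 1 (690 GB) comes out a little above the 661.3 GB reported, so treat the output as approximate.

```python
# Figures quoted in the post (primary size in GB, shard layout).
indexes = {
    "index_1": {"primary_gb": 230.0, "primary_shards": 5, "replicas": 2},
    "index_2": {"primary_gb": 17.1, "primary_shards": 5, "replicas": 2},
}
DATA_NODES = 3  # assumption: the planned number of data nodes


def shard_layout(primary_gb, primary_shards, replicas):
    """Return (total shards, average shard size in GB, estimated total GB)."""
    copies = 1 + replicas  # one primary plus its replicas
    total_shards = primary_shards * copies
    avg_shard_gb = primary_gb / primary_shards
    total_gb = primary_gb * copies  # assumes replicas are full copies
    return total_shards, avg_shard_gb, total_gb


cluster_gb = 0.0
for name, cfg in indexes.items():
    shards, avg_gb, total_gb = shard_layout(**cfg)
    cluster_gb += total_gb
    print(f"{name}: {shards} shards, ~{avg_gb:.1f} GB per shard, ~{total_gb:.1f} GB total")

# Assumes an even spread of shards across the data nodes.
print(f"~{cluster_gb / DATA_NODES:.1f} GB of index data per data node")
```

With these numbers, Index 1 averages roughly 46 GB per shard while Index 2 averages under 4 GB per shard.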

I don't really know how much RAM is the minimum required for this setup to run optimally, or even sub-optimally (budget is limited).

I plan on continuing with a cluster of 3 data nodes and 3 master nodes.

My main question is how much RAM I should provision on the data nodes.

Thanks in advance.

Welcome!

17 GB of data: I'd probably try first with only one shard for the second index.
Then, if you are budget limited, I'd just use 3 nodes in total instead of 6 machines. And because the budget is limited, I'd define only one replica.
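As a minimal sketch of that suggestion, the settings body for creating the second index with one primary shard and one replica could be built like this. `number_of_shards` and `number_of_replicas` are the standard index settings; the index name `index_2` is hypothetical, and how the body is sent (for example as the body of a create-index request) is left out here.

```python
import json

INDEX_NAME = "index_2"  # hypothetical name, not given in the thread

# One primary shard and one replica, as suggested above.
create_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
        }
    }
}

body_json = json.dumps(create_index_body, indent=2)
print(f"PUT /{INDEX_NAME}")
print(body_json)
```

Note that the number of primary shards is fixed when the index is created, so moving an existing 5-shard index to 1 shard means reindexing or shrinking it, whereas the replica count can be changed later on a live index.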

For heap, well, it always depends.

My advice would be to try on cloud.elastic.co with only primary shards.
There is a 14-day free trial you can use to test it. That'd probably give you a good idea.
Then, if you are going to buy your own hardware instead of using a ready-to-use and scalable offer like cloud.elastic.co, use what you learned from that test to determine exactly what your needs are.

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

In case anyone is wondering what I did with my cluster setup, I will briefly describe what I went with. Hopefully it helps someone searching for the same type of answer.

I went with 3 master and 3 data nodes.

My cluster handles a mix of reads and writes, but I need my reads to be as fast as possible.
Considering this, I went with 2 replicas, and that seems to have helped a lot in my case given the hardware I chose.

I'm hosting my cluster on AWS, since I got a decent amount of credits to start with there, and the hardware I chose is this:

  • 3 t3.medium master nodes
  • 3 c5.xlarge data nodes

The master nodes have 2 CPUs and 4 GB of RAM each, where half of the RAM (2 GB) is dedicated to the JVM heap.

The data nodes have 4 CPUs and 8 GB of RAM each, where, like the master nodes, half of the RAM (4 GB) is dedicated to the JVM heap.
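The "half of the RAM for heap" rule used above can be sketched as a small helper. This is an illustration, not an official sizing formula: the function name `heap_flags` is made up, and the 31 GB cap is an assumption reflecting the common advice to keep the heap below the roughly 32 GB compressed-pointer threshold. `-Xms` and `-Xmx` are the standard JVM flags for minimum and maximum heap size.

```python
HEAP_CAP_GB = 31  # assumption: stay under the ~32 GB compressed-pointer threshold


def heap_flags(ram_gb: int) -> str:
    """Return JVM heap flags using half of the node's RAM, capped at HEAP_CAP_GB."""
    heap_gb = min(ram_gb // 2, HEAP_CAP_GB)
    # Setting min and max heap to the same value avoids heap resizing at runtime.
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


# Instance sizes from the post: t3.medium has 4 GB, c5.xlarge has 8 GB.
for label, ram_gb in [("t3.medium master", 4), ("c5.xlarge data", 8)]:
    print(f"{label}: {heap_flags(ram_gb)}")
```

The other half of the RAM is left to the operating system, whose filesystem cache is what keeps frequently read index segments in memory.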

One thing I also seemed to need was more CPUs, or faster ones, as highlighting was taxing the data nodes with CPU demands (which is why I went with Amazon's c5 instances). My read load ranges from about 2 reads per second up to about 10-12 reads per second on the indexes. The queries are actual full sentences like the one I'm writing right now, not just keywords or tags, and they are run against both indexes, where a single request can take up to 10 seconds to return a response. Moving forward, it seems CPU count is where I will have to invest in order to reduce the time of these requests (I did some testing with 8 CPU and 16 GB RAM data nodes to confirm this, but due to budget constraints I cannot keep going with that setup).

Based on my testing, RAM doesn't seem to be my main concern at the moment, but in the future, when I might have to reindex into more primary shards, RAM will probably be my main concern.

Also, thank you @dadoonet for the links provided. They helped me understand that I was not looking at the big picture here. Although I'm not applying them right now, the references you provided will definitely help me in the future.
