In case anyone is wondering what I did with my cluster setup, I will briefly describe what I went with. Hopefully it helps someone searching for the same kind of answer.
I went with 3 master and 3 data nodes.
My cluster handles a mix of reads and writes, but I need my reads to be as fast as possible.
Given that, I went with 2 replicas, which seems to have helped a lot in my case on the hardware I chose.
I'm hosting my cluster on AWS, since I got a decent amount of credits to start with there, and the hardware I chose is this:
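For anyone wanting to try the same thing, here is a minimal sketch of the replica change. It only builds the JSON body that the update index settings API (`PUT /<index>/_settings`) accepts, and checks why 2 replicas fit 3 data nodes so well: 1 primary plus 2 replicas gives 3 copies of each shard, so every data node can hold a copy of every shard and serve any search locally. The function names are made up for illustration.

```python
import json


def replica_settings_body(replicas: int) -> str:
    """JSON body for PUT /<index>/_settings that changes the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


def copies_per_shard(replicas: int) -> int:
    """One primary plus its replicas."""
    return 1 + replicas


def every_node_can_hold_every_shard(replicas: int, data_nodes: int) -> bool:
    # Elasticsearch never puts two copies of the same shard on one node,
    # so once copies >= data nodes, each data node can hold one copy of each shard.
    return copies_per_shard(replicas) >= data_nodes


if __name__ == "__main__":
    print(replica_settings_body(2))
    print(copies_per_shard(2))
    print(every_node_can_hold_every_shard(replicas=2, data_nodes=3))
```

With only 1 replica on the same 3 data nodes, each shard would live on 2 of the 3 nodes, so some searches would always have to be forwarded to another node.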
- 3 t3.medium master nodes
- 3 c5.xlarge data nodes
The master nodes have 2 CPUs and 4GB of RAM each, with half of the RAM (2GB) dedicated to the JVM heap.
The data nodes have 4 CPUs and 8GB of RAM each and, like the master nodes, half of the RAM (4GB) is dedicated to the JVM heap.
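The half-of-RAM rule above can be written down as a tiny helper. This is just a sketch of the arithmetic, assuming the heap is set through the standard `-Xms` / `-Xmx` JVM flags (for example in a `jvm.options` file) with both set to the same value; `heap_flags` is a hypothetical name.

```python
def heap_flags(ram_gb: int) -> list[str]:
    """Return JVM heap flags that give the heap half of the node's RAM."""
    heap_gb = ram_gb // 2
    # Minimum and maximum heap are set to the same size so the heap never resizes.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    print("master (4GB RAM):", heap_flags(4))
    print("data (8GB RAM):", heap_flags(8))
```

The other half of the RAM is left to the operating system, which uses it for the filesystem cache that Lucene relies on.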
One thing I also seemed to need was more or faster CPUs, as highlighting was taxing the data nodes (hence why I went with Amazon's c5 instances). My read load ranges from 2 reads per second up to about 10-12 reads per second on the indexes. The searches are actual full sentences like the one I'm writing right now, not just keywords or tags, and they run against both indexes, where a single request can take up to 10 seconds to return a response. Going forward, it seems CPU count is where I will have to invest in order to reduce the time of these requests (I did some testing with 8 CPU and 16GB RAM data nodes to confirm this, but due to budget constraints I cannot keep that setup).
RAM doesn't seem to be my main concern at the moment based on my testing, but in the future, when I might have to reindex for more primary shards, RAM will probably become my main concern.
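To give an idea of the kind of request involved, here is a rough sketch of a full-sentence search with highlighting against two indexes at once. It only builds the request path and body. The index names (`index_a`, `index_b`) and the `content` field are placeholders, not my real mapping, and the real queries may be shaped differently.

```python
import json


def highlighted_search(sentence: str, field: str = "content") -> dict:
    """Search body: full-text match on one field, with highlighting on that field."""
    return {
        "query": {"match": {field: sentence}},
        # Empty options mean the default highlighter settings for this field.
        "highlight": {"fields": {field: {}}},
    }


def search_path(indexes: list[str]) -> str:
    """Several indexes can be searched in one request by comma-separating their names."""
    return "/" + ",".join(indexes) + "/_search"


if __name__ == "__main__":
    print("GET", search_path(["index_a", "index_b"]))
    print(json.dumps(highlighted_search("a full sentence like this one"), indent=2))
```

A long sentence produces many query terms, and each matching document then has to be highlighted on top of being scored, which fits with the CPU being the bottleneck here.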
Also, thank you @dadoonet for the links provided. They helped me understand that I was not looking at the big picture here. Although I'm not applying them right now, the references you provided will definitely help me in the future.