30 million documents in one index: best hardware to use?

Hi Elasticsearch lovers,

I really appreciate that these types of questions get asked a lot in the community and that many of them get answered. I have been through a lot of similar threads and I learn something new every time. However, the threads I came across don't fit what I am trying to achieve, so I decided to post my question here. Thanks a ton in advance for your time; I really appreciate it.

I would like to launch an ES cluster on AWS EC2 and am struggling to choose the right instance type, the number of nodes, and whether each node should run on a separate EC2 instance. My exact use case is as follows.

I will have just over 30 million documents in one index, and this number won't change much: the data isn't likely to grow by more than perhaps 1,000 documents a year. I will index the data once and then run searches, and I'd like those searches to be quick.

I know the recommendation is to run a cluster with at least 3 nodes, but does this really apply to my use case? In particular:

  1. Can I go with a 2-node cluster?
  2. Do I need a separate EC2 instance for each of my nodes?
  3. Which instance type should I go for? I've currently been testing on a t3.small with around 4 million documents indexed, and it works just fine.

The search queries are not complex either: most of the time it's a simple query on one field.

Thanks a lot again. I look forward to hearing from you!

Safe,
B/

Running at least 3 nodes is recommended in order to get a highly available cluster. Elasticsearch's cluster coordination is consensus-based and requires a majority of master-eligible nodes to be available in order to elect a master.

On question 1: a 2-node cluster will not be highly available, which means it will not accept updates or the indexing of new documents if one of the nodes becomes unavailable.
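The majority rule above can be made concrete with a little arithmetic. The sketch below is a simplified model (it ignores the finer details of how Elasticsearch manages its voting configuration). It computes the quorum for a given number of master-eligible nodes and how many of them can be lost while a majority survives. It shows why 2 nodes tolerate no failures and why 4 nodes are no better than 3.

```python
# Simplified model of majority quorum; real Elasticsearch voting
# configurations have extra rules that are not modeled here.

def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in range(1, 6):
    print(f"{n} master-eligible: quorum={quorum(n)}, can lose {tolerated_failures(n)}")
```

With 2 master-eligible nodes the quorum is 2, so losing either one leaves no majority; 3 nodes are the smallest count that can lose one.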

On question 2: yes, at least if you care about resilience and high availability. If you don't need high availability, you can just deploy a single node on a single instance.

On question 3: this depends on the size of your indexed data and the query volume you will need to serve.
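For the data-size side of that answer, a rough per-node disk estimate is simple arithmetic. The sketch below is a back-of-the-envelope model, not an official sizing method. It assumes shards spread evenly across data nodes, every replica is a full copy of its primary, and an arbitrary 30% headroom factor (`headroom` is a hypothetical parameter) for merges and the disk watermarks. The ~15 GB figure used in the example is the one the poster gives later in the thread.

```python
def per_node_disk_gb(primary_gb: float, replicas: int, data_nodes: int,
                     headroom: float = 1.3) -> float:
    """Rough disk needed on each data node (assumes even shard spread).

    headroom is an assumed safety factor, not an Elasticsearch setting.
    """
    # A node never holds two copies of the same shard, so copies beyond
    # the number of data nodes would stay unassigned and use no disk.
    assigned_copies = min(1 + replicas, data_nodes)
    return primary_gb * assigned_copies / data_nodes * headroom


# ~15 GB of primary data, 1 replica, 2 data nodes:
print(round(per_node_disk_gb(15, replicas=1, data_nodes=2), 1))
```

With one replica on two data nodes, each node ends up holding a full copy of the data, so the index size itself (plus headroom) is the per-node disk floor.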

Hi, many thanks for the swift reply.

OK, I understand it all comes down to availability. In my case the data will be around 15 GB at most and, as I mentioned, it is highly unlikely to grow. Will general-purpose EC2 instances work in this case, e.g. 2 × t3.medium for my data nodes and perhaps a t3.large for the master/data node? To be honest, I am mindful of costs and would like to keep them as low as possible to start with. I expect very little traffic in the first year, so would it be sensible to start with smaller instances and scale up later?

Regards,
B/

For high availability you will need 3 master-eligible nodes. The smallest footprint is probably 2 master/data nodes plus a smaller dedicated master node that does not hold any data. The instance type to use will depend on the expected load, and t3 instances can easily run out of CPU credits, which can cause problems.
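The layout suggested in the reply above can be sanity-checked with a small model. The sketch below is illustrative only: node names are hypothetical, the role strings mirror Elasticsearch's `master` and `data` roles (in `elasticsearch.yml` on 7.9+ this would roughly be `node.roles: [ master, data ]` on the two larger nodes and `node.roles: [ master ]` on the small one), and it assumes at least one replica is configured so each shard's copies can live on different data nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str          # hypothetical node name
    roles: frozenset   # e.g. {"master", "data"}


def describe(nodes, replicas: int = 1) -> dict:
    masters = [n for n in nodes if "master" in n.roles]
    data = [n for n in nodes if "data" in n.roles]
    quorum = len(masters) // 2 + 1
    return {
        "master_eligible": len(masters),
        # master-eligible nodes that can fail while a majority remains
        "master_failures_tolerated": len(masters) - quorum,
        "data_nodes": len(data),
        # assumes replicas are allocated on a different node than primaries
        "data_survives_one_node_loss": replicas >= 1 and len(data) >= 2,
    }


layout = [
    Node("node-a", frozenset({"master", "data"})),
    Node("node-b", frozenset({"master", "data"})),
    Node("tiebreaker", frozenset({"master"})),  # small, holds no data
]
print(describe(layout))       # 3 master-eligible nodes: can lose one
print(describe(layout[:2]))   # 2-node cluster: can lose none
```

The third, data-less node only adds a vote, which is why it can run on a much smaller instance than the two data nodes.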

OK, that's useful. Many thanks for your time.

I'd add that you can also easily spin up a full cluster in a few minutes at cloud.elastic.co. It can run in the same region as your AWS instances.
