Migration to Elasticsearch: Advice on number of shards, replicas, and instance size

Hello there!

This is my first post on the forums, and it's also my first time working with Elasticsearch, so bear with me a little :slight_smile:

We have a cloud application that indexes and searches file contents (PDFs, Office documents, etc.), and we currently use AWS's CloudSearch service.

The service is good, as we don't really need to configure anything other than the fields, but the price was starting to become really expensive. So we decided to move to the Elasticsearch service, also provided by AWS.

Our index size is approximately 30 GB (and growing), and the number of searches is currently small (a couple dozen per day), but each search returns a lot of results (1,000) without pagination.

So the first difference is that I need to decide the number of nodes, shards, replicas, master nodes, instance types... :scream:

I saw some articles recommending a maximum of around 50 GB per shard, and warning that too many shards on a small instance can be very inefficient.

I was thinking of something like the setup below; a quick sketch of the index-creation call follows the list.
Any advice would be extremely helpful. My biggest doubt is the number of shards.

  • Instance type: t2.medium (2 vCPU, 4 GiB). As the number of searches is small, I was also considering a t2.small (1 vCPU, 2 GiB) initially.
  • Two nodes (one holding the primary shards, one holding the replicas)
  • Two primary shards (given the small instances, I thought too many shards would be bad)
  • One replica (for redundancy and availability)
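
To make that concrete, here is a minimal sketch of the index-creation call I have in mind, assuming the plain REST API; the endpoint and the index name "documents" are placeholders:

```python
# Minimal sketch: create an index with 2 primary shards and 1 replica.
# The endpoint and index name are placeholders for our real cluster.
import requests

settings = {
    "settings": {
        "number_of_shards": 2,    # primaries; fixed once the index is created
        "number_of_replicas": 1,  # one full extra copy; can be changed later
    }
}

resp = requests.put("http://localhost:9200/documents", json=settings)
print(resp.json())
```

If the 50 GB-per-shard guideline holds, our 30 GB of data would even fit in a single primary shard, so two primaries should leave headroom for growth.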

Thanks!!

May I suggest you look at the following resource about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

BTW did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is the only way to get access to X-Pack. Think about what is already there, like Security, Monitoring, and Reporting, and what is coming, like Canvas, SQL...

I took a look at some articles, yes. But I was still a little unsure, because you choose the number of shards and replicas at index creation and you can't change them afterwards, right?
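
Actually, from what I've read, the replica count can apparently still be changed at any time with the settings API, and it's only the number of primary shards that's fixed at creation. Is that correct? A sketch of what I mean (the index name is a placeholder):

```python
# Sketch: raise the replica count on an existing index. My understanding
# is that number_of_replicas is dynamic, unlike number_of_shards.
import requests

resp = requests.put(
    "http://localhost:9200/documents/_settings",  # hypothetical index
    json={"index": {"number_of_replicas": 2}},
)
print(resp.json())
```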

About Elastic's Cloud, I didn't know about it. But it has a problem similar to CloudSearch: the storage is tied to the node size. When I increase the storage, the cluster's CPU and memory also increase.

I would like the option to increase just the storage, because our index will grow indefinitely but our current usage (searches) is small.

We now have the Shrink API and the Split API, though.
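
For example, a rough sketch of shrinking an index from 2 primary shards down to 1; the index and node names here are placeholders, and the source index must first be made read-only with a copy of every shard on one node:

```python
# Sketch of the Shrink API workflow; "documents", "documents-shrunk" and
# "node-1" are placeholder names.
import requests

base = "http://localhost:9200"  # hypothetical endpoint

# 1. Block writes and gather a copy of every shard onto a single node.
requests.put(f"{base}/documents/_settings", json={
    "index.blocks.write": True,
    "index.routing.allocation.require._name": "node-1",
})

# 2. Shrink into a new index with a single primary shard.
resp = requests.post(f"{base}/documents/_shrink/documents-shrunk", json={
    "settings": {"index.number_of_shards": 1},
})
print(resp.json())
```

The Split API works the same way in the other direction, multiplying the number of primary shards.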

But it has a problem similar to CloudSearch: the storage is tied to the node size. When I increase the storage, the cluster's CPU and memory also increase.

This is going to change soon. In the near future you will be able to choose, based essentially on your use case, what kind of node you would prefer. I don't know the exact date for this, though.

Hi there,

Is that 30 GB total? That seems tiny. I have some indices where I ingest 50 GB per month and rotate them monthly, so the index names look like index-yyyy-mm.

I'd recommend you look at your volume and determine the daily ingest size before deciding on your index strategy; a sketch of the monthly rotation follows.
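
A rough sketch of that monthly rotation idea, in case it's useful; the prefix and endpoint are placeholders:

```python
# Sketch: route each document to an index named after the current month,
# e.g. "index-2018-05". Prefix and endpoint are placeholders.
from datetime import date
import requests

def monthly_index(prefix="index"):
    """Return the index name for the current month, e.g. index-2018-05."""
    return f"{prefix}-{date.today():%Y-%m}"

doc = {"title": "example.pdf", "content": "..."}
resp = requests.post(
    f"http://localhost:9200/{monthly_index()}/_doc",
    json=doc,
)
print(resp.status_code)
```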

On Elastic Cloud: I'd recommend it; we've used it for over a year.

