Migration to Elasticsearch: Advice on number of shards, replicas, and instance size

Hello there!

This is my first post on the forums, and it's also my first time working with Elasticsearch, so bear with me a little :slight_smile:

We have a cloud application that indexes and searches file contents (PDFs, Office documents, etc.), and we currently use AWS's CloudSearch service.

The service is good, as we don't really need to configure anything other than the fields, but the price was starting to become really expensive. So we decided to move to the Elasticsearch service, also provided by AWS.

Our index size is approximately 30 GB (and growing), and the number of searches is currently small (a couple dozen per day), but each search returns a lot of results (1,000) without pagination.

So the first difference is that I need to decide the number of nodes, shards, replicas, master nodes, instance types... :scream:

I saw some articles recommending a maximum of around 50 GB per shard, and warning that too many shards on a small instance can be very inefficient.

I was thinking of something like the setup below; a quick sketch of the index-creation call follows the list.
Any advice would be extremely helpful. My biggest doubt is the number of shards.

  • Instance type: t2.medium (2 vCPU, 4 GiB). As the number of searches is small, I was also considering a t2.small (1 vCPU, 2 GiB) initially.
  • Two nodes (one holding the primary shards, one holding the replicas)
  • Two primary shards (given the small instances, I thought too many shards would be bad)
  • One replica (for redundancy and availability)
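
To make that concrete, here is a minimal sketch of the index-creation call I have in mind, assuming the plain REST API; the endpoint and the index name "documents" are placeholders:

```python
# Minimal sketch: create an index with 2 primary shards and 1 replica.
# The endpoint and index name are placeholders for our real cluster.
import requests

settings = {
    "settings": {
        "number_of_shards": 2,    # primaries; fixed once the index is created
        "number_of_replicas": 1,  # one full extra copy; can be changed later
    }
}

resp = requests.put("http://localhost:9200/documents", json=settings)
print(resp.json())
```

If the 50 GB-per-shard guideline holds, our 30 GB of data would even fit in a single primary shard, so two primaries should leave headroom for growth.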

Thanks!!

May I suggest you look at the following resource about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

BTW did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is the only way to get access to X-Pack. Think about what is already there, like Security, Monitoring, and Reporting, and what is coming, like Canvas, SQL...

I took a look at some articles, yes. But I was still a little unsure, because you choose the number of shards and replicas at index creation and you can't change them afterwards, right?
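
Actually, from what I've read, the replica count can apparently still be changed at any time with the settings API, and it's only the number of primary shards that's fixed at creation. Is that correct? A sketch of what I mean (the index name is a placeholder):

```python
# Sketch: raise the replica count on an existing index. My understanding
# is that number_of_replicas is dynamic, unlike number_of_shards.
import requests

resp = requests.put(
    "http://localhost:9200/documents/_settings",  # hypothetical index
    json={"index": {"number_of_replicas": 2}},
)
print(resp.json())
```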

About Elastic's Cloud, I didn't know about it. But it has a problem similar to CloudSearch: the storage is tied to the node size. When I increase the storage, the cluster's CPU and memory also increase.

I would like the option to increase just the storage, because our index will grow indefinitely but our current usage (searches) is small.

We now have the Shrink API and the Split API, though.
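
For example, a rough sketch of shrinking an index from 2 primary shards down to 1; the index and node names here are placeholders, and the source index must first be made read-only with a copy of every shard on one node:

```python
# Sketch of the Shrink API workflow; "documents", "documents-shrunk" and
# "node-1" are placeholder names.
import requests

base = "http://localhost:9200"  # hypothetical endpoint

# 1. Block writes and gather a copy of every shard onto a single node.
requests.put(f"{base}/documents/_settings", json={
    "index.blocks.write": True,
    "index.routing.allocation.require._name": "node-1",
})

# 2. Shrink into a new index with a single primary shard.
resp = requests.post(f"{base}/documents/_shrink/documents-shrunk", json={
    "settings": {"index.number_of_shards": 1},
})
print(resp.json())
```

The Split API works the same way in the other direction, multiplying the number of primary shards.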

But it has a problem similar to CloudSearch: the storage is tied to the node size. When I increase the storage, the cluster's CPU and memory also increase.

This is going to change soon. In the near future you will be able to choose, based essentially on your use case, what kind of node you would prefer. I don't know the exact date for this, though.

Hi there,

Is that 30 GB total? That seems tiny. I have some indices where I ingest 50 GB per month and rotate them monthly, so the index names look like index-yyyy-mm.

I'd recommend you look at your volume and determine the daily ingest size before deciding on your index strategy; a sketch of the monthly rotation follows.
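
A rough sketch of that monthly rotation idea, in case it's useful; the prefix and endpoint are placeholders:

```python
# Sketch: route each document to an index named after the current month,
# e.g. "index-2018-05". Prefix and endpoint are placeholders.
from datetime import date
import requests

def monthly_index(prefix="index"):
    """Return the index name for the current month, e.g. index-2018-05."""
    return f"{prefix}-{date.today():%Y-%m}"

doc = {"title": "example.pdf", "content": "..."}
resp = requests.post(
    f"http://localhost:9200/{monthly_index()}/_doc",
    json=doc,
)
print(resp.status_code)
```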

On Elastic Cloud: I'd recommend it; we've used it for over a year.

