IO Optimized Instance for master node?


Our cluster is going to have ~8-10 data nodes (r3,x2 memory optimized EC2 instances), with ~2000 shards per data node and ~50,000 filtered aliases. What is the recommended configuration for master nodes?

Since we are going to have a large number of shards per data node and a lot of filtered aliases, does it make sense to use IO optimized master nodes in EC2?

In addition, are there any issues with using this many filtered aliases in a cluster?

(Mark Walkom) #2

My first question is why so many shards? Chances are you will run into problems with this before any potential master sizing ones.


Our use cases are such that we can't combine data from different sources in one index if a data source generates a significant amount of data. If we combined them, we would risk making individual shards very large.

For sources that generate very little data, we combine them into one index and use filtered aliases.
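
To make the setup concrete, here is a minimal sketch of how one filtered alias per small source could be declared. It only builds the request body for the `_aliases` API and does not send it. The index name `small-sources`, the field `source_id`, and the source names are hypothetical placeholders, not our real names.

```python
import json


def filtered_alias_actions(index, field, sources):
    """Build a body for POST /_aliases that adds one filtered alias per source.

    Each alias points at the same shared index and carries a term filter,
    so a search through the alias only sees that source's documents.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"{index}-{source}",
                    "filter": {"term": {field: source}},
                }
            }
            for source in sources
        ]
    }


# Hypothetical names, for illustration only.
body = filtered_alias_actions("small-sources", "source_id", ["sensor-a", "sensor-b"])
print(json.dumps(body, indent=2))
```

Every alias defined this way is stored in the cluster state, which is why the alias count matters for the master nodes.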

Based on the above, we expect to have ~2000 shards and ~50,000 filtered aliases.

What is the recommended master node configuration in such cases? Or, what load is the master node likely to bear with this?

(Mark Walkom) #4

How big?


~100GB and still growing. However, if we combine more sources into a single index/shard, we will need more filtered aliases. That's why we are thinking about a combination of shards and filtered aliases.
