Thanks for your answer! Too bad, so there is no way to make use of Amazon Spot Instances?
The setup to evaluate:
2 Core Nodes ("Always" available, spread over two availability zones)
1 Spot Instance (Not always available)
Default index: 5 Shards, 1 Replica
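For reference, that default index layout corresponds to settings like these at index creation time (a sketch; with 1 replica this means 10 shard copies in total spread across the 3 nodes):

```json
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```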
That should really be possible (via some kind of filter, similar to shard allocation filtering). Forced shutdowns of spot instances on AWS happen only rarely, so this would be a great way to reduce costs...
There is one guy who built something like this, but I think it's still not "safe" regarding replica placement:
I think there is still the risk that one core node fails and at the same time the spot instance goes down = cluster broken...
You shouldn't need to force particular nodes to only allocate replica shards here. There is no difference between the data stored on a primary and a replica shard; the only difference is that Elasticsearch labels one of them as the primary. If the primary shard disappears, the master node will promote one of the remaining replicas of that shard to primary. So if your spot instance contained a primary shard and was shut down, a replica on one of the other nodes would be promoted to primary.
As stated in the blog you linked to, you will want to set allocation attributes so that you can guarantee that at least one copy of every shard is located on a 'core node'.
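One way to express that guarantee (a sketch; the attribute name `node_type` and the values `core`/`spot` are illustrative, not from the blog post) is to tag each node in its `elasticsearch.yml` and then use forced allocation awareness, so that with 1 replica every shard gets one copy in each group:

```yaml
# elasticsearch.yml on the two core nodes:
node.attr.node_type: core

# elasticsearch.yml on the spot instance:
node.attr.node_type: spot

# cluster-level settings (can also be applied dynamically via the
# cluster settings API) -- forced awareness places one copy of each
# shard in every listed group, so core nodes always hold a full copy:
cluster.routing.allocation.awareness.attributes: node_type
cluster.routing.allocation.awareness.force.node_type.values: core,spot
```

With this in place, losing the spot instance still leaves a complete set of shards on the core nodes.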
The problem is that when I use the exact same settings as in the blog post (one core group and one spot group), the spot group only receives replica shards and those nodes are totally idle (no CPU or IO load). I read that replica shards should serve reads, but that doesn't seem to work with the manual shard allocation settings. The core nodes are running at 90% CPU at the same time, while the spot instances are sleeping... I'm not sure if this is the intended behaviour.
I have two master nodes with data (false) and master (true) – all other nodes in the cluster have data (true) and master (false). The application only accesses the two master nodes (no direct connection to any data node). But the masters are routing the queries only to one of the core nodes, which hangs at high load while the other nodes stay idle.
Having an odd number of master-eligible nodes is much better; you run less risk of a split brain.
If you are using dedicated masters, DO NOT query or index through them. The whole point of splitting them out is to ensure optimal operation of your cluster. Pushing queries and indexing through them runs the risk of an OOM, which negates the whole purpose of splitting them out.
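If the application needs a single entry point that isn't a data node, a common pattern is a coordinating-only node that clients connect to instead of the masters. A sketch of its `elasticsearch.yml` (using the master/data flags as in the versions discussed here; newer releases express this via `node.roles`):

```yaml
# Coordinating-only node: not master-eligible and holds no data,
# but accepts client requests, fans searches out across the data
# nodes, and merges the results -- keeping load off the masters.
node.master: false
node.data: false
```

Pointing the application at one or more such nodes (or directly at the data nodes in round-robin) should also spread query load more evenly than routing everything through the two masters.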