It all depends on your shard setup. There are two types of shards in
Elasticsearch:
- Primary shards: Your data is broken up into primary shards and spread
across your cluster. If you have 5 primary shards, your data is split into
five pieces.
- Replica shards: These are copies of the primary shards. They provide
high availability in case of failure.
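As a rough illustration, both counts are set per index through the
number_of_shards and number_of_replicas index settings. The sketch below
only builds the JSON body you would send when creating an index. The helper
name index_settings is made up for this example, and no cluster is
contacted.

```python
import json


def index_settings(primaries, replicas):
    """Build an index-creation body (hypothetical helper, not a client API)."""
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primaries,   # pieces the data is split into
            "number_of_replicas": replicas,  # extra copies of each primary
        }
    }


# Five primaries, each with one replica copy.
body = index_settings(5, 1)
print(json.dumps(body, indent=2))
```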
When you add a node, what happens depends on your primary/replica setup.
We can walk through a few examples:
1 Primary Shard, 0 Replicas
- With one node, one primary shard is allocated on the machine
- When you add a new node...nothing happens. You only have one primary
shard, so no data can move around. The new node goes unused
2 Primary Shards, 0 Replicas
- With one node, both primary shards reside on the single machine.
- When you add a node, one primary shard will move to the new machine.
Elasticsearch tries to maintain a balance of data across nodes to spread
load
- You have zero replicas, so if a node fails, you will lose data.
Primary shards only split your data up so it can be spread across nodes;
they do not give you a second copy
1 Primary Shard, 1 Replica
- With one node, a single primary shard is allocated. The replica
remains unallocated because it doesn't make sense to put a replica on the
same node as its primary
- When you add a node, a replica of the primary is created on the
new machine. You now have one primary and one replica
- If a node fails, you do not lose data. The replica can be "promoted"
to primary status if something goes wrong
2 Primary Shards, 1 Replica
- With one node, two primary shards are allocated. No replicas are
allocated
- When you add a node, one primary shard will move to the new machine.
Replicas will also be created for both primaries, each on the node that
does not hold its primary. You will now have two primary shards, and a
replica for each
- If a node fails, you do not lose data. Replicas are promoted
as necessary
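The walkthrough above can be sketched as a toy model. This is not
Elasticsearch's real allocator, just an illustration under two assumptions
taken from the examples: copies are spread evenly across nodes, and a
replica is never placed on a node that already holds a copy of the same
shard. The function name allocate and the P0/R0 labels are invented here.

```python
def allocate(primaries, replicas, nodes):
    """Toy placement of shard copies; returns (layout, unassigned replicas)."""
    layout = {n: [] for n in range(nodes)}
    unassigned = []
    # Spread the primaries round-robin across the nodes.
    for shard in range(primaries):
        layout[shard % nodes].append(f"P{shard}")
    # Put each replica on the least-loaded node holding no copy of its shard.
    for shard in range(primaries):
        for _ in range(replicas):
            candidates = [
                n for n in layout
                if not any(copy[1:] == str(shard) for copy in layout[n])
            ]
            if candidates:
                target = min(candidates, key=lambda n: (len(layout[n]), n))
                layout[target].append(f"R{shard}")
            else:
                # No eligible node: the replica stays unallocated.
                unassigned.append(f"R{shard}")
    return layout, unassigned


# The four setups from the walkthrough, on one node and then on two.
for p, r in [(1, 0), (2, 0), (1, 1), (2, 1)]:
    for n in (1, 2):
        layout, unassigned = allocate(p, r, n)
        print(f"{p}P/{r}R on {n} node(s): {layout} unassigned={unassigned}")
```

With 2 primaries, 1 replica and 2 nodes, the model puts P0 and R1 on one
node and P1 and R0 on the other, which matches the last example.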
Does that make sense? Basically, the number of primaries controls how
finely your data is divided (so you can scale to more nodes), while the
number of replicas is how many extra copies you keep around to prevent
data loss.
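To make the data-loss point concrete, here is a small sketch that checks
which shards would have no surviving copy after one node dies. The node
names and the P0/R0 labels are invented for the example, and the two
layouts are written out by hand to match the two-node cases described
above.

```python
def lost_shards(layout, failed_node):
    """Return ids of shards with no copy left once failed_node is gone."""
    all_shards = {copy[1:] for copies in layout.values() for copy in copies}
    surviving = {
        copy[1:]
        for node, copies in layout.items()
        if node != failed_node
        for copy in copies
    }
    return sorted(all_shards - surviving)


# 2 primaries, 0 replicas: each node holds the only copy of one shard.
no_replicas = {"node1": ["P0"], "node2": ["P1"]}
# 2 primaries, 1 replica: each node holds one primary and the other's replica.
one_replica = {"node1": ["P0", "R1"], "node2": ["P1", "R0"]}

print(lost_shards(no_replicas, "node2"))  # ['1']
print(lost_shards(one_replica, "node2"))  # []
```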
Let me know if you have more questions!
-Zach
On Monday, November 4, 2013 3:00:41 PM UTC-5, Julian Vidal wrote:
I'm having a little trouble understanding how Elasticsearch handles the
data stored in nodes and how it behaves when one of the nodes fails or you
add more nodes.
My questions are:
- Let's say you have one node (default config) with some data and you
add another one. Does ES replicate all the data to the new node or does it
give it a partial set of data?
- I understand that you can configure ES so that the data is distributed
across several nodes and it will intelligently query all available nodes
and give you a complete result set. What happens if a node goes down? Do
you lose data? What node is responsible for persisting data?
I know that the answers to these questions may start with "depends on your
config". If that is the case, I'm interested in only these two scenarios:
Scenario 1) The ES instances are in Amazon EC2 and using either the EC2
EBS gateway or the S3 gateway.
Scenario 2) The ES instances are in Amazon EC2 and just using the instance
store (non-EBS store).
Thanks,
Julian.