Understanding nodes

Julian_Vidal · November 4, 2013, 8:00pm

I'm having a little trouble understanding how Elasticsearch handles the
data stored in nodes and how it behaves when one of the nodes fails or you
add more nodes.

My questions are:

Let's say you have one node (default config) with some data and you
add another one. Does ES replicate all the data to the new node or does it
give it a partial set of data?

I understand that you can configure ES so that the data is distributed
across several nodes and it will intelligently query all available nodes
and give you a complete result set. What happens if a node goes down? Do
you lose data? What node is responsible for persisting data?

I know that the answers to these questions may start with "depends on your
config". If that is the case, I'm interested in only these two scenarios:
Scenario 1) The ES instances are in Amazon EC2 and using either the EC2 EBS
gateway or the S3 gateway.
Scenario 2) The ES instances are in Amazon EC2 and just using the instance
store (non-EBS store).

Thanks,
Julian.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · November 5, 2013, 3:08pm

It depends. Since you said "default" config, I assume default index settings as well, which is 5 primary shards and 1 replica per shard.
So when you start a second node, the replicas will be allocated on it. That means all of the data will be replicated there.
All nodes behave the same way unless you change the default settings. There is no specific node dedicated to persisting data. Each node stores its data locally.
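
For reference, the defaults mentioned above correspond to the index settings below. This is only a minimal Python sketch that builds the settings body and counts the shard copies; it does not talk to a cluster, and the variable names are just for illustration.

```python
import json

# Index settings matching the defaults described above:
# 5 primary shards, each with 1 replica.
settings_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}

primaries = settings_body["settings"]["number_of_shards"]
replicas = settings_body["settings"]["number_of_replicas"]

# Every primary has `replicas` extra copies, so the cluster tries to
# allocate this many shard copies in total for the index.
total_copies = primaries * (1 + replicas)

print(json.dumps(settings_body, indent=2))
print("shard copies to allocate:", total_copies)
```

With a single node only the 5 primaries can be allocated; the 5 replicas wait until a second node joins.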

Don't use S3 gateways. They are deprecated: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway-s3.html
Using EBS is fine as long as you ask for provisioned IOPS.
Local disks are perfectly fine as well. SSD drives are better as you can guess!

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


polyfractal · November 5, 2013, 3:11pm

It all depends on your shard setup. There are two types of shards in
Elasticsearch:

Primary shards: Your data is broken up into primary shards and spread
across your cluster. If you have 5 primary shards, your data is split into
five pieces.

Replica shards: These are just copies of the above primary shards.
They provide high availability in case of failure.
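
To make the "split into five pieces" idea concrete: Elasticsearch assigns each document to a primary shard by hashing a routing value (the document ID by default) and taking it modulo the number of primary shards. Here is a rough Python sketch of that idea. The `pick_shard` helper and the `doc-N` IDs are made up for illustration, and CRC32 is only a stand-in for the hash function Elasticsearch really uses.

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a primary shard number (illustrative only)."""
    # Stand-in hash: CRC32 is deterministic, which is all this sketch needs.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs spread over 5 primary shards.
doc_ids = [f"doc-{i}" for i in range(20)]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

# How many of the sample documents landed on each shard.
per_shard = Counter(placement.values())
for shard in sorted(per_shard):
    print(f"shard {shard}: {per_shard[shard]} docs")
```

Because the shard number depends on the primary shard count, that count is normally fixed when the index is created, while the number of replicas can be changed later.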

When you add a node, what happens depends on your primary/replica setup.
We can walk through a few examples:

1 Primary Shard, 0 Replicas

With one node, the single primary shard is allocated on that machine.
When you add a new node, nothing happens. You only have one primary shard, so no data can move around, and the new node goes unused.

2 Primary Shards, 0 Replicas

With one node, both primary shards reside on the single machine.
When you add a node, one primary shard will move to the new machine. Elasticsearch tries to maintain a balance of data across nodes to spread the load.
You have zero replicas, so if a node fails, you will lose the data in the shard it held. Primary shards simply split your data up so it can be moved around; they do not give you redundancy.

1 Primary Shard, 1 Replica

With one node, the single primary shard is allocated. The replica remains unallocated because it doesn't make sense to put a replica on the same node as its primary.
When you add a node, a replica of the primary is created on the new machine. You now have one primary and one replica.
If a node fails, you do not lose data. The replica can be "promoted" to primary status if the node holding the primary goes down.

2 Primary Shards, 1 Replica

With one node, two primary shards are allocated. No replicas are allocated.
When you add a node, one primary shard will move to the new machine. Replicas will also be created for both primaries (but on opposite nodes). You will now have two primary shards and a replica for each.
If a node fails, you do not lose data. Replicas are promoted as necessary.

Does that make sense? Basically, the number of primaries controls how finely
you divide your data (so you can scale to more nodes), while the number of
replicas is how many extra copies you keep around to prevent data loss.
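
If it helps, the four scenarios can be reproduced with a toy model in Python. This is only a sketch under simplifying assumptions, not Elasticsearch's real allocator: `allocate` and `lost_shards` are hypothetical helpers, the allocation is recomputed from scratch instead of moving shards around, and the only rules are "never put two copies of the same shard on one node" and "prefer the node holding the fewest shards".

```python
def allocate(num_nodes, num_primaries, num_replicas):
    """Return ({node: [(shard, role), ...]}, unassigned_copies) for a toy cluster."""
    nodes = {n: [] for n in range(num_nodes)}
    unassigned = []
    # Place all primaries first, then each round of replicas.
    for copy_index in range(1 + num_replicas):
        role = "primary" if copy_index == 0 else "replica"
        for shard in range(num_primaries):
            # A node is eligible only if it holds no copy of this shard yet.
            eligible = [n for n, copies in nodes.items()
                        if all(s != shard for s, _ in copies)]
            if not eligible:
                unassigned.append((shard, role))
                continue
            # Balance: pick the eligible node with the fewest shard copies.
            target = min(eligible, key=lambda n: len(nodes[n]))
            nodes[target].append((shard, role))
    return nodes, unassigned


def lost_shards(nodes, failed_node, num_primaries):
    """Shards with no surviving copy once `failed_node` is gone."""
    surviving = {s for n, copies in nodes.items() if n != failed_node
                 for s, _ in copies}
    return sorted(set(range(num_primaries)) - surviving)


# The four setups above, each on a two-node cluster with node 0 failing.
for primaries, replicas in [(1, 0), (2, 0), (1, 1), (2, 1)]:
    layout, unassigned = allocate(2, primaries, replicas)
    lost = lost_shards(layout, failed_node=0, num_primaries=primaries)
    print(f"{primaries} primary / {replicas} replica -> {layout}, "
          f"unassigned={unassigned}, lost if node 0 fails={lost}")
```

In this model a surviving replica simply counts as "data not lost", which stands in for the replica being promoted to primary.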

Let me know if you have more questions!
-Zach

