Data distribution over shards and replicas

bagui · March 31, 2014, 7:49pm

Hi,

I've started working with Elasticsearch and have some questions about shards
and replicas and how they handle data. I don't have any prior knowledge of
Lucene.
As I understand it, Lucene splits data into segments and stores them on disk,
and a shard is a Lucene index itself. My questions are:

There are two ways to configure shard allocation: at the cluster level with
config settings, and at the index level with index settings. Suppose at the
cluster level I set the maximum number of shards to 3 and at the index level I
set 5 shards. How will the shards be allocated? I have one cluster with one
node.

Suppose one index has 5 shards and 2 replicas and I'm pushing data with the
bulk API. How will the data be stored? Will the same data be stored in all 5
shards, or will the data be split into chunks across the 5 shards? How will the
replicas hold a backup of the data of all 5 shards?

Suppose I have 5 nodes and 10 shards distributed over them, 2 shards per node.
When I index new documents, how will the data be stored across the nodes?
Suppose the 5th node, which holds the 9th and 10th shards, suddenly goes down.
Do I lose all the data stored in the 9th and 10th shards, or is the data
already copied to the rest of the nodes?

Please explain.

Thanks,
Subhadip

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4ac575bd-0d0a-4f5f-972e-7f3c54f2eb85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · March 31, 2014, 8:01pm

1 - It will use 5, as you've set that explicitly on the index; anything you
don't specify on the index falls back to the cluster default.
2 - The data isn't replicated into all the shards; the complete data set is
split across the 5 primary shards. Each replica is then a full copy of one
primary shard, so with 2 replicas every one of the 5 primaries has two copies.
3 - ES distributes documents across all the shards as evenly as it can, by
hashing each document's routing value (its _id by default). It also never
stores a replica of a shard on the same node as that shard's primary. So if a
node that holds a primary shard dies, a replica of that shard is promoted to
primary, and no data is lost as long as at least one copy of every shard
survives.
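A toy model can make answers 2 and 3 concrete. This is a simplified sketch under stated assumptions, not how Elasticsearch is implemented: `zlib.crc32` stands in for Elasticsearch's own routing hash (Murmur3 in recent versions), shards are placed round-robin instead of by the real allocator, and `shard_for`, `allocate` and `fail_node` are hypothetical helpers, not ES APIs. It routes each document to exactly one primary, puts every copy of a shard on a different node, and then kills node5 to show which shards survive with and without replicas. Shard numbers are 0-based.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_primaries, routing=None):
    # Simplified routing: hash the routing value (the _id unless a custom
    # routing value is given) and take it modulo the number of primaries.
    # zlib.crc32 is a stand-in (assumption) for Elasticsearch's real hash.
    key = doc_id if routing is None else routing
    return zlib.crc32(key.encode("utf-8")) % num_primaries


def allocate(num_primaries, num_replicas, nodes):
    # Hypothetical round-robin placement. Each shard gets up to
    # 1 + num_replicas copies, each on a different node; copies that do not
    # fit stay unassigned. The first node in each list holds the primary.
    copies = min(1 + num_replicas, len(nodes))
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(copies)]
        for shard in range(num_primaries)
    }


def fail_node(table, dead):
    # Remove the dead node from every shard's copy list. If it held the
    # primary, the next surviving copy becomes the primary (first in list).
    # Returns the shards that have no copy left, i.e. lost data.
    lost = []
    for shard, holders in table.items():
        survivors = [n for n in holders if n != dead]
        table[shard] = survivors
        if not survivors:
            lost.append(shard)
    return lost


nodes = ["node1", "node2", "node3", "node4", "node5"]

# Each document is stored in exactly one of the 10 primaries, not in all.
placement = defaultdict(list)
for i in range(20):
    doc_id = f"doc-{i}"
    placement[shard_for(doc_id, 10)].append(doc_id)
print(sum(len(docs) for docs in placement.values()))  # → 20

# Documents sharing a custom routing value always land on the same shard.
print(shard_for("doc-a", 10, routing="user-42")
      == shard_for("doc-b", 10, routing="user-42"))  # → True

# 10 primaries, no replicas: node5 holds shards 4 and 9, so they are lost.
no_replicas = allocate(10, 0, nodes)
print(fail_node(no_replicas, "node5"))  # → [4, 9]

# 10 primaries, 1 replica each: every shard still has a copy elsewhere.
one_replica = allocate(10, 1, nodes)
print(fail_node(one_replica, "node5"))  # → []
print(one_replica[4])  # → ['node1'] (the replica on node1 is now primary)
```

The round-robin placement is only there because it makes "every copy on a different node" easy to see; the real allocator also balances shard counts across nodes and applies other rules.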

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

bagui · April 2, 2014, 1:21pm

Thanks, Mark, for the prompt reply. I have some more questions.

Suppose one index runs with 3 shards and 1 replica, and another index runs
with the cluster settings, i.e. 5 shards and 2 replicas. Will the cluster have
3+1 or 5+2 shards available in total? I have installed the elasticsearch-head
plugin, but the replica shards are not showing there.

For data distribution: does a replica shard also hold documents from other
indices, or is it used only to keep a backup copy of the data?

So the documents in one index are split by sharding and distributed over the
shards, right? Can we push all the documents of an index into one particular
shard? I don't want to use custom routing, as then I need one field value
common to all the documents. How can we find out which shard holds which
documents?

If I make an index with 2 shards and no replicas, and the node holding these
2 shards dies, will I lose the data, or will there be a copy in a
cluster-level replica? If I have only 1 replica and the node holding the
replica dies, how will the backup happen?

warkolm · April 2, 2014, 9:37pm

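1 - Data from both will be available; you've just told ES not to use the
defaults for one index. So the cluster holds 3 primaries + 3 replicas for the
first index and 5 primaries + 10 replicas for the second. On a single-node
cluster the replicas stay unassigned, because ES won't put a replica on the
same node as its primary, which is probably why head doesn't show them. A
replica is not a backup of other data; it's a 1:1 copy of one primary shard,
so it contains exactly the same documents as that primary.
2 - Not by default. ES (not Lucene) picks the shard for each document by
hashing its routing value, which is the document _id unless you supply one,
so an index's documents end up spread over all of its shards. Custom routing
is the recommended way to do what you want.
3 - Yes, although you are unlikely to have both shards on one node unless it
is a single-node cluster. Replicas are set per index, so there is no
cluster-level replica to fall back on. What do you mean by backup? If you're
talking about replicas, the cluster will build a new replica on another node
if one dies, as long as there is a node available to hold it.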
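The shard arithmetic behind answer 1 can be checked with a few lines of Python. This is a sketch, assuming the only allocation rule in play is that a node may hold at most one copy of any given shard (real clusters also apply disk, awareness and other rules); `shard_summary` is a hypothetical helper, not an ES API. It counts total copies as primaries * (1 + replicas), reports how many can actually be assigned for a given number of nodes, and derives the resulting index health: yellow means all primaries are assigned but some replicas are not.

```python
def shard_summary(num_primaries, num_replicas, num_nodes):
    # Every primary has num_replicas extra copies.
    copies_per_shard = 1 + num_replicas
    total = num_primaries * copies_per_shard
    # A node holds at most one copy of a given shard, so each shard can have
    # at most num_nodes copies assigned; the rest stay unassigned.
    assigned = num_primaries * min(copies_per_shard, num_nodes)
    unassigned = total - assigned
    if num_nodes == 0:
        health = "red"      # not even the primaries can be assigned
    elif unassigned:
        health = "yellow"   # primaries assigned, some replicas unassigned
    else:
        health = "green"
    return {"total": total, "assigned": assigned,
            "unassigned": unassigned, "health": health}


# The two indices from the question: 3 shards/1 replica, 5 shards/2 replicas.
for num_nodes in (1, 2, 3):
    first = shard_summary(3, 1, num_nodes)
    second = shard_summary(5, 2, num_nodes)
    print(num_nodes, first, second,
          "cluster total:", first["total"] + second["total"])
```

On one node this gives 21 shard copies in total but only the 8 primaries assigned, which fits the replicas missing from head. With two nodes the first index turns green, while the second index needs three nodes before all of its replicas can be placed.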

Regards,
Mark Walkom

bagui · April 3, 2014, 4:08am

Thanks a lot, Mark. That explains a lot.

By backup I meant a copy of the same data.

One last question: for fast searching, which is the better choice, a single
index with multiple shards or multiple indices with a single shard each?

Can you please give me a reference on how Lucene splits documents and stores
them in shards? That would help me get a better idea.

Thanks,
Subhadip

warkolm · April 3, 2014, 9:46am

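It depends on your data and cluster configuration, but you probably don't
want a single shard unless it's a tiny, tiny index: a single shard lives on
one node, so it can't spread the indexing and search load across the cluster.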

Regards,
Mark Walkom

Related topics

Newbie question on shard and replicas (Elasticsearch) · 5 replies · 412 views · July 6, 2017
Shard allocate (Elasticsearch) · 3 replies · 386 views · December 14, 2016
What database is used for Elasticsearch (Elasticsearch) · 2 replies · 377 views · July 3, 2018
How many indices could an Elasticsearch cluster include (Elasticsearch) · 4 replies · 1798 views · July 19, 2017
How Shards and Replicas distributed on the cluster (Elasticsearch) · 3 replies · 849 views · December 14, 2018