Cluster state storage question

Robert_Gardam · March 24, 2015, 5:36pm

Hi,
I am starting to look at the size of my cluster state and I started to
notice that the shard information is duplicated.

One grouping seems to be from the view of the index and the other from
which shards live on which host.

I'm sure there's a logical reason for this; I'm just interested to know why.

This probably also compresses pretty well.

Cheers,
Rob

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/43907b97-4ef9-4950-87a0-20edd6be663a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · March 25, 2015, 10:02pm

Cluster state is compressed by default, but for large clusters, or those
with lots of large mappings, its size can still be a problem.
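The following rough sketch shows why duplicated shard information tends to compress well. It builds a hypothetical, heavily simplified cluster state that holds both an index-keyed and a node-keyed view of the same shard copies. It then serialises the state as JSON and compresses it with zlib (DEFLATE). The structure, field names and index names are invented for illustration. zlib is only a stand-in and is not necessarily the format or codec Elasticsearch uses on the wire.

```python
import json
import zlib


def toy_cluster_state(num_indices: int, shards_per_index: int, nodes: list[str]) -> dict:
    """Build a simplified, hypothetical cluster state holding both shard views."""
    routing_table = {}  # index view: index name -> shard entries
    routing_nodes = {node: [] for node in nodes}  # node view: node -> shard entries
    for i in range(num_indices):
        index = f"logs-{i:04d}"
        shards = []
        for shard_id in range(shards_per_index):
            # Spread shards round-robin across the nodes.
            node = nodes[(i * shards_per_index + shard_id) % len(nodes)]
            entry = {"index": index, "shard": shard_id, "node": node,
                     "primary": True, "state": "STARTED"}
            shards.append(entry)
            # The same entry is recorded a second time under its node.
            routing_nodes[node].append(entry)
        routing_table[index] = shards
    return {"routing_table": routing_table, "routing_nodes": routing_nodes}


state = toy_cluster_state(num_indices=200, shards_per_index=5,
                          nodes=["node-a", "node-b", "node-c"])
raw = json.dumps(state).encode("utf-8")
compressed = zlib.compress(raw, 6)  # DEFLATE at compression level 6
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes "
      f"({len(compressed) / len(raw):.1%} of original)")
```

DEFLATE is good at removing repeated keys and the second copy of each entry. However, compression only shrinks the bytes on the wire. It does not reduce the size of the state once a node has decompressed it.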

The cluster needs to know which shards make up an index, as well as
where they are located, which is why the information appears twice. As you
mentioned, it is currently stored under two separate areas of the cluster
state, though it's possible these could be combined to reduce the size.

warkolm · March 25, 2015, 10:21pm

I did some more digging to understand things a bit better than in my last
(lame, sorry!) email.

We only send routing_table over the network, and then we build routing_nodes
out of it on the other side. However, routing_nodes is only built on access,
so unless something uses it (which happens on master nodes, and when you call
GET /_cluster/state), it's not actually built.

We need routing_nodes for quick access to the node view of shard
allocation, and routing_table for the index view of shard allocation.
Keeping both views is a performance choice: a lookup by node and a lookup by
index can each go straight to a structure organised that way.
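Here is a minimal Python sketch of the idea described above, based on a toy data model. Only the index-keyed routing table is stored, and that is the part that would be serialised. The node-keyed view is derived from it lazily the first time something asks for it, and is then cached. The class, field and method names are hypothetical and do not mirror Elasticsearch's Java classes.

```python
from collections import defaultdict
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class ShardRouting:
    index: str
    shard: int
    node: str
    primary: bool


class ToyClusterState:
    def __init__(self, routing_table: dict[str, list[ShardRouting]]):
        # Index view: the only part that would be sent over the network.
        self.routing_table = routing_table
        self.builds = 0  # counts how often the node view has been derived

    @cached_property
    def routing_nodes(self) -> dict[str, list[ShardRouting]]:
        # Node view: derived from routing_table on first access, then cached.
        self.builds += 1
        by_node = defaultdict(list)
        for shards in self.routing_table.values():
            for shard in shards:
                by_node[shard.node].append(shard)
        return dict(by_node)

    def shards_of_index(self, index: str) -> list[ShardRouting]:
        # Index-view lookup: a single dict access.
        return self.routing_table.get(index, [])

    def shards_on_node(self, node: str) -> list[ShardRouting]:
        # Node-view lookup: a single dict access once routing_nodes exists,
        # instead of scanning every index on each call.
        return self.routing_nodes.get(node, [])


state = ToyClusterState({
    "logs": [ShardRouting("logs", 0, "node-a", True),
             ShardRouting("logs", 0, "node-b", False)],
    "users": [ShardRouting("users", 0, "node-b", True)],
})
print(state.builds)  # 0: the node view has not been built yet
print([s.index for s in state.shards_on_node("node-b")])  # ['logs', 'users']
print(state.builds)  # 1: built on first access and cached afterwards
```

Because the node view never enters the serialised state, the duplication only costs CPU and memory on nodes that actually read it.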

There is work happening to improve the overall transfer speed of the
cluster state between nodes, essentially to send only the delta between
states to all nodes, rather than the whole state as currently happens.
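The delta-based approach can be sketched roughly as follows. This sketch makes simple assumptions. The state is a flat dict mapping index name to a routing entry. A diff records the keys that were added or changed and the keys that were removed. This is not how Elasticsearch actually encodes cluster state deltas, and `diff_states` and `apply_diff` are hypothetical helpers.

```python
from typing import Any

State = dict[str, Any]  # hypothetical: index name -> routing entry


def diff_states(old: State, new: State) -> dict[str, Any]:
    """Describe how to turn `old` into `new`: added/changed keys plus removed keys."""
    upserts = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removals = [k for k in old if k not in new]
    return {"upserts": upserts, "removals": removals}


def apply_diff(old: State, diff: dict[str, Any]) -> State:
    """Rebuild the new state on the receiving node from its old state and a diff."""
    removed = set(diff["removals"])
    result = {k: v for k, v in old.items() if k not in removed}
    result.update(diff["upserts"])
    return result


old = {"logs": {"shards": 5, "node": "node-a"}, "users": {"shards": 1, "node": "node-b"}}
new = {"logs": {"shards": 5, "node": "node-c"}, "metrics": {"shards": 2, "node": "node-a"}}

delta = diff_states(old, new)  # only the changed/added/removed indices are sent
assert apply_diff(old, delta) == new
```

A diff like this only works if the receiving node holds exactly the `old` state it was computed against. A practical version would presumably also need a way to detect a mismatch, such as a state version number, and then fall back to sending the full state.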

Robert_Gardam · March 26, 2015, 9:17am

All ok!
Thanks for the explanation! I just found it odd that the same information
was displayed twice, but if only one copy is sent over the network, that's
good. Makes sense!

Elasticsearch has already become much more stable in the past 12
months!
The future of ES is really exciting!

Thanks again!
Rob

Related topics

Topic	Category	Replies	Views	Activity
2 clusters versus 1 big cluster?	Elasticsearch	6	2733	July 6, 2017
The ElasticSearch directory layout	Elasticsearch	4	1295	July 6, 2017
Does the cluster state size impact performance?	Elasticsearch	4	1974	July 6, 2017
Cluster state update task	Elasticsearch	20	2470	May 23, 2019
Primary Shard allocation in the same node has same storage information	Elasticsearch	2	784	July 5, 2017