Understanding the _status response: _shards


(harryf) #1

Trying to understand what the _status response for an index is really telling me, specifically the _shards node;

"_shards":{"total":10,"successful":5,"failed":0}
...this is from an index running on just one node with default configuration. In the rest of the _status response below I see the description of only 5 shards.

Is it correct that "total": 10 is a default number of shards that elastic search is aiming to create for the whole index?

Why has it chosen only to create 5 _shards so far, given only one node?

When I start a second node, the status of shards becomes;

"_shards":{"total":10,"successful":10,"failed":0}
Are all 10 shards now actually used? Why do I still only see 5 shards described in the rest of the _status response on either node?

If all 10 shards are now being used, what is the impact on routing? Doesn't this mean some of the documents need to be moved from older shards to newer shards?

Finally what is a failed shard? Is that a shard which the cluster knows once existed but is now no longer available as a primary or a replica? If an index contains failed shards, should I avoid indexing further documents until those shards are recovered?

Many thanks.


(Lukáš Vlček) #2

Hi,

On Thu, Dec 30, 2010 at 4:56 PM, harryf hfuecks@gmail.com wrote:

Trying to understand what the _status response for an index is really
telling
me, specifically the _shards node;

"_shards":{"total":10,"successful":5,"failed":0}
...this is from an index running on just one node with default
configuration. In the rest of the _status response below I see the
description of only 5 shards.

Is it correct that "total": 10 is a default number of shards that elastic
search is aiming to create for the whole index?

If ES uses default settings then each index is created with {
number_of_shards : 5, number_of_replicas : 1}, which gives total number of
10 active shards (5 primary shards each having one replica). But note that
ES does not allocate shard and its replica to the same node, this means that
if only one node is started then only 5 primary shards are allocated on that
node and replicas are not allocated at all.

The total number in status response is (I believe) the number of active
shards ES can have but note that ES terminology uses active and primary
shard. You can also use cluster health
APIhttp://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/cluster/health/to
get the high-level summary report regarding active/primary numbers.

Why has it chosen only to create 5 _shards so far, given only one node?

That is the default number of shards per index.

When I start a second node, the status of shards becomes;

"_shards":{"total":10,"successful":10,"failed":0}
Are all 10 shards now actually used? Why do I still only see 5 shards
described in the rest of the _status response on either node?

Because there is second node then primary shard replicas can now be finally
allocated. And yes, all shards are now being used, meaning that if you query
index then the query needs to hit 5 shards only (providing information from
all five shards is needed, like in case of match_all{} query). The query can
hit either primary shard or its replica.

If all 10 shards are now being used, what is the impact on routing? Doesn't
this mean some of the documents need to be moved from older shards to newer
shards?

No, these 5 new shards are just replicas (they contain the same documents
as primary shards).

Finally what is a failed shard? Is that a shard which the cluster knows
once
existed but is now no longer available as a primary or a replica? If an
index contains failed shards, should I avoid indexing further documents
until those shards are recovered?

To learn if you can index and/or search the index you can use the mentioned
status health API (if the status is green or yellow you are safe to use the
cluster or specific index).

The response should contain more details if there were any shard failures. I
am not sure if there is any general rule how shard failures should be
treated and interpreted but I would say you need to be very careful if there
were any failures and you probably want to drop query results and just
examine the reasons of failure. There can be many causes of shard failure
and it does not have to mean that the shard is not available, it can mean
that the query parsing or query execution failed on give shard (and it has
nothing to do with shard being generally healthy or available).

Many thanks.

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Understanding-the-status-response-shards-tp2168097p2168097.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

HTH,
Lukas


(harryf) #3

Many thanks - that's a big help. Got me inspired to write up what I figured out so far - https://github.com/elasticsearch/elasticsearch/wiki/Scaling-ElasticSearch - hopefully more or less correct.


(Lukáš Vlček) #4

Hey, cool. As for the "scaling for data" section it would be good to mention
also index aliases as an other technique for overcoming fixed shard number
limitation.
Regards,
Lukáš
Dne 2.1.2011 16:04 "harryf" hfuecks@gmail.com napsal(a):

Many thanks - that's a big help. Got me inspired to write up what I
figured
out so far -
https://github.com/elasticsearch/elasticsearch/wiki/Scaling-ElasticSearch-
hopefully more or less correct.

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Understanding-the-status-response-shards-tp2168097p2180135.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.


(Shay Banon) #5

Nice work!. I updated the part on the long term persistency to include the
local gateway (which is the default) and the difference between shared and
local gateways.

On Sun, Jan 2, 2011 at 6:29 PM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Hey, cool. As for the "scaling for data" section it would be good to
mention also index aliases as an other technique for overcoming fixed shard
number limitation.
Regards,
Lukáš
Dne 2.1.2011 16:04 "harryf" hfuecks@gmail.com napsal(a):

Many thanks - that's a big help. Got me inspired to write up what I
figured
out so far -

https://github.com/elasticsearch/elasticsearch/wiki/Scaling-ElasticSearch-

hopefully more or less correct.

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Understanding-the-status-response-shards-tp2168097p2180135.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.


(harryf) #6

As for the "scaling for data" section it would be good to mention also index aliases as an other technique for overcoming fixed shard number limitation.

Ah great - wasn't aware of that - nice feature. Updated the doc.


(system) #7