Routing Logic in ElasticSearch

vaidik · January 7, 2013, 10:37am

Routing seems to be a pretty interesting feature for it can give quite a
performance boost. But, what hashing algorithm is used for deciding the
correct shard? I am asking this because in my use-case, we have multiple
shards per index with 0 replicas. We really don't want replicas because HA
is not an issue in our case. So, if one shard goes down, then what happens
at the time of performing operations issued with the routing key associated
with the shard that failed? I guess while performing search, ES will return
0 counts with 1 failed shard. But, more specifically what happens while
inserting against the failed shard?

Thanks,
Vaidik

--

drewr · January 7, 2013, 5:06pm

Vaidik Kapoor wrote:

Routing seems to be a pretty interesting feature for it can give
quite a performance boost. But, what hashing algorithm is used for
deciding the correct shard?

It's the DjbHashFunction [http://git.io/4rL8JQ].

I am asking this because in my use-case, we have multiple shards
per index with 0 replicas. We really don't want replicas because HA
is not an issue in our case. So, if one shard goes down, then what
happens at the time of performing operations issued with the
routing key associated with the shard that failed? I guess while
performing search, ES will return 0 counts with 1 failed shard.
But, more specifically what happens while inserting against the
failed shard?

This is relatively easy to test. Start three instances of ES (can be
in the same extract directory). Create your index:

curl -s -XPOST localhost:9200/kapoor -d '
{
"settings" : {
"number_of_replicas" : 0,
"number_of_shards" : 3
}
}
'

This will create an index with a single primary shard on each node.
Then kill one of the nodes (Ctrl-C in the shell) and try indexing a
document:

curl -s -XPUT localhost:9200/kapoor/t/1 -d '
{
"name" : "Vaidik Kapoor"
}
'

It should succeed. ES will choose one of the shards that's available
and hash into that. But when you search you'll see one hit and:

 [...]
 "_shards" : {
    "failures" : [
       {
          "index" : "kapoor",
          "status" : 500,
          "reason" : "No active shards",
          "shard" : 0
       }
    ],
    "failed" : 1,
    "successful" : 2,
    "total" : 3
 },
 [...]

So, no data should be lost when indexing into a degraded index,
however, you won't see any hits from the failed shard(s) when
searching.

-Drew

--

Clinton_Gormley · January 7, 2013, 7:15pm

Hi Drew

This will create an index with a single primary shard on each node.
Then kill one of the nodes (Ctrl-C in the shell) and try indexing a
document:

curl -s -XPUT localhost:9200/kapoor/t/1 -d '
{
"name" : "Vaidik Kapoor"
}
'

It should succeed. ES will choose one of the shards that's available
and hash into that. But when you search you'll see one hit and:

That's not quite correct. It will hash to the appropriate shard then
try to index to that shard. If the shard exists and is running, then
indexing will succeed. If it is missing, then indexing will fail. It
doesn't hash based on the number of live shards which exist - it uses
just the number of shards which SHOULD exist. Otherwise it wouldn't know
which shard to look at retrieve a doc when the previously missing shard
springs into existence.

clint

--

dadoonet · January 7, 2013, 8:29pm

Just wondering if it could be stored in the transaction log and indexed later when the shard comes up again?

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 7 janv. 2013 à 20:15, Clinton Gormley clint@traveljury.com a écrit :

Hi Drew

This will create an index with a single primary shard on each node.
Then kill one of the nodes (Ctrl-C in the shell) and try indexing a
document:

curl -s -XPUT localhost:9200/kapoor/t/1 -d '
{
"name" : "Vaidik Kapoor"
}
'

It should succeed. ES will choose one of the shards that's available
and hash into that. But when you search you'll see one hit and:

That's not quite correct. It will hash to the appropriate shard then
try to index to that shard. If the shard exists and is running, then
indexing will succeed. If it is missing, then indexing will fail. It
doesn't hash based on the number of live shards which exist - it uses
just the number of shards which SHOULD exist. Otherwise it wouldn't know
which shard to look at retrieve a doc when the previously missing shard
springs into existence.

clint

--

Topic		Replies	Views
Question about shard routing Elasticsearch	2	291	July 6, 2017
Hashing algo. for Routing Elasticsearch	4	4136	July 6, 2017
Elasticsearch how to figure out the shard number with the specified routing? Elasticsearch	5	959	July 5, 2017
Determine Shard Id based on routing key Elasticsearch	5	1150	July 6, 2017
Strict routing: only documents with one routing key per shard Elasticsearch	3	819	December 8, 2017

Routing Logic in ElasticSearch

Related topics