Routing Logic in ElasticSearch

Routing seems to be a pretty interesting feature for it can give quite a
performance boost. But, what hashing algorithm is used for deciding the
correct shard? I am asking this because in my use-case, we have multiple
shards per index with 0 replicas. We really don't want replicas because HA
is not an issue in our case. So, if one shard goes down, then what happens
at the time of performing operations issued with the routing key associated
with the shard that failed? I guess while performing search, ES will return
0 counts with 1 failed shard. But, more specifically what happens while
inserting against the failed shard?

Thanks,
Vaidik

--

Vaidik Kapoor wrote:

Routing seems to be a pretty interesting feature for it can give
quite a performance boost. But, what hashing algorithm is used for
deciding the correct shard?

It's the DjbHashFunction [http://git.io/4rL8JQ].

I am asking this because in my use-case, we have multiple shards
per index with 0 replicas. We really don't want replicas because HA
is not an issue in our case. So, if one shard goes down, then what
happens at the time of performing operations issued with the
routing key associated with the shard that failed? I guess while
performing search, ES will return 0 counts with 1 failed shard.
But, more specifically what happens while inserting against the
failed shard?

This is relatively easy to test. Start three instances of ES (can be
in the same extract directory). Create your index:

curl -s -XPOST localhost:9200/kapoor -d '
{
"settings" : {
"number_of_replicas" : 0,
"number_of_shards" : 3
}
}
'

This will create an index with a single primary shard on each node.
Then kill one of the nodes (Ctrl-C in the shell) and try indexing a
document:

curl -s -XPUT localhost:9200/kapoor/t/1 -d '
{
"name" : "Vaidik Kapoor"
}
'

It should succeed. ES will choose one of the shards that's available
and hash into that. But when you search you'll see one hit and:

 [...]
 "_shards" : {
    "failures" : [
       {
          "index" : "kapoor",
          "status" : 500,
          "reason" : "No active shards",
          "shard" : 0
       }
    ],
    "failed" : 1,
    "successful" : 2,
    "total" : 3
 },
 [...]

So, no data should be lost when indexing into a degraded index,
however, you won't see any hits from the failed shard(s) when
searching.

-Drew

--

Hi Drew

This will create an index with a single primary shard on each node.
Then kill one of the nodes (Ctrl-C in the shell) and try indexing a
document:

curl -s -XPUT localhost:9200/kapoor/t/1 -d '
{
"name" : "Vaidik Kapoor"
}
'

It should succeed. ES will choose one of the shards that's available
and hash into that. But when you search you'll see one hit and:

That's not quite correct. It will hash to the appropriate shard then
try to index to that shard. If the shard exists and is running, then
indexing will succeed. If it is missing, then indexing will fail. It
doesn't hash based on the number of live shards which exist - it uses
just the number of shards which SHOULD exist. Otherwise it wouldn't know
which shard to look at retrieve a doc when the previously missing shard
springs into existence.

clint

--

Just wondering if it could be stored in the transaction log and indexed later when the shard comes up again?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 7 janv. 2013 à 20:15, Clinton Gormley clint@traveljury.com a écrit :

Hi Drew

This will create an index with a single primary shard on each node.
Then kill one of the nodes (Ctrl-C in the shell) and try indexing a
document:

curl -s -XPUT localhost:9200/kapoor/t/1 -d '
{
"name" : "Vaidik Kapoor"
}
'

It should succeed. ES will choose one of the shards that's available
and hash into that. But when you search you'll see one hit and:

That's not quite correct. It will hash to the appropriate shard then
try to index to that shard. If the shard exists and is running, then
indexing will succeed. If it is missing, then indexing will fail. It
doesn't hash based on the number of live shards which exist - it uses
just the number of shards which SHOULD exist. Otherwise it wouldn't know
which shard to look at retrieve a doc when the previously missing shard
springs into existence.

clint

--

--