Wrong configuration can lead to unavailable shards

Florent_Becart · November 18, 2011, 9:37pm

Hi everybody,

I ran into an issue using elasticsearch (0.18.2) on EC2, and I'd like some
advice from the community on how to avoid it in the future.

Here is the way I would create my index:
curl -XPUT localhost:9200/my_index -d '
index:
  analysis:
    analyzer:
      my_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, my_synonym, my_ngram]
    filter:
      my_synonym:
        type: synonym
        synonyms_path: "analysis/my_synonyms.txt"
      my_ngram:
        type: nGram
        min_gram: 3
        max_gram: 7
'

If the file my_synonyms.txt is not available on the master node, index
creation fails (which is fine with me). Here is the JSON response:
{"error":"RemoteTransportException[indices/createIndex]]; nested:
IndexCreationException[[my_index] failed to create index]; nested:
FailedToResolveConfigException[Failed to resolve config path
[analysis/my_synonyms.txt], tried file path [analysis/my_synonyms.txt],
path file [/data/elasticsearch/config/analysis/my_synonyms.txt], and
classpath]; ","status":500}

But if the file is present on the master node and missing on one of the
nodes where a shard is assigned, here is the response:
{"ok":true,"acknowledged":false}

The problem is that, in this case, some shards stay stuck in the
INITIALIZING state and never become usable:
"routing_table" : {
  "indices" : {
    "en_us" : {
      "shards" : {
        "0" : [ {
          "state" : "INITIALIZING",
          "primary" : true,
          "node" : "YN1TTiAHR9G_n6MyR3_wAg",
          "relocating_node" : null,
          "shard" : 0,
          "index" : "en_us"
        }, {
          "state" : "UNASSIGNED",
          "primary" : false,
          "node" : null,
          "relocating_node" : null,
          "shard" : 0,
          "index" : "en_us"
        } ],
        ...

And, obviously, indexing into or searching the cluster no longer works as
expected. From what I understand, shard creation is delegated to the nodes
of the cluster, and if a shard cannot initialize, the master node is not
notified; instead, it waits until the shard creation request times out.
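One way to spot this state without guessing is to fetch the cluster state (for example with `curl localhost:9200/_cluster/state`) and list every shard copy that has not reached `STARTED`. The Python sketch below only parses JSON you have already fetched; it assumes the `routing_table` layout shown in the excerpt above, and the function name `non_started_shards` is hypothetical.

```python
import json
from typing import Any


def non_started_shards(cluster_state: dict[str, Any]) -> list[tuple[str, int, bool, str]]:
    """Return (index, shard id, is_primary, state) for every copy not STARTED.

    Walks routing_table -> indices -> <index> -> shards -> <shard id> -> [copies],
    the layout shown in the cluster state excerpt above.
    """
    problems = []
    indices = cluster_state.get("routing_table", {}).get("indices", {})
    for index_name, index_table in indices.items():
        for shard_id, copies in index_table.get("shards", {}).items():
            for shard_copy in copies:
                state = shard_copy.get("state")
                if state != "STARTED":
                    problems.append(
                        (index_name, int(shard_id), bool(shard_copy.get("primary")), state)
                    )
    return problems


if __name__ == "__main__":
    # Trimmed copy of the routing table excerpt from the post.
    sample = json.loads("""
    {"routing_table": {"indices": {"en_us": {"shards": {"0": [
        {"state": "INITIALIZING", "primary": true, "node": "YN1TTiAHR9G_n6MyR3_wAg",
         "relocating_node": null, "shard": 0, "index": "en_us"},
        {"state": "UNASSIGNED", "primary": false, "node": null,
         "relocating_node": null, "shard": 0, "index": "en_us"}
    ]}}}}}
    """)
    for index_name, shard, primary, state in non_started_shards(sample):
        role = "primary" if primary else "replica"
        print(f"{index_name}[{shard}] {role}: {state}")
```

Run against the excerpt, it prints `en_us[0] primary: INITIALIZING` and `en_us[0] replica: UNASSIGNED`.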

First, I realize that I made a big mistake by not keeping the config folders
of the different instances in sync. I wonder whether there is a good
practice to avoid this situation. Here are my ideas:

- not using any configuration file, only the REST/JSON APIs (but I can't
  find any explanation of how to describe a multi-line synonyms
  configuration through this API)
- synchronizing/sharing the configuration folders between the different
  instances
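Regarding the first idea: later versions of the Synonym Token Filter documentation describe a `synonyms` parameter that takes the rules inline as an array (one rule per element) instead of a `synonyms_path`. Assuming your Elasticsearch version supports it (check the guide page for that version), the synonyms file then only needs to exist on the machine that sends the request. The Python sketch below turns such a file into the index settings from the post; `load_synonym_rules` and `build_index_settings` are hypothetical helper names, and the settings simply mirror the YAML body above.

```python
import json


def load_synonym_rules(text: str) -> list[str]:
    """Turn the contents of a Solr-style synonyms file into a list of rules.

    Blank lines and '#' comment lines are dropped; every other line becomes
    one element of the inline 'synonyms' array.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def build_index_settings(synonym_rules: list[str]) -> dict:
    """Same analysis settings as the YAML in the post, with the synonyms inlined."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["standard", "lowercase", "my_synonym", "my_ngram"],
                    }
                },
                "filter": {
                    # Assumption: the synonym filter accepts 'synonyms' (a list of
                    # rules) in place of 'synonyms_path' in the version in use.
                    "my_synonym": {"type": "synonym", "synonyms": synonym_rules},
                    "my_ngram": {"type": "nGram", "min_gram": 3, "max_gram": 7},
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical file contents; in practice, read analysis/my_synonyms.txt
    # on the machine issuing the request.
    sample = "# comment\nipod, i-pod, i pod\n\nuniverse, cosmos\n"
    print(json.dumps(build_index_settings(load_synonym_rules(sample)), indent=2))
```

The printed JSON can then be sent with the same PUT request as in the post, for example by piping it to `curl -XPUT localhost:9200/my_index -d @-`.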

Also, I wish the cluster behaved differently when this mistake occurs. One
thing that bothers me is that what happens depends on which node is the
master of the cluster, and that can change at any time. And neither the
JSON response nor the logs gave me any useful information to understand
what I was doing wrong.

Thanks for your help

Florent

kimchy · November 20, 2011, 8:36am

Right, that's how it works. The index is first created on the master node
as a "trial creation"; then the shards get allocated, and the index is
created again on each node where shards end up being allocated.
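To make the two phases concrete, here is a toy model in Python. It is not Elasticsearch code: `Node`, `create_index` and the `stuck_on` field are made up for illustration. It shows why the outcome depends on which node happens to be the master: the same two nodes yield either a 500 error or an unacknowledged "success", depending on whether the master holds the synonyms file.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: only knows which config-relative files it has on disk."""
    name: str
    config_files: set[str] = field(default_factory=set)

    def can_resolve(self, path: str) -> bool:
        return path in self.config_files


def create_index(master: Node, shard_nodes: list[Node], synonyms_path: str) -> dict:
    """Toy model of the two-phase creation described above."""
    # Phase 1: trial creation on the master; a failure is reported to the caller.
    if not master.can_resolve(synonyms_path):
        return {"error": f"Failed to resolve config path [{synonyms_path}]", "status": 500}
    # Phase 2: each node holding a shard creates the index again with its own config.
    stuck_on = [node.name for node in shard_nodes if not node.can_resolve(synonyms_path)]
    # Failures here only show up as a missing acknowledgement; those shards stay stuck.
    return {"ok": True, "acknowledged": not stuck_on, "stuck_on": stuck_on}


if __name__ == "__main__":
    path = "analysis/my_synonyms.txt"
    node_a = Node("node-a", {path})
    node_b = Node("node-b")
    # Same two nodes, only the master differs.
    print(create_index(node_b, [node_a, node_b], path))
    print(create_index(node_a, [node_a, node_b], path))
```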

Syncing the "config" location is one option, though it's tricky: which copy
do you sync between the nodes? An API to "upload" a file to the cluster and
then reference it might make more sense, but it's low priority.
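If you do sync the config folders, a cheap safeguard is to compare checksums of the referenced files across nodes before creating an index; it does not decide which copy is right, but it shows which copies are missing or disagree. This Python sketch assumes each node's config folder is available locally (for example copied or mounted from every instance); the node names, directory layout and helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> Optional[str]:
    """SHA-256 hex digest of a file, or None if the file does not exist."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def divergent_copies(config_dirs: dict[str, Path], relative: str) -> dict[str, Optional[str]]:
    """Map node name -> digest of `relative` under that node's config folder.

    Returns an empty dict only when every node holds an identical copy; a
    non-empty result means at least one copy is missing or different.
    """
    digests = {node: file_digest(root / relative) for node, root in config_dirs.items()}
    values = set(digests.values())
    if len(values) <= 1 and None not in values:
        return {}
    return digests


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two hypothetical node config folders; node-b lacks the synonyms file.
        for node in ("node-a", "node-b"):
            Path(tmp, node, "analysis").mkdir(parents=True)
        Path(tmp, "node-a", "analysis", "my_synonyms.txt").write_bytes(b"ipod, i-pod\n")
        dirs = {node: Path(tmp, node) for node in ("node-a", "node-b")}
        print(divergent_copies(dirs, "analysis/my_synonyms.txt"))
```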

I think the current behavior is fine: you can tell that the index has
problems because when you operate against it, or check the index status,
you will see failures if it failed to be allocated on any node.
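Following the suggestion to check the index status, a creation script can call the cluster health API right after the PUT, e.g. `curl 'localhost:9200/_cluster/health/my_index?wait_for_status=yellow&timeout=30s'`, and stop if things look wrong. Waiting for yellow requires every primary shard to be active, so a stuck primary like the one in this thread shows up as a timeout and a red status. The Python sketch below only inspects the two JSON bodies; `creation_warnings` is a hypothetical name and its rules are a heuristic, not an official check.

```python
from typing import Any


def creation_warnings(create_resp: dict[str, Any], health_resp: dict[str, Any]) -> list[str]:
    """Collect reasons to distrust a freshly created index.

    create_resp: JSON body returned by the index creation request.
    health_resp: JSON body returned by the cluster health API for that index.
    """
    warnings = []
    # {"ok":true,"acknowledged":false}: accepted, but not every node confirmed it.
    if not create_resp.get("acknowledged", False):
        warnings.append("index creation was not acknowledged")
    # Set when the requested wait_for_status was not reached before the timeout.
    if health_resp.get("timed_out", False):
        warnings.append("timed out waiting for the requested health status")
    # Red means at least one primary shard is not active.
    if health_resp.get("status") == "red":
        warnings.append("at least one primary shard is not active")
    initializing = health_resp.get("initializing_shards", 0)
    if initializing:
        warnings.append(f"{initializing} shard(s) still initializing")
    return warnings


if __name__ == "__main__":
    # Hypothetical health body for the situation described in this thread.
    health = {"status": "red", "timed_out": True, "initializing_shards": 1, "unassigned_shards": 1}
    for warning in creation_warnings({"ok": True, "acknowledged": False}, health):
        print("WARNING:", warning)
```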


Florent_Becart · November 23, 2011, 2:31am

Thanks for the answer. I agree with everything you said. And I'm pleased to
see the guide page for the "Synonym Token Filter" was updated and now
contains everything I needed to know
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter.html)

Keep up the good work
