Hi everybody,
I ran into an issue using elasticsearch (0.18.2) on EC2 and I'd like some advice from the community on how to avoid it in the future.
Here is how I create my index:
curl -XPUT localhost:9200/my_index -d '
index:
  analysis:
    analyzer:
      my_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, my_synonym, my_ngram]
    filter:
      my_synonym:
        type: synonym
        synonyms_path: "analysis/my_synonyms.txt"
      my_ngram:
        type: nGram
        min_gram: 3
        max_gram: 7
'
If the file my_synonyms.txt is not available on the master node, the creation of the index fails (which is fine with me). Here is the JSON response:
{"error":"RemoteTransportException[indices/createIndex]]; nested:
IndexCreationException[[my_index] failed to create index]; nested:
FailedToResolveConfigException[Failed to resolve config path
[analysis/my_synonyms.txt], tried file path [analysis/my_synonyms.txt],
path file [/data/elasticsearch/config/analysis/my_synonyms.txt], and
classpath]; ","status":500}
But if the file is present on the master node yet missing on one of the nodes where a shard is assigned, here is the response:
{"ok":true,"acknowledged":false}
The problem is that, in that case, some shards remain stuck in the INITIALIZING state and do not work properly:
"routing_table" : {
"indices" : {
"en_us" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "YN1TTiAHR9G_n6MyR3_wAg",
"relocating_node" : null,
"shard" : 0,
"index" : "en_us"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "en_us"
} ],
...
And obviously, indexing and search requests against the cluster no longer work as expected. From what I understand, shard creation is delegated to the nodes of the cluster, and when a shard cannot initialize, the master node is not notified; instead it waits until the shard creation request times out.
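In the meantime, the only way I found to make this failure visible from the client side is to check the cluster health right after creating the index, if I understand the health API correctly. For example (assuming the default HTTP port; the 30s timeout is an arbitrary value I picked):
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&timeout=30s'
When the shards never leave the INITIALIZING state, this comes back with "timed_out": true and a non-green status, so at least the problem doesn't go unnoticed.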
First, I realize that I made a big mistake by not keeping the config folders of the different instances synchronized. I wonder if there is any good practice to avoid this situation. Here are my ideas:
- not using any configuration file at all, only the REST/JSON APIs (but I can't find any explanation of how to describe a multi-line synonyms configuration via this API; see the sketch after this list)
- synchronizing/sharing the configuration folders between the different instances
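Regarding the first idea, if I read the analysis documentation correctly, the synonym filter also accepts an inline "synonyms" list instead of "synonyms_path", with one string per line of the synonyms file, which would remove the need for a config file entirely. A sketch of what I have in mind (the synonym rules here are just placeholders, not my real ones):
curl -XPUT localhost:9200/my_index -d '
index:
  analysis:
    analyzer:
      my_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, my_synonym, my_ngram]
    filter:
      my_synonym:
        type: synonym
        synonyms:
          - "foo, bar => baz"
          - "quick, fast"
      my_ngram:
        type: nGram
        min_gram: 3
        max_gram: 7
'
Can someone confirm this is the right way to pass multi-line synonyms through the API?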
Also, I wish the cluster behaved differently when this mistake occurs. One thing that bothers me is that what happens varies depending on which node is currently the master of the cluster, and this is subject to change at any time. And neither the JSON response nor the logs gave me useful information to understand what I was doing wrong.
Thanks for your help
Florent