I upgraded to ES 2.2.1 from a very old 0.18 installation and have run into a problem.
I have two environments where I have configured an analysis-icu analyzer. In the first environment (Windows 7) everything works perfectly, but in the other environment (Linux) the unicodeSetFilter parameter is ignored. The elasticsearch.yml file is UTF-8 encoded and looks like this:
index :
    analysis :
        analyzer :
            swedishIcuFoldingAnalyzer :
                type : custom
                tokenizer : standard
                filter : [icuFolding, lowercase, swedish_stop]
        filter :
            swedish_stop :
                type : stop
                stopwords : _swedish_
            icuFolding :
                type : icu_folding
                unicodeSetFilter : "[^åäöÅÄÖ]"
When I query for the text "modrar", it matches "mödrar" in the Linux environment, while on Windows it does not, which is the expected behavior.
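To make the expected behavior concrete, here is a rough Python sketch of what the filter chain above is meant to do. It is only an approximation: the real icu_folding filter does far more than strip diacritics, and the names `fold` and `PROTECTED` are made up for this illustration. The point is that characters excluded by the unicodeSetFilter pass through unfolded, so "mödrar" and "modrar" stay distinct.

```python
import unicodedata

# Characters excluded from folding by the unicodeSetFilter "[^åäöÅÄÖ]".
# (Illustrative constant, not part of Elasticsearch or ICU.)
PROTECTED = set("åäöÅÄÖ")


def fold(text, protected=PROTECTED):
    """Approximate icuFolding followed by lowercase.

    Diacritics are stripped from every character except those in
    `protected`; the result is then lowercased as a separate step,
    mirroring the filter order in the config.
    """
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in protected:
            # Excluded from folding: keep the character as-is.
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out).lower()


# With the filter honored, the two words index to different terms.
print(fold("mödrar"), fold("modrar"))
# With the filter ignored (nothing protected), they collapse to the same term.
print(fold("mödrar", protected=set()), fold("modrar", protected=set()))
```

If the second line of output is what an environment actually produces at index time, the unicodeSetFilter setting is not being applied there.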
You may be right about using icu_tokenizer instead of standard, but it makes no difference here.
The ordering of the filters is also a good point, but I believe lowercase should come before the stop filter, right? After all, the stopwords are all in lowercase.
As for the analysis-icu plugin: it does work. The problem is that it is ignoring or misinterpreting the unicodeSetFilter parameter, which is strange.
Well, it turned out to be a stupid error on my part. Apparently I had two instances of ES running on the same machine, and restarting the service had no effect since there was another instance blocking the reloads...
It seems ES just quietly picks the next unoccupied port number and does not even warn that the standard port is blocked.
That's correct, this is the intended behavior. There is no "standard port" but a port range (9200-9300 for HTTP, 9300-9400 for transport). It was meant to ease demonstrations with multicast: you could start as many ES processes as you wished on the same machine, or on the same network, and they would find each other and form a cluster. Since multicast is gone, this behavior seems weird.
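The port-range behavior can be illustrated with a small Python sketch. This is not how Elasticsearch implements it; `first_free_port` is a hypothetical helper that simply tries each port in a range and returns the first one it can bind, which is the effect described above.

```python
import socket


def first_free_port(start, end, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound, or None.

    Hypothetical helper for illustration only; it mimics the effect of
    a node silently moving on to the next port in its configured range.
    """
    for port in range(start, min(end, 65535) + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is occupied (or not permitted): try the next one.
                continue
            return port
    return None


if __name__ == "__main__":
    # If another process already holds 9200, this prints a later port,
    # which hints that a second instance may be running on the machine.
    print(first_free_port(9200, 9300))
```

Running a check like this before restarting the service is one way to notice that something else is already listening on the first port of the range.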