    ae_analyzer:
      type: custom
      tokenizer: standard
      filter: [umlaut_replace]
  filter:
    umlaut_replace:
      type: pattern_replace
      pattern: "ä"
      replacement: "a"
The exception I get on startup is:
INFO: An exception was caught and reported. Message:
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter
[umlaut_replace] must have a type associated with it
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter
[umlaut_replace] must have a type associated with it
Taking a look at the analysis module, there is a line that resolves the
type against the org.elasticsearch package:
type = tokenFilterSettings.getAsClass("type", null,
    "org.elasticsearch.index.analysis.", "TokenFilterFactory");
However, the PatternReplaceFilter lives in some org.apache package...
Might this be the cause, or am I simply misconfiguring something badly?
Can you do a get settings call to see whether the type is really there for
the filter (note that settings get munged into key/value pairs)? Also, for
this use case, though I would love to help fix it, you might want to
consider using the asciifolding filter.
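As a rough illustration of the "munged into key value pairs" remark: if the filter definition was picked up, a get settings call should list flattened keys along these lines. This is only a sketch; the `index.analysis` prefix is an assumption, since the posted config does not show its parent keys.

```yaml
# Flattened view of the posted config (prefix "index.analysis" is assumed).
index.analysis.analyzer.ae_analyzer.type: custom
index.analysis.analyzer.ae_analyzer.tokenizer: standard
index.analysis.filter.umlaut_replace.type: pattern_replace
index.analysis.filter.umlaut_replace.pattern: "ä"
index.analysis.filter.umlaut_replace.replacement: "a"
```

If the `...umlaut_replace.type` key is missing from the output, the filter block was probably not nested where the analysis module expects it.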
Completely my fault. I tested against a 0.16 version of elasticsearch,
where the filter was not included yet. Works smoothly with 0.17. Sorry
for that.
I had not upgraded to 0.17 earlier because installing plugins from the
filesystem did not work the way it did in 0.16. I tracked it down to not
using the complete file:/// URL, which 0.17 now requires; in 0.16 it was
enough to provide a directory. This resulted in a ZipFileException (which
is in fact a file-not-found error). Now our river implementation also
works with 0.17 and we upgraded.
Thanks for helping, going to hide ashamed behind a rock now
Maybe off topic but maybe helpful anyway: instead of using the
PatternReplaceFilter, you may want to look at the ASCIIFoldingFilter, which
automatically converts many non-ASCII characters (such as German umlauts)
into their ASCII equivalents (http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/ASCIIFoldingFilter.html).
This way you would not have to define explicit mappings for every character,
and you would automatically cover other common cases such as accented
characters (as in "Crème").
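A minimal sketch of what that could look like in the index settings, using the built-in asciifolding token filter. The analyzer name `folding_analyzer` and the `index.analysis` parent keys are assumptions for illustration; they do not come from the original config.

```yaml
index:
  analysis:
    analyzer:
      folding_analyzer:            # hypothetical analyzer name
        type: custom
        tokenizer: standard
        # lowercase first, then fold non-ASCII characters (ä -> a, è -> e, ...)
        filter: [lowercase, asciifolding]
```

No custom filter section is needed here, because asciifolding is referenced by its built-in name.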
Right. As far as I know, this only works if you want to turn ä into a.
In some special cases you might want to produce "ae" instead.
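For the "ae" case, one option is to stay with pattern_replace and define one filter per character. The following is only a sketch: the filter names `umlaut_ae`, `umlaut_oe`, and `umlaut_ue` and the `index.analysis` parent keys are made up for illustration, and it assumes a version where pattern_replace is available (0.17, per the above).

```yaml
index:
  analysis:
    analyzer:
      ae_analyzer:
        type: custom
        tokenizer: standard
        # lowercase runs first so that Ä/Ö/Ü are caught by the patterns below
        filter: [lowercase, umlaut_ae, umlaut_oe, umlaut_ue]
    filter:
      umlaut_ae:                   # hypothetical filter name
        type: pattern_replace
        pattern: "ä"
        replacement: "ae"
      umlaut_oe:                   # hypothetical filter name
        type: pattern_replace
        pattern: "ö"
        replacement: "oe"
      umlaut_ue:                   # hypothetical filter name
        type: pattern_replace
        pattern: "ü"
        replacement: "ue"
```

Characters not listed here (ß, accented vowels, and so on) pass through unchanged, so this could be combined with asciifolding placed after the umlaut filters if both behaviours are wanted.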