Index analyzer settings: is there a way to


(Michal Wegorek) #1

Is there an index analyzer setting to:

  1. Treat diacritic letters (in my case the Polish diacritic letters ą, ć,
    ę, ł, ń, ó, ś, ź, ż) as their US-alphabet equivalents during search:
    ą -> a
    ć -> c
    ..
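For illustration only, here is a minimal Python sketch of the letter mapping listed above. `POLISH_FOLD` and `fold_polish` are hypothetical names and are not part of Elasticsearch. The sketch uses an explicit table because Unicode decomposition (NFKD) alone does not reduce ł to l: that letter has no decomposition.

```python
# Hypothetical helper (not Elasticsearch code): map the Polish diacritic
# letters from the question to their plain US-alphabet equivalents.
# An explicit table is used because NFKD does not decompose "ł"/"Ł".
POLISH_FOLD = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)

def fold_polish(text: str) -> str:
    """Replace each Polish diacritic letter with its ASCII base letter."""
    return text.translate(POLISH_FOLD)

if __name__ == "__main__":
    print(fold_polish("ąbc"))   # abc
    print(fold_polish("Łódź"))  # Lodz
```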

What I mean is: when I search for the pattern 'abc', I want to see both
'abc' and 'ąbc' in the results, but when I search for 'ąbc', I want ES
to find only 'ąbc'.
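The behaviour requested above is asymmetric. Folding on both sides cannot satisfy it, because a folded 'ąbc' query also matches 'abc'. The following plain-Python sketch has no Elasticsearch involved, and all of its names are hypothetical. It assumes that at index time each token is stored both as-is and folded, while the query is only lowercased and never folded. For contrast, it also shows what folding on both sides would return.

```python
# Minimal in-memory sketch (not Elasticsearch code) of the asymmetric match.
# Assumption: index time stores each token as-is AND folded;
# search time only lowercases the query, it does not fold it.
POLISH_FOLD = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")

def index_terms(text: str) -> set[str]:
    """Terms stored for a document: each lowercased token plus its folded form."""
    terms = set()
    for token in text.lower().split():
        terms.add(token)
        terms.add(token.translate(POLISH_FOLD))
    return terms

def search_terms(query: str) -> set[str]:
    """Terms looked up for a query: lowercased, deliberately not folded."""
    return set(query.lower().split())

def search(docs: list[str], query: str) -> list[str]:
    """Documents sharing at least one term with the query."""
    wanted = search_terms(query)
    return [doc for doc in docs if wanted & index_terms(doc)]

def search_folded_both_sides(docs: list[str], query: str) -> list[str]:
    """For contrast: fold on both sides, as one shared analyzer would."""
    wanted = {t.translate(POLISH_FOLD) for t in search_terms(query)}
    return [doc for doc in docs
            if wanted & {t.translate(POLISH_FOLD) for t in doc.lower().split()}]

docs = ["abc", "ąbc"]
print(search(docs, "abc"))                    # ['abc', 'ąbc']
print(search(docs, "ąbc"))                    # ['ąbc']
print(search_folded_both_sides(docs, "ąbc"))  # ['abc', 'ąbc']
```

In Elasticsearch terms, this would roughly correspond to two analyzers. The index-time analyzer keeps the original token alongside the folded one, and the search-time analyzer does not use asciifolding. Later Elasticsearch releases document a `preserve_original` option on the `asciifolding` filter. Whether the version in this thread offers it is an assumption to check.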

These settings do not work; I expected asciifolding to do the
trick:

index.analysis.analyzer.default.type: standard
index.analysis.analyzer.default.stopwords: none
index.analysis.analyzer.default.tokenizer: standard
index.analysis.analyzer.default.filter: [standard, lowercase, stop, asciifolding, porter_stem]

ES 0.18.7

Cheers!
Michal


(Shay Banon) #2

Are you sure you are using a query that also analyzes the search text (query_string, field, text)? If so, can you gist a recreation (http://www.elasticsearch.org/help)?
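Shay's point can be illustrated with a hypothetical Python stand-in (not Elasticsearch code). Folding only helps if the query text goes through the same analysis as the indexed text. Here `analyze` is an assumed analyzer that splits on whitespace, lowercases, and folds the Polish letters. A query that skips it, as an unanalyzed term-level query would, misses the folded token.

```python
# Hypothetical stand-in for an index analyzer: whitespace split,
# lowercase, then fold the Polish letters listed in the question.
POLISH_FOLD = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")

def analyze(text: str) -> list[str]:
    return [token.lower().translate(POLISH_FOLD) for token in text.split()]

# Terms stored for a document containing "Ąbc" after index-time analysis.
INDEXED_TERMS = set(analyze("Ąbc"))  # {"abc"}

def matches(query: str, analyze_query: bool) -> bool:
    """True if any query term is among the indexed terms."""
    terms = analyze(query) if analyze_query else query.split()
    return any(term in INDEXED_TERMS for term in terms)

print(matches("ĄBC", analyze_query=True))   # True: query folded to "abc"
print(matches("ĄBC", analyze_query=False))  # False: raw "ĄBC" is not indexed
```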

(In reply to Michal Wegorek's post of Wednesday, February 22, 2012 at 1:49 PM.)
