Wildcard characters


(Tobias Wallenqvist) #1

Hello All!

I have a problem where I have indexed words (strings) that contain
special characters (é, û, etc.). The problem comes when users search
for the exact word but type a plain e or u instead, and they don't get
any hits.

Is it possible to map è to e and û to u? Is there a solution in
Elasticsearch, or should I re-parse the data before adding it to the
index? But then I will have the opposite problem, where some users
search for the word with è and don't get a hit because the index
contains a regular e.

Hope someone has had the same problem.

/Tobias


(Clinton Gormley) #2

Hi Tobias

You need to create a custom analyzer which uses the ascii folding token
filter
http://www.elasticsearch.org/guide/reference/index-modules/analysis/asciifolding-tokenfilter.html
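As a rough illustration, the index settings for such an analyzer could be built like this. This is only a sketch: the analyzer name "folding" is a made-up example, and the choice of the standard tokenizer plus a lowercase filter in front of asciifolding is an assumption, not something stated in this thread.

```python
import json

# Index settings defining a custom analyzer. The analyzer name "folding"
# is hypothetical; "standard", "lowercase" and "asciifolding" are built-in
# Elasticsearch tokenizer/filter names.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}

# JSON request body that could be sent when creating the index.
body = json.dumps(index_settings, indent=2)
print(body)
```

If the same analyzer is applied to the field both at index time and at search time, a query for either the accented or the plain spelling should be folded to the same token, which addresses the "opposite problem" as well.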

clint



(Tobias Wallenqvist) #3

Ah sweet, thanks!



(Tobias Wallenqvist) #4

I went ahead and added an asciifolding analyzer to my config file and
it worked great, though I had to reindex all my data before it took
effect.

I have a question though: is it possible to see which analyzers are
active via some command, like _mapping or similar?

Have a great day and thanks for the tip.

/Tobias



(Shay Banon) #5

Getting the index settings will return the analyzers used, though in a key-value format (not the JSON one): http://www.elasticsearch.org/guide/reference/api/admin-indices-get-settings.html
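A minimal sketch of calling the get-settings endpoint from Python follows. The node address and the index name "myindex" are placeholders, and the helper names are made up for this example; the fetch itself needs a running node.

```python
import json
import urllib.request


def settings_url(base_url: str, index: str) -> str:
    """Build the URL of the get-settings endpoint for one index."""
    return f"{base_url.rstrip('/')}/{index}/_settings"


def fetch_settings(base_url: str, index: str) -> dict:
    """GET the index settings and decode the JSON response."""
    with urllib.request.urlopen(settings_url(base_url, index)) as resp:
        return json.load(resp)


# Hypothetical node address and index name.
print(settings_url("http://localhost:9200", "myindex"))
```

Any analyzers defined for the index should appear among the returned settings, under keys starting with index.analysis.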



(Barsk) #6


There is another approach to the problem. The icu_folding filter
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/icu-plugin.html)
does the same thing, but more thoroughly. The asciiFoldingFilter is a
lightweight approach to the problem. icu_folding is also faster, since it
uses a compiled lookup table for the conversion.

I just contributed a fix to the icu_folding filter that enables it to
skip national characters (like åäö) in case you would like to be able to
search for those as-is. By default, icu_folding removes ALL diacritics,
including those on åäö, which become aao. The characters to skip are
configured with a unicodeSetFilter parameter on the icu_folding filter,
so it can be tuned to your requirements.
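To make this concrete, here is a small sketch. The filter name "swedish_folding" is hypothetical, and the parameter is spelled as in this post (released plugin versions may spell it differently). The fold function is only a pure-Python approximation that strips combining marks; it is not the ICU algorithm, and it is here just to show the effect of skipping a set of characters.

```python
import unicodedata

# Filter settings as described above. "swedish_folding" is a hypothetical
# name; "unicodeSetFilter" is the parameter name used in this post. The
# set covers every character except åäö, so those are left unfolded.
icu_filter = {
    "swedish_folding": {
        "type": "icu_folding",
        "unicodeSetFilter": "[^åäöÅÄÖ]",
    }
}


def fold(text: str, keep: str = "") -> str:
    """Rough stand-in for folding: strip combining marks, except on
    characters listed in `keep`. Not the real ICU algorithm."""
    keep = unicodedata.normalize("NFC", keep)
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)  # skipped character, kept as-is
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(fold("åäö é û"))                 # aao e u
print(fold("åäö é û", keep="åäöÅÄÖ"))  # åäö e u
```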

See my other post for more info on that. Hopefully it will end up in the
master branch if it is accepted.

P.S. I actually have a version of the asciiFoldingFilter that adds
filtering too if needed, but it can be regarded as obsolete now.

/Kristian

