Wildcard characters

Tobias_Wallenqvist · February 20, 2012, 9:23am

Hello All!

I have a problem where I have words (strings) indexed that contain
special characters (é, û, etc.). The problem comes when users search
for the exact word, but typed with a plain e and u, and they don't get
any hits.

Is it possible to map è to e and û to u? Is there a solution in
Elasticsearch, or should I re-parse the data before adding it to the
index? But then I will have the opposite problem, where some users search
for the word with è and don't get a hit because the indexed word contains
a regular e.

Hope someone has had the same problem.

/Tobias

Clinton_Gormley · February 20, 2012, 11:32am

Hi Tobias

I have a problem where i have words (strings) indexed that contains
special characters (é,û ex). The problem comes when the users search
for the exact word, but containing e and u, and they don't get any
hits.

You need to create a custom analyzer which uses the ASCII folding token
filter (asciifolding):
http://www.elasticsearch.org/guide/reference/index-modules/analysis/a...

clint
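
A minimal sketch of what such a custom analyzer definition could look like, written as a Python dict that serializes to the JSON settings body. The analyzer name `folding_analyzer` and the choice of the `standard` tokenizer plus a `lowercase` filter are assumptions for illustration, not something stated in the thread.

```python
import json


def folding_index_settings(analyzer_name="folding_analyzer"):
    """Build index settings with a custom analyzer that applies asciifolding.

    analyzer_name is a hypothetical, illustrative name.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first, then fold accented characters
                        # to their ASCII equivalents (é -> e, û -> u)
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating the index.
    print(json.dumps(folding_index_settings(), indent=2, ensure_ascii=False))
```

The field mapping would still have to reference the analyzer by name. Because the same analyzer is then applied at index time and at search time, queries with è and queries with e should both end up as the same token.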

Tobias_Wallenqvist · February 20, 2012, 11:47am

Ah sweet, thanx!

Tobias_Wallenqvist · February 22, 2012, 11:47am

I went ahead and added an asciifolding analyzer to my config file and
it worked great, though I had to reindex my whole index before it took
effect.

I have a question though: is it possible to see which analyzers are
active via some command, like _mapping or similar?

Have a great day and thanks for the tip.

/Tobias
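
The need to reindex can be illustrated with a small local simulation: tokens already in the index were produced by the old analyzer, so a folded query cannot match them until the documents are analyzed again. This is only a sketch. `fold` is a rough stand-in built on Python's `unicodedata`, not the actual asciifolding implementation, and the sample words and helper names are made up for illustration.

```python
import unicodedata


def fold(text):
    """Rough stand-in for asciifolding: lowercase and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs, analyze):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


def search(index, query, analyze):
    """Analyze the query the same way and look the token up."""
    return index.get(analyze(query), set())


docs = {1: "crème", 2: "creme"}

old_index = build_index(docs, str.lower)  # indexed before folding was configured
new_index = build_index(docs, fold)       # after reindexing with folding

print(sorted(search(old_index, "creme", fold)))  # [2]
print(sorted(search(new_index, "creme", fold)))  # [1, 2]
print(sorted(search(new_index, "crème", fold)))  # [1, 2]
```

Against the old index, the folded query only finds the document that already contained a plain e. After reindexing, both spellings find both documents.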

kimchy · February 26, 2012, 9:40am

Getting the index settings will return the analyzers used, though in a flat key-value format (not the nested JSON one).
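
As a hedged sketch, the analysis-related entries could be pulled out of the index settings like this. The base URL `http://localhost:9200`, the response shape (index name, then `"settings"`), and the helper names are assumptions. The flattening helper accepts both flat dotted keys and nested JSON, since the format may differ between versions.

```python
import json
import urllib.request


def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys; flat input passes through."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def analysis_settings(index_settings):
    """Keep only the keys that describe analyzers, tokenizers and filters."""
    flat = flatten(index_settings)
    return {k: v for k, v in flat.items() if k.startswith("index.analysis.")}


def fetch_analysis_settings(index, base_url="http://localhost:9200"):
    """Assumed endpoint: GET {base_url}/{index}/_settings."""
    with urllib.request.urlopen(f"{base_url}/{index}/_settings") as resp:
        body = json.load(resp)
    # Assumed response shape: {index: {"settings": {...}}}
    return analysis_settings(body[index]["settings"])


if __name__ == "__main__":
    sample = {"index.analysis.analyzer.default.type": "custom",
              "index.number_of_shards": "5"}
    print(analysis_settings(sample))
```

Only `fetch_analysis_settings` needs a running node; the other two helpers work on any settings dict.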

Barsk · February 28, 2012, 1:45pm

There is another approach to the problem. The icu_folding filter
does the same thing, but more thoroughly. The asciiFoldingFilter is a
light approach to the problem. icu_folding is also faster since it uses
a compiled lookup table for the conversion.

I just contributed a fix to the icu_folding filter that enables it to
skip national characters (like åäö) in case you would like to be able to
search those as-is. By default, icu_folding removes ALL diacritics,
including those in åäö, which become aao. The characters to skip are
configured with a unicodeSetFilter parameter to the icu_folding filter,
so it can be tuned to your requirements.

See my other post for more info on that. Hopefully it may end up in the
master branch if it is accepted.
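
A sketch of how such a filter might be configured, again as a Python dict that serializes to the JSON settings body. It assumes the ICU analysis plugin is installed and uses the snake_case spelling `unicode_set_filter` that the plugin documents (the post above calls it unicodeSetFilter). The names `swedish_folding` and `swedish_analyzer` are illustrative only.

```python
import json


def icu_folding_settings(keep="åäöÅÄÖ"):
    """Build settings for icu_folding that leaves the `keep` characters alone.

    The filter and analyzer names are hypothetical; the ICU analysis
    plugin is assumed to be installed.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "swedish_folding": {
                        "type": "icu_folding",
                        # UnicodeSet: fold everything EXCEPT these characters
                        "unicode_set_filter": f"[^{keep}]",
                    }
                },
                "analyzer": {
                    "swedish_analyzer": {
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",
                        # the skipped characters are not folded, so they
                        # still need an explicit lowercase step
                        "filter": ["swedish_folding", "lowercase"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(icu_folding_settings(), indent=2, ensure_ascii=False))
```

With this configuration é and û would still fold to e and u, while å, ä and ö stay searchable as-is.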

P.S. I actually have a version of the asciiFoldingFilter that adds
filtering too if needed, but it can be regarded as obsolete now.

/Kristian
