I have a problem where I have words (strings) indexed that contain
special characters (é, û, etc.). The problem comes when users search
for the exact word but with a plain e or u, and they don't get any
hits.
Is it possible to map è as e and û as u? Is there a solution in
Elasticsearch, or should I re-parse the data before adding it to the
index? But then I will have the opposite problem, where some users
search for the word with è and don't get a hit because the index
contains a regular e.
Hope someone has had the same problem..
/Tobias
You need to create a custom analyzer which uses the ascii folding token
filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/a...
clint
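For reference, a minimal sketch of what such an index definition could
look like. The index name myindex and the analyzer name folding_analyzer
are placeholders, not from the thread:

# sketch only: placeholder index/analyzer names, default localhost setup
curl -XPUT 'http://localhost:9200/myindex' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}'

If you apply that analyzer to the field in your mapping, the same
analyzer is used at query time by default, so both the indexed terms and
the search terms get folded. A search for e then matches é and a search
for é matches e, which also takes care of the opposite problem you
mention.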
There is another approach to the problem. The icu_folding filter
does the same thing, but more thoroughly. The asciiFoldingFilter is a
light approach to the problem; icu_folding is also faster, since it uses
a compiled lookup table for the conversion.
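A sketch of the same kind of custom analyzer wired to icu_folding
instead. This assumes the elasticsearch-analysis-icu plugin is
installed; the index and analyzer names are again placeholders:

# requires the ICU analysis plugin; placeholder names
curl -XPUT 'http://localhost:9200/myindex' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "icu_folding_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["icu_folding"]
        }
      }
    }
  }
}'

Note that icu_folding also case-folds, so a separate lowercase filter is
not strictly needed here.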
I just contributed a fix to the icu_folding filter that enables it to
skip national characters (like åäö), in case you would like to be able
to search for those as-is. By default, icu_folding removes ALL
diacritics, including those of åäö -> aao. The characters to skip are
configured with a unicodeSetFilter parameter to the icu_folding filter,
meaning it can be tuned to your requirements.
See my other post for more info on that. Hopefully it will end up in the
master branch if it is accepted.
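Assuming the fix is accepted, usage would look something like this
sketch. The filter and analyzer names are made up; the UnicodeSet
[^åäöÅÄÖ] excludes the Swedish characters from folding:

# sketch assuming the unicodeSetFilter patch is merged; placeholder names
curl -XPUT 'http://localhost:9200/myindex' -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "swedish_folding": {
          "type": "icu_folding",
          "unicodeSetFilter": "[^åäöÅÄÖ]"
        }
      },
      "analyzer": {
        "swedish_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "swedish_folding"]
        }
      }
    }
  }
}'

Everything outside the negated set is folded as usual, while åäö pass
through untouched (the separate lowercase filter still lowercases them).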
P.S. I actually have a version of the asciiFoldingFilter that adds this
filtering too if needed, but it can be regarded as obsolete now. D.S.