Folding of accented to non-accented only — leaving symbols

Lee_Gee · October 13, 2014, 7:30pm

I know the asciifolding filter docs are really very clear on this, but it
took me an embarrassingly long time to realise I was losing my currency
symbol (£) to the ASCII folding filter.

Other than creating my own character map with the mapping char filter, is
there something of production quality that would translate accented
UTF-8 characters of the Latin alphabet into non-accented characters in the
ASCII range?
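
To make the desired behaviour concrete, here is a minimal Python sketch of "fold accents, leave symbols". It is only an illustration outside Elasticsearch, not how any Lucene filter is implemented, and the function name fold_accents is made up for this example. It assumes that decomposing to NFD and dropping combining marks is an acceptable definition of "non-accented".

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks from letters; symbols such as £ pass through."""
    # NFD splits "é" into "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the non-spacing combining marks (category Mn), keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into NFC.
    return unicodedata.normalize("NFC", stripped)


print(fold_accents("café crème £5"))  # prints: cafe creme £5
```

Letters that have no canonical decomposition, such as "ø" or "ß", come out unchanged with this approach, so it is narrower than a full folding table.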

TIA
Lee

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ff95c6ec-7907-454e-bd58-774ee173f4e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexandre_Rafalovitc · October 13, 2014, 11:29pm

You are probably looking for ICU Folding, which is part of the ICU Analysis
plugin (elastic/elasticsearch-analysis-icu on GitHub). It's not explained in
detail on that page, but you can see a long list of normalizations in
Lucene's Javadoc:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/icu/ICUFoldingFilter.html

Overall, the explanation language is a little hairy and you may need
to chase through the Unicode pages, but it should be the
production-ready approach in the end.
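
As a rough sketch of what that could look like, the Python snippet below builds an index settings body with an icu_folding token filter. Several things here are assumptions rather than anything stated in this thread: the unicode_set_filter parameter is taken from recent ICU plugin documentation (older releases may spell it differently or lack it), the names fold_keep_pound and folded_text are invented for the example, and the whitespace tokenizer is chosen only so that a symbol like £ is not dropped before the filter sees it.

```python
import json

# Hypothetical filter/analyzer names; only "icu_folding" and "whitespace"
# refer to real Elasticsearch components.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "fold_keep_pound": {
                    "type": "icu_folding",
                    # Assumed parameter: a UnicodeSet of characters the
                    # filter may touch; "[^£]" means everything except £.
                    "unicode_set_filter": "[^£]",
                }
            },
            "analyzer": {
                "folded_text": {
                    "tokenizer": "whitespace",
                    "filter": ["fold_keep_pound"],
                }
            },
        }
    }
}

# Serialise the body so it can be sent when creating an index.
print(json.dumps(index_body, ensure_ascii=False, indent=2))
```

Whether the symbol survives end to end also depends on the tokenizer and any other filters in the chain, so it is worth checking the result with the analyze API before relying on it.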

Regards,
Alex.

