Asciifolding character filter

Mathijs_Biesmans · January 19, 2015, 6:18pm

I'm curious whether an asciifolding character filter exists. I know there
is an asciifolding token filter, and that the analysis chain works as
follows: input text > char_filter > tokenizer > token filter > output
tokens.
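
For readers who want the chain spelled out, here is a toy sketch in Python. It is not how Elasticsearch or Lucene implement analysis; `analyze` and `fold_accents` are hypothetical names made up for illustration, and the folding only strips combining accents. It just shows where a character filter and a token filter sit relative to the tokenizer.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Toy folding (an assumption, not the real asciifolding table):
    decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, char_filters, tokenizer, token_filters):
    """Run text through char filters, then the tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)  # sees the whole input string
    tokens = tokenizer(text)      # splits the string into tokens
    for token_filter in token_filters:
        tokens = [token_filter(tok) for tok in tokens]  # sees one token at a time
    return tokens


# Folding as a (hypothetical) character filter, before tokenization.
as_char_filter = analyze("Crème Brûlée", [fold_accents], str.split, [str.lower])

# Folding as a token filter, after tokenization.
as_token_filter = analyze("Crème Brûlée", [], str.split, [fold_accents, str.lower])

print(as_char_filter)
print(as_token_filter)
```

With a whitespace tokenizer and one-to-one foldings like these, both orders give the same tokens; the order only starts to matter when the tokenizer cuts inside words or a folding changes the length of the text.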

The text on
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
mentions: [...]With Western languages, this can be done with the
asciifolding character filter.[...], though the URL says
asciifolding-token-filter. Is this an error in the docs?

I also checked the icu-plugin: the icu_normalizer can be used both as a
character filter and a token filter. But the icu_folding filter is only
available as a token filter (that actually incorporates the icu_normalizer).

I'm generating ngrams and shingles, so it seems more logical to apply
ascii/icu folding as a character filter. But I can't find one.
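
A small Python sketch of why the order can matter for ngrams. This is only an illustration under assumptions: `FOLDINGS` is a tiny hand-picked table (the real asciifolding table is far larger), and `fold` and `ngrams` are hypothetical helpers, not Elasticsearch APIs. It compares folding the text before cutting trigrams with folding each trigram afterwards, as a per-token filter would.

```python
# Illustrative subset of foldings (assumption); some expand one character into two.
FOLDINGS = {"ß": "ss", "æ": "ae", "é": "e"}


def fold(text: str) -> str:
    """Replace each character by its folded form, if it has one."""
    return "".join(FOLDINGS.get(ch, ch) for ch in text)


def ngrams(text: str, n: int) -> list:
    """Return all contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


word = "straße"

# Character-filter order: fold the text first, then cut trigrams.
fold_then_gram = ngrams(fold(word), 3)

# Token-filter order: cut trigrams first, then fold each gram.
gram_then_fold = [fold(gram) for gram in ngrams(word, 3)]

print(fold_then_gram)
print(gram_then_fold)
```

Folding first yields uniform trigrams of the folded word, while folding afterwards leaves some grams four characters long, so the two orders do not produce the same index terms whenever a folding expands a character.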

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1fe8fcec-7d9b-4b92-ad29-d4a7289de8dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · January 19, 2015, 7:39pm

Hey, cool idea. That's fairly easy to implement. I've just added a folding
char filter to my version of the ICU plugin:

adding ICU folding char filter · jprante/elasticsearch-plugin-bundle@e4294cc · GitHub

Jörg

Mathijs_Biesmans · January 20, 2015, 9:18am

Thanks Jörg. So I guess the character filter didn't exist.

Will this be pushed to official releases? Currently I'm using a hosted
ES-cluster, and I don't want to install custom plugins...

jprante · January 20, 2015, 11:38pm

I can't tell. The official Elasticsearch ICU plugin is lagging behind on
Lucene 5.0, ICUCollationKeyAnalyzer / collation key field type, API
deprecation updates, etc., so I hope it will soon pick up the pace again
and get ready for new features.

Jörg
