Ukrainian official language analyzers in Elasticsearch

When will Ukrainian be officially supported by #elasticsearch, and in which version? Is it on the roadmap? @elastic https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html


Elasticsearch basically exposes whichever language analyzers are provided
by Lucene. If a language is not supported by Lucene, that is the best
place to start contributing.

https://lucene.apache.org/core/5_4_1/analyzers-common/overview-summary.html

There might be 3rd party plugins for such an analyzer. These plugins are
generally not supported by Elasticsearch.

Thanks Ivan.
I could not find a Lucene roadmap for language support either.
In fact, I am interested in the following languages:
Ukrainian, Hebrew and Bahasa.
It seems Lucene does not support those languages today.
Do you know whether future versions of Lucene will add them?

The Lucene mailing lists are the right place to bring that up. There is an AGPL-licensed Hebrew plugin you can try. It does lemmatization rather than stemming, IIRC.

If you want to write an Elasticsearch plugin with more language support, that is totally possible. Making a great plugin would be hard, but making one that is much better than nothing is not so hard, so long as the language uses spaces to separate words.
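
Short of writing a plugin, a rough "better than nothing" baseline can probably be approximated with a custom analyzer assembled from stock components in the index settings. The sketch below is only an illustration, not a real Ukrainian analyzer: the names `uk_minimal` and `uk_stop` are made up, and the stopword list is a tiny hand-picked sample rather than a vetted one. It does no stemming or lemmatization; it only tokenizes, lowercases and drops a few common words.

```python
import json

# Assumption: a tiny illustrative sample of Ukrainian stopwords, not a vetted list.
UK_STOPWORDS = ["і", "та", "в", "на", "що", "це"]


def build_index_settings(stopwords):
    """Return index settings that define a custom analyzer from stock parts.

    "uk_stop" and "uk_minimal" are hypothetical names chosen for this sketch.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Stock "stop" token filter with a custom stopword list.
                    "uk_stop": {"type": "stop", "stopwords": list(stopwords)},
                },
                "analyzer": {
                    "uk_minimal": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so stopword matching is case-insensitive.
                        "filter": ["lowercase", "uk_stop"],
                    },
                },
            },
        },
    }


settings = build_index_settings(UK_STOPWORDS)
# This JSON would be the request body used when creating the index.
print(json.dumps(settings, ensure_ascii=False, indent=2))
```

A field mapped with the `uk_minimal` analyzer would then be searchable by exact word forms only, which is where a stemming or lemmatizing plugin would add real value.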

You can try this: https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer