Bulgarian stemming

The Bulgarian stemmer should reduce these two words to the same stem, but it does not:
биография => "биограф"
биографията => "биографи"
The correct stem is "биограф" in both cases.
This is a serious problem because it affects a very common case in stemming.
I wrote the document that was used to implement the algorithm, and the logic there looks correct. I guess there is a mistake in the implementation.
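
In case it helps anyone reproduce this, here is a rough Python sketch of how the two words could be checked through Elasticsearch's `_analyze` API. It only builds the request body and compares stems; it does not contact a cluster. The helper names (`build_analyze_request`, `stems_from_response`, `stems_agree`) are made up for illustration, the choice of the `standard` tokenizer plus `lowercase` filter is an assumption about the analysis chain, and the sample response contains only the stems reported above rather than real server output.

```python
import json

# The two words from this post; both are expected to share the stem "биограф".
WORDS = ["биография", "биографията"]


def build_analyze_request(words):
    """Build a body for POST /_analyze that applies the Bulgarian stemmer filter.

    Assumption: standard tokenizer + lowercase filter in front of the stemmer.
    """
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", {"type": "stemmer", "language": "bulgarian"}],
        "text": list(words),
    }


def stems_from_response(response):
    """Pull the token strings out of an _analyze-style response dict."""
    return [token["token"] for token in response.get("tokens", [])]


def stems_agree(stems):
    """Return True when every word was reduced to the same stem."""
    return len(set(stems)) <= 1


if __name__ == "__main__":
    # Print the body that would be sent to POST /_analyze.
    print(json.dumps(build_analyze_request(WORDS), ensure_ascii=False, indent=2))

    # Hand-written stand-in for a response, using the stems reported in this post.
    reported = {"tokens": [{"token": "биограф"}, {"token": "биографи"}]}
    stems = stems_from_response(reported)
    print(stems, "agree" if stems_agree(stems) else "differ")
```

The printed body can be sent to a running cluster with any HTTP client; if the stems in the real response differ, that matches the behaviour described here.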

Thanks a lot

Elasticsearch uses analyzers from Lucene. Here is the source code of this specific Bulgarian stemmer.

Feel free to file an issue in Lucene or create your own patch.

