[ANNOUNCEMENT] Elasticsearch analysis lemmagen plugin update


(Vojtech Hyza) #1

Hi,

back in 2013 I wrote a plugin that provides the jLemmaGen lemmatizer, with some prebuilt lexicons, as an Elasticsearch token filter. As it turned out, the lexicon license was very restrictive: the plugin was usable only for non-commercial research projects. You can take a look at the original thread, "[ANN] LemmaGen Analysis for ElasticSearch plugin".

Some time ago I found that the source data, the MULTEXT-East free lexicons 4.0, are distributed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0, https://creativecommons.org/licenses/by-sa/4.0/), which I believe means that we can generate lexicons from this source, publish them under the same license and use them with the plugin.

For this reason I removed the built-in lexicons from the plugin (beginning with plugin v6.0.0) and prepared a separate repository for the lexicons.

  • free lexicons (CC BY-SA 4.0)
    • Bulgarian
    • Czech
    • English
    • Estonian
    • French
    • Hungarian
    • Romanian
    • Slovak
    • Resian (Slovene dialect)
    • Slovene
    • Ukrainian
  • non-free lexicons (CC BY-NC 4.0)
    • Farsi / Persian
    • Macedonian
    • Polish
    • Russian
    • Serbian

I also updated the plugin to work well with (almost) all Elasticsearch 5.x and 6.x versions. Beginning with plugin version 6.0.0, however, you need to download the particular lexicon you want from the lexicons repository.

I believe this breaking change will allow us to use this elasticsearch plugin even for commercial projects (with the free lexicons).
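
To illustrate how a downloaded lexicon might be wired into an index, here is a minimal sketch in Python that only builds the JSON settings body. It rests on assumptions that should be checked against the plugin README: that the token filter type is called `lemmagen` and that it takes a `lexicon` parameter naming the downloaded lexicon. The filter and analyzer names are hypothetical and chosen only for this example.

```python
import json


def lemmagen_index_settings(lexicon: str) -> dict:
    """Build index settings with one lemmagen token filter.

    Assumptions (verify against the plugin README):
      - the token filter type is named "lemmagen"
      - the "lexicon" parameter names a lexicon that was downloaded
        from the lexicons repository and installed on every node
    """
    filter_name = f"lemmagen_{lexicon}"               # hypothetical name
    analyzer_name = f"lemmagen_{lexicon}_analyzer"    # hypothetical name
    return {
        "settings": {
            "analysis": {
                "filter": {
                    filter_name: {
                        "type": "lemmagen",   # assumed filter type
                        "lexicon": lexicon,   # assumed parameter name
                    }
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first, then lemmatize each token
                        "filter": ["lowercase", filter_name],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body you would send when creating the index.
    print(json.dumps(lemmagen_index_settings("en"), indent=2))
```

The printed JSON would be used as the request body when creating an index. Running the lowercase filter before the lemmatizer is a choice made for this sketch, not something the plugin is known to require.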

More information can be found at elasticsearch-analysis-lemmagen and lemmagen-lexicons repositories.

Regards,
Vojta


(Vojtech Hyza) #2

A version for the new Elasticsearch 6.3.0 has been released: https://github.com/vhyza/elasticsearch-analysis-lemmagen/releases/tag/v6.3.0