[ANN] LemmaGen Analysis for ElasticSearch plugin

Hello,

I wrote a plugin that provides the jLemmaGen (https://bitbucket.org/hlavki/jlemmagen) lemmatizer with 14 prebuilt European lexicons.

jLemmaGen is a Java implementation of LemmaGen (Multilingual Open Source Lemmatisation) - http://lemmatise.ijs.si/Software/Version3

If you are interested, the source code is on GitHub: https://github.com/vhyza/elasticsearch-analysis-lemmagen

Regards,
Vojta

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Vojta,

Very interesting. Do you think you can include license info for both the
code and linguistic resources?

Regards,
Lukáš

Hi Lukas,

According to the jLemmaGen author, the dictionaries are licensed, but the
prebuilt trees are not
(http://lucene.472066.n3.nabble.com/JLemmaGen-project-td4097466.html). So I
guess the whole plugin can be licensed under the Apache 2 license
(https://github.com/vhyza/elasticsearch-analysis-lemmagen/commit/144f4490493c9d973683f6a611acb7bdfbee15cc).

Regards,
Vojta

Hi Vojta,

I am not a lawyer, but from my initial reading of the MULTEXT-East V4
license (http://nl.ijs.si/ME/V4/mteV4-licence.txt) it sounds as if any
work built on top of "RESOURCES" can be used for "Research Purpose" only,
which "[...] excludes the development or commercial exploitation of any
product or prototype product incorporating, using or based upon RESOURCES
or any parts of RESOURCES."

Hopefully, I am just missing something. I feel a bit confused about the
licensing though (but not saying this is your fault!).

Regards,
Lukáš

Hi Lukas,

Hmm, I understood from the post I linked earlier that the prebuilt trees
are not affected by the license. And the main page of the LemmaGen project
(http://lemmatise.ijs.si/) says:

• it is free - open source licence for all the code included in the project,
• multilingual support - currently 12 different languages included,

So I'm a little bit confused now.

Anyway, I'll try to email Michal Hlaváč (the prebuilt lexicons are part
of his lemmagen-lang.jar package).

Regards,
Vojta

Vojta,

Based on what @hlavki says in his email, it seems he built the data
himself and is not using the data found in the C# project ("I obtained also
licenced dictionaries to build rules tree for 15 languages. Dictionaries
are licenced, but prebuilded trees don't."). In that case I am not sure
this is valid under the V4 license of "RESOURCES" for any use other
than "Research Purposes".

Looking further at the C# implementation (http://lemmatise.ijs.si/Software),
it seems to me that they mostly refer to MULTEXT-East V3 or older datasets
(which might have a different license).

In any case, further clarification might be useful. And thanks a lot for
the effort.

Regards,
Lukas

Yes, that's exactly why I'm confused right now. I didn't find @hlavki's
email, so I tried to ask for help via his Twitter account.

Thank you for the help.

Regards,
Vojta

Vojta,

By @hlavki's email I meant
http://lucene.472066.n3.nabble.com/JLemmaGen-project-td4097466.html

Hi Vojta & Lukas,

I understand your confusion, and the answer is quite simple. All binaries
are licensed under the Apache License 2.0, including the .lem binary files
in lemmagen-lang.jar.
A .lem file is a Java-serialized rule tree for lemmatization and contains
only some small fragments of the original dictionary.
This implies that a .lem file is not a dictionary. I have confirmation of
this from the author of the LemmaGen project.
The dictionaries licensed under http://nl.ijs.si/ME/V4/mteV4-licence.txt
are NOT part of the project and are stored only on my local computer.
I use them only to build the .lem files. So the result is that you can use
all content of the Bitbucket repository under the Apache License 2.0.

If you have any questions, feel free to ask.

thanks, miso

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8ecf60f5-6f21-4710-8904-6d2871c40dd4%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Michal,

that's great! Thank you for your work and clarification.

Regards,
Vojta

On Wednesday, November 27, 2013 10:25:39 AM UTC+1, Michal Hlaváč wrote:

Hi Vojta & Lukas,

I understand your confusion, but it's quite simple. All binaries are
licensed under Apache License 2.0, including the .lem binary files in
lemmagen-lang.jar.
A .lem file is a Java-serialized rule tree for lemmatization and contains only
small fragments of the original dictionary.
This implies that a .lem file is not a dictionary. I have confirmation of this
from the author of the LemmaGen project.
Dictionaries licensed under http://nl.ijs.si/ME/V4/mteV4-licence.txt are NOT
part of the project and are stored only on my local computer.
I use them only to build the .lem files. So the result is that you can use all
content on Bitbucket under Apache License 2.0.

If you have any questions, feel free to ask.

thanks, miso

Hi Michal,

this sounds like good news.
I am still confused, though (sorry, please bear with me). We can take this
discussion off list if needed. The following are my concerns:

  1. Don't we need confirmation from the MULTEXT-East license holder(s) instead
    of the LemmaGen author?

  2. When you say ".lem is not a dictionary", does it mean it is not a
    "product" in terms of "[...] based upon RESOURCES or any parts of
    RESOURCES"?

Just to make it clear, I am not asking about the validity of the LemmaGen license
[in my opinion it might be questionable, but that is not important]; I am
asking directly about the license of the dictionaries and of work based on
any parts of them.

Regards,
Lukáš

Cau Lukas,

you are right, I'll ask them.

m.

On Wednesday, November 27, 2013 11:00:53 AM UTC+1, Lukáš Vlček wrote:

Hi Michal,

this sounds like good news.
I am still confused, though (sorry, please bear with me). We can take this
discussion off list if needed. The following are my concerns:

  1. Don't we need confirmation from the MULTEXT-East license holder(s) instead
    of the LemmaGen author?

  2. When you say ".lem is not a dictionary", does it mean it is not a
    "product" in terms of "[...] based upon RESOURCES or any parts of
    RESOURCES"?

Just to make it clear, I am not asking about the validity of the LemmaGen license
[in my opinion it might be questionable, but that is not important]; I am
asking directly about the license of the dictionaries and of work based on
any parts of them.

Regards,
Lukáš

Cau Lukas,

unfortunately, you were right :-) As Tomaz from the Dept. of Knowledge
Technologies, Jozef Stefan Institute wrote:

"the overall MULTEXT-East licence does indeed require only non-commercial
use, even for derivatives. And there is no problem with the Slovene one."

So it means I have to exclude all .lem files from the Apache License 2.0.

Sorry for the confusion, m.

On Friday, November 29, 2013 11:00:53 AM UTC+1, Michal Hlaváč wrote:

Cau Lukas,

you are right, I'll ask them.

m.

On Wednesday, November 27, 2013 11:00:53 AM UTC+1, Lukáš Vlček wrote:

Hi Michal,

this sounds like a good news.
Tough, I am still confused (sorry, please bear with me). We can
eventually take this discussion off list. The following are my concerns:

  1. Don't we need confirmation from MULTEXT-East license holder(s) instead
    of LemmaGen author?

  2. When you say ".lem is not a dictionary" does it mean it is not a
    "product" in terms of "[...] based upon RESOURCES or any parts of
    RESOURCES"?

Just to make it clear, I am not asking about validity of LemmGen license
[in my opinion it might be questionable but it is not important], I am
questioning directly about license of the dictionaries and work based on
any parts of it.

Regards,
Lukáš

On Wed, Nov 27, 2013 at 10:25 AM, Michal Hlaváč hla...@hlavki.eu wrote:

Hi Vojta & Lukas,

I understand your confusion and it's quite simple. All binaries are
licensed under Apache License 2.0 including .lem binary files in
lemmagen-lang.jar
.lem file is java serialized rule tree for lemmatization and contains
only some little fragments from original dictionary.
This implies that .lem is not dictionary. I have confirmation from
author of LemmaGen project about it.
Dictionaries licensed by http://nl.ijs.si/ME/V4/mteV4-licence.txt are
NOT part of project and are stored only on my local computer.
I use them only to build .lem files. So result is, you can use all
content of Bitbucket according to Apache
License 2.0

If you will have any question, feel free to ask

thanks, miso

On Tuesday, November 26, 2013 1:46:03 PM UTC+1, Lukáš Vlček wrote:

Vojta,

by @hlavki's email I meant http://lucene.472066.n3.
nabble.com/JLemmaGen-project-td4097466.html

On Tue, Nov 26, 2013 at 1:19 PM, vhyza hyza.v...@gmail.com wrote:

Yes, thats exactly why I'm confused right now. I didn't find @hlavki's
mail, so I tried to ask for help on his Twitter account.

Thank you for the help.

Regards,
Vojta

On Tuesday, November 26, 2013 11:59:41 AM UTC+1, Lukáš Vlček wrote:

Vojta,

based on what @hlavki says in his email it seems he prebuilt the data
himself and is not using those found in C# project ("I obtained also
licenced dictionaries to build rules tree for 15 languages. Dictionaries
are licenced, but prebuilded trees don't."). In which case I am not
sure this is a valid according the the license V4 of "RESOURCES" for any
other use than "Research Purposes".

Looking further on the C# implementation (http://lemmatise.ijs.si/
Softwarehttp://www.google.com/url?q=http%3A%2F%2Flemmatise.ijs.si%2FSoftware&sa=D&sntz=1&usg=AFQjCNFQA5dLuwbv13lQxSq1HhflSFd3xQ)
it seems to me that they mostly refer to Multext-East V3 or older datasets
(which might have different license).

In any case, further clarification might be useful. And thanks a lot
for the effort.

Regards,
Lukas

On Tue, Nov 26, 2013 at 11:22 AM, vhyza hyza.v...@gmail.com wrote:

Hi Lukas,

hmm, I understood from the post I sent before, that the prebuilt
trees are not affected by license. And on the mail page of LemmaGen project
(http://lemmatise.ijs.si/http://www.google.com/url?q=http%3A%2F%2Flemmatise.ijs.si%2F&sa=D&sntz=1&usg=AFQjCNHjzKSH2ZIVLNxScUwZAO19GLcfQA)
is

• it is free - open source licence for all the code included in the
project,
• multilingual support - currently 12 different languages included,

So I'm little bit confused now.

Anyway I'll try to write email to Michal Hlaváč (prebuilt lexicons
are part of his lemmagen-lang.jar package).

Regards,
Vojta

On Tuesday, November 26, 2013 10:49:02 AM UTC+1, Lukáš Vlček wrote:

Hi Vojta,

I am not a layer but from my initial investigation of MULTEXT-East
V4 license (http://nl.ijs.si/ME/V4/mteV4-licence.txthttp://www.google.com/url?q=http%3A%2F%2Fnl.ijs.si%2FME%2FV4%2FmteV4-licence.txt&sa=D&sntz=1&usg=AFQjCNHHrDWmY7Y0nF1UiCvDfG46OoKRxQ)
it sounded as if any work built on top of "RESOURCES" could be used for
"Research Purpose" only and "[...] excludes the development or commercial
exploitation of any product or prototype product incorporating, using or
based upon RESOURCES or any parts of RESOURCES."

Hopefully, I am just missing something. I feel a bit confused about
the licensing though (but not saying this is your fault!).

Regards,
Lukáš

Hmm... yeah. That is sad. Thanks for getting this cleared up.
I think MULTEXT-East had different license(s) in the past (even GNU), but
unfortunately I see that many interesting language packages (and especially
dictionaries) from universities are not really open-source (or Apache 2
License) friendly. Not sure why; at the end of the day, the research is
funded from public taxes.

Regards,
Lukas

On Mon, Dec 2, 2013 at 10:31 AM, Michal Hlaváč hlavki@hlavki.eu wrote:

Hi Lukas,

unfortunately, you were right :) As Tomaz from the Dept. of Knowledge
Technologies, Jozef Stefan Institute wrote:

"the overall MULTEXT-East licence does indeed require only non-commercial
use, even for derivatives. And there is no problem with the Slovene one."

So it means I have to exclude all .lem files from the Apache License 2.0.

Sorry for the confusion, m.

On Friday, November 29, 2013 11:00:53 AM UTC+1, Michal Hlaváč wrote:

Hi Lukas,

you are right, I'll ask them.

m.

On Wednesday, November 27, 2013 11:00:53 AM UTC+1, Lukáš Vlček wrote:

Hi Michal,

This sounds like good news.
Though I am still confused (sorry, please bear with me). We could
also take this discussion off list. The following are my concerns:

  1. Don't we need confirmation from the MULTEXT-East license holder(s)
    instead of the LemmaGen author?

  2. When you say ".lem is not a dictionary", does it mean it is not a
    "product" in terms of "[...] based upon RESOURCES or any parts of
    RESOURCES"?

Just to make it clear, I am not asking about the validity of the LemmaGen
license [in my opinion it might be questionable, but that is not important];
I am asking directly about the license of the dictionaries and of any work
based on any parts of them.

Regards,
Lukáš

On Wed, Nov 27, 2013 at 10:25 AM, Michal Hlaváč hla...@hlavki.eu wrote:

Hi Vojta & Lukas,

I understand your confusion, but it's quite simple. All binaries are
licensed under Apache License 2.0, including the .lem binary files in
lemmagen-lang.jar.
A .lem file is a Java-serialized rule tree for lemmatization and contains
only some small fragments of the original dictionary.
This implies that a .lem file is not a dictionary. I have confirmation of
this from the author of the LemmaGen project.
Dictionaries licensed under http://nl.ijs.si/ME/V4/mteV4-licence.txt are
NOT part of the project and are stored only on my local computer.
I use them only to build the .lem files. So the result is that you can use
all content of the Bitbucket repository according to Apache License 2.0.

If you have any questions, feel free to ask.

thanks, miso
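
To make "a serialized rule tree that contains only fragments of the dictionary" more concrete, here is a minimal Python sketch. It is not jLemmaGen's actual algorithm or file format: the toy lexicon, the function names, the flat suffix table (standing in for a real rule tree) and the use of pickle (standing in for Java serialization) are all assumptions made for illustration. It only shows what "fragments" can mean technically and says nothing about how the licence treats such a file.

```python
import pickle

# Toy (word form, lemma) pairs. Purely illustrative; NOT MULTEXT-East data.
LEXICON = [
    ("walking", "walk"),
    ("talking", "talk"),
    ("cities", "city"),
    ("parties", "party"),
]


def learn_rule(form, lemma):
    """Return (suffix_to_strip, suffix_to_add) that turns form into lemma."""
    common = 0
    while common < min(len(form), len(lemma)) and form[common] == lemma[common]:
        common += 1
    return form[common:], lemma[common:]


def build_rules(pairs):
    """Collect suffix rules from the lexicon (a flat stand-in for a rule tree)."""
    rules = {}
    for form, lemma in pairs:
        strip, add = learn_rule(form, lemma)
        if strip:  # skip pairs where nothing has to be stripped
            rules[strip] = add
    return rules


def lemmatize(rules, word):
    """Apply the longest matching suffix rule; return the word unchanged otherwise."""
    for strip in sorted(rules, key=len, reverse=True):
        if word.endswith(strip):
            return word[: len(word) - len(strip)] + rules[strip]
    return word


rules = build_rules(LEXICON)
blob = pickle.dumps(rules)      # the serialized "model"
restored = pickle.loads(blob)

print(restored)
print(lemmatize(restored, "ponies"))   # unseen word, handled by the "ies" rule
print(b"walking" in blob)              # full lexicon entries are not stored
```

In this sketch the serialized blob holds only the learned suffixes ("ing", "ies") and their replacements, not the word list itself, yet every rule is still derived from the lexicon.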

On Tuesday, November 26, 2013 1:46:03 PM UTC+1, Lukáš Vlček wrote:

Vojta,

by @hlavki's email I meant
http://lucene.472066.n3.nabble.com/JLemmaGen-project-td4097466.html

On Tue, Nov 26, 2013 at 1:19 PM, vhyza hyza.v...@gmail.com wrote:

Yes, that's exactly why I'm confused right now. I didn't find
@hlavki's mail, so I tried to ask for help on his Twitter account.

Thank you for the help.

Regards,
Vojta

On Tuesday, November 26, 2013 11:59:41 AM UTC+1, Lukáš Vlček wrote:

Vojta,

based on what @hlavki says in his email, it seems he prebuilt the
data himself and is not using the data found in the C# project ("I
obtained also licenced dictionaries to build rules tree for 15 languages.
Dictionaries are licenced, but prebuilded trees don't."). In which
case I am not sure this is valid according to the V4 license of
"RESOURCES" for any use other than "Research Purposes".

Looking further at the C# implementation (http://lemmatise.ijs.si/Software),
it seems to me that they mostly refer to MULTEXT-East V3 or older datasets
(which might have a different license).

In any case, further clarification might be useful. And thanks a lot
for the effort.

Regards,
Lukas

On Tue, Nov 26, 2013 at 11:22 AM, vhyza hyza.v...@gmail.com wrote:

Hi Lukas,

hmm, I understood from the post I sent before that the prebuilt
trees are not affected by the license. And the main page of the LemmaGen
project (http://lemmatise.ijs.si/) says:

• it is free - open source licence for all the code included in
the project,
• multilingual support - currently 12 different languages included,

So I'm a little bit confused now.

Anyway, I'll try to write an email to Michal Hlaváč (the prebuilt lexicons
are part of his lemmagen-lang.jar package).

Regards,
Vojta

On Tuesday, November 26, 2013 10:49:02 AM UTC+1, Lukáš Vlček wrote:

Hi Vojta,

I am not a lawyer, but from my initial investigation of the MULTEXT-East
V4 license (http://nl.ijs.si/ME/V4/mteV4-licence.txt)
it sounded as if any work built on top of "RESOURCES" could be used for
"Research Purpose" only and "[...] excludes the development or commercial
exploitation of any product or prototype product incorporating, using or
based upon RESOURCES or any parts of RESOURCES."

Hopefully, I am just missing something. I feel a bit confused
about the licensing though (but not saying this is your fault!).

Regards,
Lukáš

Oh, that's a pity. In that case I guess this plugin is pretty useless,
because even the free dictionary download (located at
Index of /ME/download) has a README with the following text:

"This directory contains the download files of the MULTEXT-East
Resouces that are available for research use, according to the license
agreement at http://nl.ijs.si/ME/license/mteV4-license.html"

So it seems to me that this is an "academic sandbox" with no practical use
(in a real project) and with no way to buy a licence... I think it's really
weird because, as Lukas said, it's funded from public taxes...

Anyway, thank you for making this clear.
