Using analyzers based on language detection


(Nitin Maheshwari) #1

Hi,

I am using the langdetect plugin to dynamically assign the analyzer at index
time.

PUT test
POST test/article/_mapping
{
  "article" : {
    "_analyzer" : {
      "path" : "description.lang"
    },
    "properties" : {
      "description" : { "type" : "langdetect" }
    }
  }
}

The langdetect plugin detects the language as 'en', 'fr', 'de', and so on, so
the analyzers have to be named 'en', 'fr', etc. This makes the names less
descriptive, and the context of the analyzer is lost. Is it possible to derive a
more descriptive name, so that _analyzer resolves to 'en_icu_analyzer'
instead of just 'en'?

Something like the following. This does not work; it only shows what I want to
achieve.

"article" : {
  "_analyzer" : {
    "path" : "description.lang" + "_icu_analyzer"
  }
}

Another question: is it possible to define the analyzer dynamically at the field
level rather than at the index level?

Thanks,
Nitin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fc1b9d30-0200-411c-8bba-8939c7f87fe3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

This requires a change in the langdetect plugin, feel free to open an issue
at

It should be trivial to wrap something like a String.format() around the
language name, driven by a configuration setting.
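
As a rough illustration of that idea (a sketch only, not the plugin's actual code), the detected language code could be passed through a format pattern read from configuration. The class name AnalyzerNameFormatter and the pattern value "%s_icu_analyzer" are hypothetical; the plugin does not necessarily expose such a setting.

```java
import java.util.Locale;

// Hypothetical helper: turns a detected language code such as "en"
// into an analyzer name such as "en_icu_analyzer".
final class AnalyzerNameFormatter {

    // Format pattern with a single %s placeholder for the language code.
    // Assumed to come from a configuration setting (name not defined here).
    private final String pattern;

    AnalyzerNameFormatter(String pattern) {
        // Fall back to the bare language code when no pattern is configured.
        this.pattern = (pattern == null || pattern.isEmpty()) ? "%s" : pattern;
    }

    String analyzerFor(String languageCode) {
        // Locale.ROOT keeps the result independent of the JVM default locale.
        return String.format(Locale.ROOT, pattern, languageCode);
    }

    public static void main(String[] args) {
        AnalyzerNameFormatter formatter = new AnalyzerNameFormatter("%s_icu_analyzer");
        System.out.println(formatter.analyzerFor("en")); // prints "en_icu_analyzer"
    }
}
```

With no pattern configured, the helper returns the bare code ("en"), so the existing behaviour would stay the default.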

Jörg



(Nitin Maheshwari) #3

Done: https://github.com/jprante/elasticsearch-langdetect/issues/20

Thanks.

In reply to Nitin Maheshwari's post of Thursday, 28 August 2014 12:12:26 UTC+5:30.

