Implementation of multilingual search


(Amit Soni) #1

Hi everyone - We have been exploring what it would take to implement a
multilingual search solution using Elasticsearch. If anyone has already
done this, it would be great to hear about your experience. A few
questions I have:

  1. Would one have to create a separate index for each language?
  2. If the indexes are separate, would their mappings be very different?
  3. Would the queries be constructed in different ways for different
    languages?

I am just keen to hear the key considerations to take into account
when implementing it.

Thanks much!

-Amit.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2
  1. You do not need separate indexes. Language can be handled per field (or
    even mixed into a single field).

  2. You can assign a different analyzer to each field. If you use index types,
    there is nothing to prevent you from setting up a field "content" and
    assigning the English analyzer to it in index type "english", the German
    analyzer in index type "german", the French analyzer in index type
    "french", and so on. You can also use minimal or no stemming at all and
    use a single field for all languages, which is useful if you do not know
    in advance which languages you have to index. In that case you can also
    use the langdetect plugin and attach the detected language code to the
    document so you can filter on it at search time. This depends entirely on
    your requirements.

  3. No.
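
As a rough illustration of points 2 and 3, here is a minimal Python sketch that builds a mapping with one sub-field per language and a single query that searches all of them. It only constructs the JSON bodies and does not talk to a cluster. The field name `content`, the sub-field names, and the helper functions are made up for the example; `english`, `german`, and `french` are Elasticsearch's built-in language analyzers, and the `text` field type assumes a reasonably recent Elasticsearch version.

```python
import json

# Hypothetical mapping from a sub-field name to a built-in language analyzer.
LANGUAGE_ANALYZERS = {"en": "english", "de": "german", "fr": "french"}


def build_mapping(analyzers=LANGUAGE_ANALYZERS):
    """Return an index body whose "content" field has one sub-field per language."""
    sub_fields = {
        code: {"type": "text", "analyzer": analyzer}
        for code, analyzer in analyzers.items()
    }
    return {
        "mappings": {
            "properties": {
                # The top-level field keeps the default (standard) analyzer.
                "content": {"type": "text", "fields": sub_fields}
            }
        }
    }


def build_query(text, analyzers=LANGUAGE_ANALYZERS):
    """Return one multi_match query that covers every language sub-field."""
    fields = ["content"] + [f"content.{code}" for code in analyzers]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_query("Suchmaschine"), indent=2))
```

The query has the same shape whatever the language of the search text; only the list of fields depends on which languages were mapped.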

You do not mention the biggest challenge for multilingual search:
language-independent normalization and case folding for robust search. The
ICU analysis plugin is very valuable for this.
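
To make "normalization and case folding" concrete, here is a small sketch using only Python's standard `unicodedata` module. It is a stand-in for illustration, not the ICU plugin: the function name is made up, and ICU's own folding covers more cases (accent folding, for example) than this does.

```python
import unicodedata


def normalize_for_search(text):
    """Apply NFKC normalization and Unicode case folding to a string."""
    # NFKC composes combining sequences and replaces compatibility forms
    # (ligatures, full-width letters) with their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower(): it maps "ß" to "ss", for example.
    folded = normalized.casefold()
    # Case folding can produce unnormalized output, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


if __name__ == "__main__":
    for raw in ["Straße", "STRASSE", "\ufb01le", "Cafe\u0301", "\uff21\uff22\uff23"]:
        print(repr(raw), "->", repr(normalize_for_search(raw)))
```

Applying the same function to both the indexed text and the query text is what lets "Straße" and "STRASSE" match each other.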

Jörg



(Itamar Syn-Hershko) #3

It all really depends on your content, your business requirements,
and the amount of data you have.

For us it makes sense to keep everything in one index but use a different
analyzer for each document, based on the main language detected in the
text. But that is just our way of doing it.
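
One way to sketch this single-index approach in Python, assuming the language code has already been detected elsewhere (the detection step itself is not shown). The per-language field names (`content_en` and so on), the `lang` field, the fallback field, and both helper functions are hypothetical; this is not necessarily how the setup described above is wired.

```python
# Hypothetical set of languages that have a dedicated analyzed field in the mapping.
SUPPORTED_LANGUAGES = {"en", "de", "fr"}
# Assumed to be mapped with a language-neutral analyzer.
FALLBACK_FIELD = "content_default"


def field_for(lang):
    """Pick the field whose analyzer matches the detected language."""
    return f"content_{lang}" if lang in SUPPORTED_LANGUAGES else FALLBACK_FIELD


def prepare_document(text, lang):
    """Build the document body: the language code plus the text in one field."""
    return {"lang": lang, field_for(lang): text}


def build_filtered_query(text, lang):
    """Match on the language-specific field and filter on the language code."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field_for(lang): text}},
                # Assumes "lang" is indexed as an exact, non-analyzed value.
                "filter": {"term": {"lang": lang}},
            }
        }
    }


if __name__ == "__main__":
    print(prepare_document("Die Suche funktioniert", "de"))
    print(build_filtered_query("Suche", "de"))
    # An unsupported language falls back to the default field.
    print(prepare_document("Пример текста", "ru"))
```

Each document's text lands in exactly one analyzed field, so the index stays single while the analysis still follows the detected language.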

(In reply to Amit Soni's message of Wed, Oct 30, 2013 at 9:58 AM.)

