Implementation of multi lingual search

Amit_Soni · October 30, 2013, 7:58am

Hi everyone - We have been exploring what it would take to implement a
multilingual search solution using Elasticsearch. If anyone has already
done it, it would be great to hear about their experience. A few
questions I have:

1. Would one have to create a separate index for each language?
2. If the indexes are separate, would the mappings be very different?
3. Would the queries have to be constructed differently for different
   languages?

I am keen to hear the key considerations to take into account
when thinking about implementing it.

Thanks much!

-Amit.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · October 30, 2013, 8:16am

1. You do not need separate indexes. Language can be handled per field
   (or languages can even be mixed in a single field).
2. You can assign a different analyzer to each field. If you use index
   types, nothing prevents you from setting up a field "content" and
   assigning the english analyzer to it in index type "english", the
   german analyzer in index type "german", the french analyzer in index
   type "french", and so on. You can also use minimal or no stemming at
   all and a single field for all languages, which can be useful if you
   do not know which languages you will have to index. In that case you
   can also use the langdetect plugin and attach the language code to
   the doc for use as a search filter. This depends entirely on your
   requirements.
3. No.
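
As a rough illustration of the per-field analyzer idea, here is a minimal Python sketch that only builds the JSON bodies, without talking to a cluster. It is a sketch under assumptions, not the setup described in this post: it uses one field per language (`content_en`, `content_de`, ...) rather than index types, it assumes a recent Elasticsearch version where analyzed strings are of type `text`, and the field names `lang` and `content_*` are made up for the example. The `english`, `german` and `french` analyzers are the built-in language analyzers.

```python
# Built-in Elasticsearch language analyzers, keyed by language code.
ANALYZERS = {"en": "english", "de": "german", "fr": "french"}


def build_index_body(analyzers=ANALYZERS):
    """Return an index-creation body with one analyzed field per language."""
    properties = {
        # Language code attached to each document (for example by a detector),
        # kept as an exact-match keyword so it can be used in filters.
        "lang": {"type": "keyword"},
        # Fallback field without stemming for unknown languages.
        "content": {"type": "text", "analyzer": "standard"},
    }
    for code, analyzer in analyzers.items():
        # Hypothetical naming scheme: content_<language code>.
        properties[f"content_{code}"] = {"type": "text", "analyzer": analyzer}
    return {"mappings": {"properties": properties}}


def build_query(text, lang):
    """Match `text` in the field for `lang`, restricted to docs tagged with `lang`."""
    field = f"content_{lang}" if lang in ANALYZERS else "content"
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"term": {"lang": lang}}],
            }
        }
    }
```

The query has the same shape for every language; only the target field and the filter value change, which fits the "No." given for the third question.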

You do not mention the biggest challenge for multilingual search:
language-independent normalization and case folding for robust search. The
ICU analysis plugin is very valuable for this.
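
To make "normalization and case folding" concrete, here is a small sketch using only the Python standard library. It is not the ICU plugin and does not reproduce everything the plugin does (the plugin ships token filters such as `icu_normalizer` and `icu_folding`); it only approximates the idea of mapping different spellings of the same text to one form before matching. The function name `normalize_fold` is made up for the example.

```python
import unicodedata


def normalize_fold(text: str) -> str:
    """Apply NFKC normalization and Unicode case folding to `text`."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Case folding can yield text that is no longer normalized,
    # so normalize once more.
    return unicodedata.normalize("NFKC", folded)
```

With this, "Straße" and "STRASSE" both become "strasse", and full-width or ligature forms are reduced to their plain equivalents. Accents are kept, so "Café" becomes "café", not "cafe".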

Jörg


Itamar_Syn_Hershko · October 30, 2013, 8:18am

It all really, really depends on your content and business requirements,
and on the amount of data you have.

For us it makes sense to have everything in one index but use a different
analyzer for each document, based on the main language detected for the
text. But that is just our way of doing it.
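
One possible way to get a per-document analyzer in a single index, sketched here under assumptions and not taken from the setup described above, is to put each document's text into a field whose mapping uses the analyzer for the detected language. Language detection itself is assumed to happen elsewhere; the field names and the `SUPPORTED` set are hypothetical.

```python
# Hypothetical set of languages that have a dedicated analyzed field.
SUPPORTED = {"en", "de", "fr"}


def to_index_document(text, detected_lang):
    """Place `text` in the field that matches the detected main language."""
    if detected_lang in SUPPORTED:
        field = f"content_{detected_lang}"
    else:
        # Unknown or unsupported language: fall back to a generic field.
        field = "content_default"
    return {"lang": detected_lang, field: text}
```

For example, `to_index_document("Guten Tag", "de")` produces a document with the text under `content_de`, so whatever analyzer the mapping assigns to that field is the one applied at index time.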


