Multilingual index options: _analyzer or multiple mappings or?


(Otis Gospodnetić) #1

Hi,

What's the best way to handle an index with multilingual docs (e.g., 1
lang per doc, but not all docs in the same language)?

Is explicitly specifying the language-specific analyzer via _analyzer
for each field at index time the best approach?
See http://www.elasticsearch.org/guide/reference/mapping/analyzer-field.html

Or would this work better:

curl -XPOST localhost:9200/test -d '{
  "mappings" : {
    "english" : {
      "properties" : {
        "title" : { "type" : "string", "index" : "analyzed" },
        "pubDate" : { "type" : "date", "index" : "analyzed" }
      }
    },
    "french" : {
      "properties" : {
        "title" : { "type" : "string", "index" : "analyzed" },
        "pubDate" : { "type" : "date", "index" : "analyzed" }
      }
    }
  }
}'

Or maybe there is some other option I'm missing?

Thanks,
Otis



(Clinton Gormley) #2

Hi Otis

> What's the best way to handle an index with multilingual docs (e.g., 1
> lang per doc, but not all docs in the same language)?
>
> Is explicitly specifying the language-specific analyzer via _analyzer
> for each field at index time the best approach?
> See http://www.elasticsearch.org/guide/reference/mapping/analyzer-field.html

I'd say this is your best option. When you index a doc, you can just
add a field that names the analyzer to use, and any field that doesn't
have a specific analyzer set in the mapping will use this "dynamic"
analyzer instead.
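
To make that concrete, here is a rough sketch of the request bodies involved, written in Python so it runs without a live cluster. It assumes the _analyzer mapping described on the page linked above, where "path" names the document field that holds the analyzer name. The type name "doc", the field name "lang_analyzer", and the language-to-analyzer table are hypothetical choices for illustration only. The _analyzer field was removed in later major Elasticsearch releases, so treat this as applying to versions of this era.

```python
import json

# Hypothetical names: the type "doc" and the field "lang_analyzer" are
# illustrative assumptions, not something taken from this thread.
MAPPING = {
    "mappings": {
        "doc": {
            # "path" names the document field whose value is the analyzer
            # to use for fields that have no analyzer set explicitly.
            "_analyzer": {"path": "lang_analyzer"},
            "properties": {
                "title": {"type": "string", "index": "analyzed"},
                "pubDate": {"type": "date"},
                "lang_analyzer": {"type": "string", "index": "not_analyzed"},
            },
        }
    }
}

# Assumed lookup from a language code to a built-in analyzer name.
ANALYZER_BY_LANG = {"en": "english", "fr": "french"}


def build_doc(title, pub_date, lang):
    """Return a document body that carries its own analyzer name."""
    analyzer = ANALYZER_BY_LANG.get(lang, "standard")
    return {"title": title, "pubDate": pub_date, "lang_analyzer": analyzer}


if __name__ == "__main__":
    # Body for creating the index (e.g. curl -XPOST localhost:9200/test -d ...)
    print(json.dumps(MAPPING, indent=2))
    # Body for indexing one French document
    print(json.dumps(build_doc("Bonjour le monde", "2012-05-01", "fr")))
```

With this layout all languages share one type, and the only per-document work is setting the analyzer field. Unknown languages fall back to the "standard" analyzer here, which is just one possible default.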

clint

