Multi-language index: document query performance?


(bryan rasmussen) #1

Hi,

Assume I have an index of documents where each document is in one of 12 languages.

Therefore I have fields like:

sw_name, da_name...
sw_description, da_description...
sw_categories, da_categories (arrays with their own fields sw_category, da_category...)
Obviously, any document has only the fields relevant to its language.

If I run a multi-field query, I basically have a query with 36 possible fields (12 languages × 3 fields each).
There can then be performance issues related to query size. I am also wondering whether there are performance issues from, for example, querying all the sw_ fields in documents that contain only da_ fields.
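
To make the query size concrete, here is a minimal sketch of how such a request body could be built. It assumes a plain `multi_match` query and flat field names; the helper name `build_multi_field_query` is hypothetical, only the `sw` and `da` prefixes come from this post, and the sub-fields inside the categories arrays are left out.

```python
import json

# Only "sw" and "da" are named in the post; the other 10 codes are not given.
LANGUAGES = ["sw", "da"]
# Base field names from the post (category sub-fields are ignored in this sketch).
BASE_FIELDS = ["name", "description", "categories"]


def build_multi_field_query(text, languages=LANGUAGES, base_fields=BASE_FIELDS):
    """Build a multi_match body listing one field per (language, base field) pair."""
    fields = [f"{lang}_{base}" for lang in languages for base in base_fields]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    # Print the request body for the two known language prefixes.
    print(json.dumps(build_multi_field_query("kaffe"), indent=2))
```

With all 12 language codes passed in, the `fields` list grows to the 36 entries mentioned above, and every one of them is listed in the query no matter which language a given document is in.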

I'm asking because whenever someone wants to handle a multi-language index, the usual suggestion seems to be a field per language. That doesn't seem like it would scale beyond a few languages, so I would like to know before I start down that path.

Thanks,
Bryan Rasmussen


(Ali Beyad) #2

Hi Bryan,

How about having a separate index for each language? That is generally the best way to handle documents in multiple languages, especially if you know the language ahead of time. Judging by the fact that you populate separate fields for each language, it seems you do know it.

There will be some performance overhead from searching multiple fields when the term does not exist in a given field. But there would be a similar overhead when searching against many indices at the same time.
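
As a rough sketch of the index-per-language layout, assuming a hypothetical naming scheme of `docs-<lang>` and un-prefixed field names (`name`, `description`, `categories`) inside each index, the search path and body might be built like this. The helper names are made up for illustration; the comma-separated and wildcard index targets are standard Elasticsearch search syntax.

```python
INDEX_PREFIX = "docs"  # hypothetical prefix, not from the thread
FIELDS = ["name", "description", "categories"]  # same names in every index


def index_for(lang):
    """Name of the index holding documents in the given language."""
    return f"{INDEX_PREFIX}-{lang}"


def search_request(text, languages=None):
    """Return (path, body) for a search over some or all language indices."""
    if languages:
        # Known languages: target only those indices, comma-separated.
        target = ",".join(index_for(lang) for lang in languages)
    else:
        # Unknown language: fall back to every index matching the prefix.
        target = f"{INDEX_PREFIX}-*"
    body = {"query": {"multi_match": {"query": text, "fields": FIELDS}}}
    return f"/{target}/_search", body


if __name__ == "__main__":
    print(search_request("kaffe", ["da"]))
    print(search_request("kaffe"))
```

The query then lists only three fields however many languages exist, and when the language of the search is known only that one index is touched.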

