Multi-Field vs Multi-Index

Hi everybody. I have the following structure for my "tweet" typed documents: { text: "a short text of the tweet", country: "iso code for country of origin for the tweet" }.
My queries will search by text and filter by country. Some example filters would be: country X, or all countries except X.
The problems start when taking into account the analyzers for the text field. I will need different analyzers for different countries, as the language will be different.

One option I have now is to have a text.[country] field for each country, each with its own analyzer, and to analyze every tweet with each analyzer. When searching, I would then have a proxy that modifies the query to take the country into account, as I want the country list to stay fluid and to be able to add countries on the fly without many modifications to the query. So I search for text:"abc" and country:"UK", and the proxy transforms the query into a match on "abc" and a filter on country: "UK".
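To make the first option concrete, here is a minimal sketch of what the mapping and the proxy rewrite could look like, written as plain Python dictionaries in Elasticsearch's JSON shape. The country-to-analyzer table, the lowercase sub-field names (text.uk, text.fr) and the function names are assumptions for illustration only; the "text" and "keyword" field types also assume a reasonably recent Elasticsearch version.

```python
# Hypothetical country -> analyzer table; "english", "french" and "german"
# are built-in Elasticsearch language analyzers.
COUNTRY_ANALYZERS = {"UK": "english", "FR": "french", "DE": "german"}


def build_mapping(country_analyzers):
    """Build a mapping with one analyzed sub-field of `text` per country."""
    sub_fields = {
        country.lower(): {"type": "text", "analyzer": analyzer}
        for country, analyzer in country_analyzers.items()
    }
    return {
        "properties": {
            # Multi-field: the same text is indexed once per sub-field.
            "text": {"type": "text", "fields": sub_fields},
            # Exact-value field used for filtering.
            "country": {"type": "keyword"},
        }
    }


def rewrite_query(text, country):
    """Proxy step: turn (text, country) into a match plus a country filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": {f"text.{country.lower()}": text}},
                "filter": {"term": {"country": country}},
            }
        }
    }


mapping = build_mapping(COUNTRY_ANALYZERS)
query = rewrite_query("abc", "UK")
```

Adding a country then means adding one entry to the table and one sub-field to the mapping; the client-facing query stays the same.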

The second option would be to keep a separate index for each country and use wildcards when searching. For all countries except UK I could use +*-UK. I should also mention here that each country will have some data that is stored only on tweets from that country. This is a must, and that data won't match data from other countries, so it can't be generalised. I can then use a different analyzer for every index and get around that problem.
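As a rough sketch of the second option, the helper below builds the index expression that would go into the search URL. It assumes a hypothetical naming scheme of tweets-<country> (lowercased, since Elasticsearch index names must be lowercase) and the comma-separated form in which a wildcard is followed by "-name" exclusions; the exact exclusion syntax is worth checking against the version in use.

```python
def index_expression(include=None, exclude=None, prefix="tweets-"):
    """Build a comma-separated index expression for a search request.

    `prefix` and the per-country naming scheme are assumptions made
    for this example only.
    """
    if include:
        # Search only the listed countries' indices.
        parts = [f"{prefix}{country.lower()}" for country in include]
    else:
        # Start from every country index.
        parts = [f"{prefix}*"]
    # Exclusions are appended after the wildcard or explicit names.
    parts += [f"-{prefix}{country.lower()}" for country in (exclude or [])]
    return ",".join(parts)


all_but_uk = index_expression(exclude=["UK"])
only_uk_fr = index_expression(include=["UK", "FR"])
```

With this layout each index can carry its own analyzer for the text field and its own country-specific fields, at the cost of managing one index per country.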

Those are my two options, but they both have another problem: if I split by country so that I can filter, I will need one analyzer for, let's say, Canada. People in Canada tweet in both English and French, and I'll have no idea which language a given tweet is in.

Which option would you guys go for? And how would you solve the complexity added by multiple languages in one country as well?

Thank you!


You can search on text.* and still apply your filter by country. Make sure to use the best_fields type in the multi_match query.
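A minimal sketch of that suggestion, as a Python dictionary in query DSL shape: the multi_match runs over every text.* sub-field and scores each tweet by its best-matching one (the type is spelled best_fields in Elasticsearch), so the tweet's language does not need to be known up front. The function name and its parameters are made up for this example.

```python
def search_body(text, country=None, exclude_country=None):
    """Search all language sub-fields, optionally filtering by country."""
    bool_query = {
        "must": {
            "multi_match": {
                "query": text,
                "fields": ["text.*"],   # every analyzed variant of the text
                "type": "best_fields",  # score by the best-matching field
            }
        }
    }
    if country is not None:
        # "country X"
        bool_query["filter"] = {"term": {"country": country}}
    if exclude_country is not None:
        # "all countries except X"
        bool_query["must_not"] = {"term": {"country": exclude_country}}
    return {"query": {"bool": bool_query}}


canada = search_body("abc", country="CA")
not_uk = search_body("abc", exclude_country="UK")
```

For a Canadian tweet, whichever of the English or French sub-fields matches best determines the score, while the country filter is applied independently of the analyzers.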

