Searching and filtering i18n terms


(Fabrice Hong) #1

Hello,

tl;dr

How can I match and filter a localized search against a localized index?

Long version

I have an application where the user's search must be done in the context of the user's language.

In my Elasticsearch index, I want documents with both i18n and non-i18n properties (I want to avoid creating multiple indices, one for each language).

The mapping of the document should look like:

'entry': {
  'properties': {
    'name': {'type': 'string'},   /* unlocalized properties */
    'category': {                 /* localized properties */
      "properties": {
        "lang_fr": {
          "type": "string"
        },
        "lang_de": {
          "type": "string"
        }
      }
    }
  }
}

Given that, I have two requirements:

  1. When searching, exclude the localized fields that do not match the user's language (say the user's language is 'fr'; I want to exclude the 'de' fields from the search). How can I do this without specifying the entire list of fields I want to search on? To start simple, I tried this, but it doesn't work:
    {
      "query": {
        "match": {
          '*.lang_fr': full_text
        }
      }
    }

However, "'categories.lang_fr': full_text" works well. But I don't want to maintain the list of fields in the query; I want a general rule, like you can do in Solr.

  2. When I retrieve my results, I want to filter out all localized fields that do not correspond to the user's language. In other words, using the source filter, I'd like to keep all unlocalized fields, exclude all fields starting with "lang", but include all 'lang_fr' fields. I tried the following, but it doesn't work:

{
  "_source": {
    "include": [ "*", "*.lang_de" ],
    "exclude": [ "*.lang_*" ]
  }
  ...
}

The wildcard operator doesn't seem to work. I partially get what I want if I specify "categories.lang_de", but again, I don't want to maintain the list of fields; I want a generic rule. The include/exclude operation doesn't work as I would like. The only thing that actually works is a query where I explicitly list every language to exclude for every field, such as:

{
  "_source": {
    "exclude": [ "categories.lang_de", "categories.lang_en", "categories.lang_it",
                 "another_field.lang_de", "another_field.lang_en", "another_field.lang_it" ]
  }
  ...
}

for 'fr' search.
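
Until there is a generic rule, the explicit list can at least be generated on the client side instead of being written by hand. This is only a minimal sketch in Python: the helper name `build_source_exclude` and the two input lists are hypothetical (they are not part of Elasticsearch), and it still assumes the application knows which fields are localized and which languages exist.

```python
def build_source_exclude(localized_fields, languages, user_lang):
    """Return the _source exclude paths for every language except user_lang.

    Hypothetical helper: it only builds strings such as
    'categories.lang_de'; it does not talk to Elasticsearch.
    """
    return [
        f"{field}.lang_{lang}"
        for field in localized_fields
        for lang in languages
        if lang != user_lang
    ]


# Assumed inputs, taken from the example above.
localized_fields = ["categories", "another_field"]
languages = ["fr", "de", "en", "it"]

# Request body fragment for an 'fr' search; the query itself is omitted.
body = {
    "_source": {
        "exclude": build_source_exclude(localized_fields, languages, "fr")
    }
}
```

This produces the same six paths as the hand-written query, but it only moves the maintenance of the field list into the application; it is not the generic rule I am looking for.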

I'm quite surprised I couldn't find anything on Google. I see this as a very standard case of i18n applied to Elasticsearch. Maybe I'm modeling i18n the wrong way in ES?

Thank you in advance!

