Multilinqual queries in ElasticSearch


(Pavel) #1

Let's say we have the following mapping in ElasticSearch.

{
"content": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed",
"store": "yes"
},
"locale_container": {
"type": "object",
"properties": {
"english": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "english",
"search_analyzer": "english",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "english",
"search_analyzer": "english",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"german": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "german",
"search_analyzer": "german",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "german",
"search_analyzer": "german",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"russian": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "russian",
"search_analyzer": "russian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "russian",
"search_analyzer": "russian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"italian": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "italian",
"search_analyzer": "italian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "italian",
"search_analyzer": "italian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
}
}
}
}
}
}

When a particular user queries the index, we can take her culture from
her settings, i.e. we know which analyzer to use. How can we formulate
a query which will search only "title" and "text" fields in her own
language (let's say, German) and use German analyzer to tokenize the
search query?


(David Pilato) #2

What happens if you make search on content.locale_container.english.text
field or content.locale_container.german.text ?
Something like :

{
"query" : {
"term" : { "content.locale_container.english.text" : "my english
words" }
}
}

HTH
David

-----Message d'origine-----
De : elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
De la part de Pavel
Envoyé : mercredi 14 septembre 2011 05:03
À : elasticsearch
Objet : Multilinqual queries in ElasticSearch

Let's say we have the following mapping in ElasticSearch.

{
"content": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed",
"store": "yes"
},
"locale_container": {
"type": "object",
"properties": {
"english": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "english",
"search_analyzer": "english",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "english",
"search_analyzer": "english",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"german": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "german",
"search_analyzer": "german",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "german",
"search_analyzer": "german",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"russian": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "russian",
"search_analyzer": "russian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "russian",
"search_analyzer": "russian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"italian": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "italian",
"search_analyzer": "italian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "italian",
"search_analyzer": "italian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
}
}
}
}
}
}

When a particular user queries the index, we can take her culture from
her settings, i.e. we know which analyzer to use. How can we formulate
a query which will search only "title" and "text" fields in her own
language (let's say, German) and use German analyzer to tokenize the
search query?


(Jan Fiedler) #3

A term query (as shown above) would not use an analyzer (I think). You
probably need to go with the text query family:

{
"query" : {
"text" : { "content.locale_container.english.text" : "my english
words" }
}
}

Based on your mapping, ES would lookup the correct search analyzer for the
field 'content.locale_container.english.text' and use it to analyze your
query 'my english words'. In other words, the correct analyzer is identified
based on the field you query on (via the mapping definition).


(system) #4