I'm trying to find the best way to get, in a single result set, all the matches for a term in one language plus the matches for the same term in other languages.
For example, if I search for "black", I want Elasticsearch to return all the matches for "black" and all the matches for "black" in other languages, such as German, Italian, Spanish, or any other language that might be in the data.
My question is, what is the best option to do this?
Hi Antonio,
You are touching on a vast subject: cross-language search (or Cross-Language Information Retrieval, in academic terms). I think the hardest part is managing translation, which is beyond Elasticsearch's scope.
At first glance, you could take one of two approaches:
1. You have n indices. Each index has its own language and associated mapping/analyzers. You translate the input query into n languages and run n queries. You get n result lists and present a tabbed result page (one tab per language).
2. You still have n indices (or one combined index; it makes no functional difference). At index time, in a pipeline for instance, you extract the significant text of your non-English docs and send it to a translation service. You put the result in an "english_text" field that you've added to your mapping. At search time, you run only one query and can display a combined result list that your users can sort or filter as they please.
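To make the two approaches a bit more concrete, here is a minimal sketch of the search request bodies each one might send. It only builds plain dicts in the Elasticsearch query DSL (`match` and `multi_match`) and does not talk to a cluster. The index names (`docs_en`, `docs_de`, ...), the `text` field, and the `translate` stub with its small lookup table are assumptions made up for illustration; a real setup would call an actual translation service.

```python
# Hypothetical index names, one per language (assumption, not from the thread).
INDEX_BY_LANG = {"en": "docs_en", "de": "docs_de", "it": "docs_it", "es": "docs_es"}

# Stand-in for a translation service: a tiny hard-coded lookup table.
TRANSLATIONS = {
    "black": {"en": "black", "de": "schwarz", "it": "nero", "es": "negro"},
}


def translate(term, lang):
    """Stub translator: returns the term unchanged when it is not in the table."""
    return TRANSLATIONS.get(term.lower(), {}).get(lang, term)


def per_language_searches(term, field="text"):
    """Approach 1: one (index, request body) pair per language."""
    return [
        (index, {"query": {"match": {field: translate(term, lang)}}})
        for lang, index in INDEX_BY_LANG.items()
    ]


def combined_search(term, fields=("text", "english_text")):
    """Approach 2: a single English query over original and translated text."""
    return {"query": {"multi_match": {"query": term, "fields": list(fields)}}}


if __name__ == "__main__":
    for index, body in per_language_searches("black"):
        print(index, body)
    print(combined_search("black"))
```

With approach 1, each pair would be sent to its own index and rendered as a separate tab. With approach 2, the single body can be sent to one index or to all of them, assuming every non-English document carries the translated "english_text" field.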
Of course, there are other approaches (semantic, vector-based, ...), and the right choice depends greatly on your constraints, volumes, and requirements.