Hi, I'm working on a project with Elasticsearch. I have an index containing 1 million phrases, and I want to retrieve the phrases that match some query phrases. The phrases are in Italian and I use an Italian analyzer to analyze them. Everything works fine, but the problem is the order (and the score) of the matches: ideally I want the exact matches of the query phrases to come first, but that's not happening.
For example:
searching in my index for phrases containing the words "film cortometraggio" the first match is:
Pappi Corsicato Ha diretto film , cortometraggi , documentari e videoclip.
And then there is the match:
Robinet aviatore Robinet aviatore è un film cortometraggio del 1911 diretto da Luigi Maggi;
In this case the first phrase contains the second word ("cortometraggio") in its plural form, whereas the second phrase contains an exact match, yet the similarity algorithm gives the first phrase a higher score.
I am using the default BM25 similarity and I also tried the boolean similarity, but the problem persists.
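A possible explanation, sketched under assumptions: if the Italian analyzer stems "cortometraggi" and "cortometraggio" to the same token, both phrases match both query terms once, and BM25 then favours the shorter phrase through length normalization. With the boolean similarity the two would simply tie. The snippet below uses the textbook BM25 term formula; the idf value, the token counts (8 and 14) and the average length (12) are made-up numbers for illustration, not values taken from the index.

```java
public class Bm25LengthSketch {
    // Textbook BM25 score of one term in one document.
    // k1 and b are the commonly used defaults (1.2 and 0.75).
    static double termScore(double idf, double tf, double docLen,
                            double avgDocLen, double k1, double b) {
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        double idf = 2.0;        // hypothetical idf, same for both documents
        double avgDocLen = 12.0; // hypothetical average length in tokens

        // Both documents contain each of the two stemmed query terms once (tf = 1).
        double shortDoc = 2 * termScore(idf, 1, 8, avgDocLen, 1.2, 0.75);
        double longDoc = 2 * termScore(idf, 1, 14, avgDocLen, 1.2, 0.75);

        // The shorter document gets the higher score even though
        // neither is a "better" match on the stemmed tokens.
        System.out.println("short=" + shortDoc + " long=" + longDoc);
    }
}
```

If this is what is happening, changing the similarity alone cannot tell the exact form from the plural, because both are the same token in the index.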
How can I change the similarity measure in order to get the matches in the correct order?
The suggested query seems to work fine, but since I am using the Java High Level REST Client I have to "translate" it from the query DSL into Java, and I'm having trouble with the should clause (I don't know how to add the second match_phrase). Do you have any suggestions?
Thank you very much
So you have 3 conditions here.
But in your query you put only 2.
I believe you need to add another query (or more than one).
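As a rough sketch of what that could look like, assuming the three conditions are a strict phrase, a looser phrase, and a plain term match: the bool query takes one should clause per condition, each with its own boost. The field name `text`, the slop of 5 and the boost values are placeholders I made up, and the helper methods are hypothetical; the code only builds the request body as a JSON string with plain Java.

```java
import java.util.Arrays;
import java.util.List;

public class ShouldQuerySketch {
    // Minimal JSON string quoting (backslashes and double quotes only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // match_phrase clause; slop 0 means the analyzed terms must be adjacent and in order.
    static String matchPhrase(String field, String text, int slop, double boost) {
        return "{\"match_phrase\":{" + quote(field) + ":{\"query\":" + quote(text)
                + ",\"slop\":" + slop + ",\"boost\":" + boost + "}}}";
    }

    // Plain match clause: any analyzed term may match.
    static String match(String field, String text, double boost) {
        return "{\"match\":{" + quote(field) + ":{\"query\":" + quote(text)
                + ",\"boost\":" + boost + "}}}";
    }

    // Wraps any number of clauses into a bool query with a should array.
    static String boolShould(List<String> clauses) {
        return "{\"query\":{\"bool\":{\"should\":["
                + String.join(",", clauses) + "]}}}";
    }

    public static void main(String[] args) {
        String q = "film cortometraggio";
        String body = boolShould(Arrays.asList(
                matchPhrase("text", q, 0, 3.0),  // condition 1: strict phrase
                matchPhrase("text", q, 5, 2.0),  // condition 2: looser phrase
                match("text", q, 1.0)));         // condition 3: any term
        System.out.println(body);
    }
}
```

With the High Level REST Client the same shape should translate to one `.should(...)` call per clause on `QueryBuilders.boolQuery()`, passing `QueryBuilders.matchPhraseQuery(...)` or `QueryBuilders.matchQuery(...)` builders, so a second match_phrase is just a second `.should(...)`.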
If you don't succeed, please provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.
A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run to reproduce your use case. It will help readers to understand, reproduce and, if needed, fix your problem. It will also most likely help you get a faster answer.
But it works fine in this case and not in many others.
For example: if I'm searching for "bobina del film" I'd like to retrieve:
An exact match like:
La bobina contenente il film incompiuto e mai uscito nelle sale venne distrutta durante un bombardamento
A phrase that is not an exact match but matches the input text like:
Il film,un cortometraggio in una bobina, fu distribuito dalla General Film Company e uscì in sala
A phrase that matches with words resulting from stemming like:
Il 30 maggio 2013 James Bobin viene scelto come regista del film, il cui titolo di lavorazione è "Alice"
These three cases must be returned in this order, but I can't achieve it. I'm trying the should/match_phrase combination, but I don't know how to express these three different cases.
Because the boolean similarity does not work well for me, I went back to BM25, but I still have similar problems.
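One direction that might help here, offered as an untested sketch: since the Italian analyzer stores stemmed tokens, no similarity setting can separate "bobina" from "Bobin" on that field. A common workaround is a multi-field, where the same text is also indexed without stemming and the unstemmed subfield gets a higher boost in a should clause. The index body below assumes Elasticsearch 7 or later (typeless mappings); the names `text`, `exact`, `italian_exact` and `italian_stop` are my own choices, and the custom analyzer leaves out the elision handling that the built-in Italian analyzer has.

```java
public class ExactSubfieldSketch {
    // Builds a create-index body: "text" is stemmed by the built-in italian
    // analyzer, "text.exact" only lowercases and removes Italian stop words.
    static String indexBody() {
        return "{"
            + "\"settings\":{\"analysis\":{"
            +   "\"filter\":{\"italian_stop\":"
            +     "{\"type\":\"stop\",\"stopwords\":\"_italian_\"}},"
            +   "\"analyzer\":{\"italian_exact\":{\"type\":\"custom\","
            +     "\"tokenizer\":\"standard\","
            +     "\"filter\":[\"lowercase\",\"italian_stop\"]}}}},"
            + "\"mappings\":{\"properties\":{\"text\":{"
            +   "\"type\":\"text\",\"analyzer\":\"italian\","
            +   "\"fields\":{\"exact\":{"
            +     "\"type\":\"text\",\"analyzer\":\"italian_exact\"}}}}}"
            + "}";
    }

    public static void main(String[] args) {
        // The printed JSON could be sent as the body of a create-index request.
        System.out.println(indexBody());
    }
}
```

With such a mapping, the three cases could be tried as three should clauses: a match_phrase with some slop on `text.exact` with the highest boost, a match on `text.exact` with a lower boost, and a match on `text` for the stemmed hits. Whether that reproduces exactly the order above would need checking against the real data.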