Scenario: I have 2 Presentations, with the titles as follows:
Example 1:
Presentation #1: "Bake Cake Bake bake cake cake cake bake"
Presentation #2: "Bake Cake"
Now, when I search with the text 'Bake cake', the results must be ordered with the most appropriate match first, i.e., Presentation #2 should be listed at the top.
Example 2:
Presentation #1: "Bake Cake Bake bake cake cake cake bake"
Presentation #2: "Bake Cake"
Presentation #3: "bake A cake"
Now, when I search with the text 'Bake cake', the results must be ordered with the most appropriate match first, i.e., Presentation #2 should be listed at the top, followed by Presentation #3, with Presentation #1 last.
I have only limited experience with Elasticsearch, so I could be wrong in the following statements.
I feel your problem can't be solved, because we can have either keyword matching or text matching.
Keyword matching is an exact match, so when you search for "Cake a Bake" it doesn't match any presentation title because of the missing 'a'. You can analyse the text before indexing and searching to remove stop words, but that still isn't guaranteed to work for all scenarios.
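To make the stop-word idea concrete, here is a rough sketch in plain Python (not Elasticsearch code). The stop-word set and the helper names normalize and exact_matches are made up for illustration. It lowercases each title, drops stop words, and then compares whole strings, which is roughly what an exact keyword match after analysis would amount to.

```python
# Toy illustration only: this is not how Elasticsearch analyzers are configured.
STOP_WORDS = {"a", "an", "the"}  # hypothetical stop-word list

titles = {
    "Presentation #1": "Bake Cake Bake bake cake cake cake bake",
    "Presentation #2": "Bake Cake",
    "Presentation #3": "bake A cake",
}

def normalize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return " ".join(t for t in text.lower().split() if t not in STOP_WORDS)

def exact_matches(query):
    """Return the names of titles whose normalized form equals the normalized query."""
    q = normalize(query)
    return [name for name, title in titles.items() if normalize(title) == q]

print(exact_matches("Bake cake"))
```

With this sketch, Presentation #2 and #3 both match 'Bake cake', but they come back as equals, so it still gives no ordering between them, and Presentation #1 is dropped entirely.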
Text matching relevancy is all about tf * idf (term frequency times inverse document frequency). The more frequently a word appears in a document, the more relevant that document is for your search. So, going by text matching for "Cake Bake", the presentation title in which these words occur most often gets the highest score.
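As a rough illustration of that effect, here is a toy tf * idf scorer in plain Python. It is a simplified sketch, not Lucene's actual scoring formula: it uses raw term counts, a simple 1 + log(N / df) for idf, and ignores the field-length normalization that real similarity implementations apply. The function names are made up for this example.

```python
import math
from collections import Counter

titles = {
    "Presentation #1": "Bake Cake Bake bake cake cake cake bake",
    "Presentation #2": "Bake Cake",
    "Presentation #3": "bake A cake",
}

def tokenize(text):
    return text.lower().split()

def tf_idf_score(query, title, all_titles):
    """Sum of (raw term frequency * simplified idf) over the query terms."""
    counts = Counter(tokenize(title))
    n_docs = len(all_titles)
    score = 0.0
    for term in tokenize(query):
        # Document frequency: how many titles contain the term at all.
        df = sum(1 for t in all_titles if term in tokenize(t))
        if df == 0:
            continue
        idf = 1.0 + math.log(n_docs / df)  # simplified idf, an assumption
        score += counts[term] * idf
    return score

def rank(query):
    """Order presentation names by descending toy score."""
    all_titles = list(titles.values())
    return sorted(
        titles,
        key=lambda name: tf_idf_score(query, titles[name], all_titles),
        reverse=True,
    )

print(rank("Bake cake"))
```

Under these assumptions Presentation #1 comes out on top simply because "bake" and "cake" each occur four times in it, which is the opposite of the ordering asked for.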
I would like to hear from others whether there is a solution to your problem.