Now, I have a name field with an index analyzer of autocomplete. When I store the name field with a value of, say, "abcdef", the autocomplete analyzer stores the tokens "a", "ab", "abc", ... "abcdef".
If I have documents with values "abc", "abcd", "abcde", they can all be found.
However, I want to score my results so that if I search for "abc", the document whose source value is "abc" ranks higher than "abcd" and "abcde". As things stand, all three results get the same score.
Is there a way to structure my index analyzer or search analyzer so that I keep the benefit of autocomplete but can still influence the ranking of the search results?
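To make the setup in the question concrete, here is a minimal Python sketch of what an edge n-gram analyzer does at index time. It is a simulation, not Elasticsearch itself: the `edge_ngrams` helper and its `min_gram`/`max_gram` defaults are assumptions chosen for illustration, and the lookup assumes the search term is not n-grammed at query time.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return the prefixes of `text` that an edge n-gram filter would emit."""
    upper = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, upper + 1)]


# Index-time tokens for each stored name.
docs = ["abc", "abcd", "abcde"]
index = {doc: set(edge_ngrams(doc)) for doc in docs}

# The query term is looked up as-is (assumed: no n-gramming at search time).
query = "abc"
matches = [doc for doc in docs if query in index[doc]]

print(edge_ngrams("abcdef"))  # ['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef']
print(matches)                # ['abc', 'abcd', 'abcde']
```

Every document contains the token "abc", so the token match alone cannot tell the exact value apart from the longer ones.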
There are lots of ways! First, I should point out that you should look at the completion suggester for autocomplete. I don't know much about it other than that it was written for autocomplete. I use edgeNGram for it like you do.
OK, with that out of the way, there are a few things you could do:
1. Do a bool query where the exact match and the ngram match are both in should clauses. Boost the exact match.
2. Add a function_score query that multiplies the score by some value derived from the length, something like 1/length or 1/(length + 1). The longer matches would get sorted toward the end. Mostly.
3. Use some external popularity metric.
4. Switch from sort: relevance to sort: some_custom_value.
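The first two options can be sketched as query bodies built in Python. This is only a sketch under stated assumptions: `name.raw` is a hypothetical un-analyzed sub-field holding the exact value, `name_length` is a hypothetical numeric field that you would have to populate with the length of the name at index time, and the boost value is arbitrary.

```python
import json


def exact_boost_query(text, boost=5.0):
    """Bool query with two should clauses; the exact match is boosted."""
    return {
        "bool": {
            "should": [
                # Matches through the edge n-gram tokens on the analyzed field.
                {"match": {"name": {"query": text}}},
                # Matches only when the whole stored value equals the input.
                # "name.raw" is an assumed un-analyzed sub-field.
                {"term": {"name.raw": {"value": text, "boost": boost}}},
            ]
        }
    }


def length_penalized_query(text):
    """function_score query that multiplies the score by 1 / name_length."""
    return {
        "function_score": {
            "query": {"match": {"name": {"query": text}}},
            "field_value_factor": {
                "field": "name_length",    # assumed field, filled at index time
                "modifier": "reciprocal",  # turns the value into 1 / value
                "missing": 1,
            },
            "boost_mode": "multiply",
        }
    }


# Print the request body for the first option.
print(json.dumps({"query": exact_boost_query("abc")}, indent=2))
```

With the first query, "abc", "abcd" and "abcde" all satisfy the match clause, but only the document whose stored value is exactly "abc" also satisfies the term clause, so it picks up the extra boosted score. A term query does not analyze its input, so case differences would need to be handled when the sub-field is indexed.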
We use 1 and 3. You can see that by going to en.wikipedia.org and typing "a" into the search box on the upper right. "A", the exact match, is first. "Australia" is second because it has the most incoming links. It's nowhere near perfect but it gets the job done.
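Combining an exact-match boost with a popularity metric could look roughly like the sketch below. This is not the query Wikipedia actually runs: `name.raw` and `incoming_links` are hypothetical field names, and the boost and modifier are guesses that would need tuning against real data.

```python
def exact_plus_popularity_query(text, exact_boost=10.0):
    """Boost exact matches, then scale every score by a popularity field."""
    return {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        # Prefix match through the edge n-gram tokens.
                        {"match": {"name": {"query": text}}},
                        # Exact match on an assumed un-analyzed sub-field.
                        {"term": {"name.raw": {"value": text,
                                               "boost": exact_boost}}},
                    ]
                }
            },
            "field_value_factor": {
                "field": "incoming_links",  # assumed popularity field
                "modifier": "log2p",        # log(value + 2), never zero
                "missing": 0,
            },
            "boost_mode": "multiply",
        }
    }


query_body = {"query": exact_plus_popularity_query("a")}
```

The logarithmic modifier keeps a very popular document from swamping everything else, but whether the exact match still comes out on top depends on how large `exact_boost` is relative to the popularity factor.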