Scoring autocomplete (edgeNGram) results

miloconway · July 9, 2015, 1:37pm

I have a name field where I am applying the following filter and analyzer:

```
"filter": {
    "autocomplete_filter": {
        "min_gram": "1",
        "type": "edgeNGram",
        "max_gram": "50"
    },
    ...
}
...
"analyzer": {
    "autocomplete": {
        "type": "custom",
        "filter": [
            "lowercase",
            "autocomplete_filter"
        ],
        "tokenizer": "standard"
    },
    ...
}
```

The name field uses autocomplete as its index analyzer. When I store a name value of, say, "abcdef", the autocomplete analyzer indexes it as the tokens "a", "ab", "abc", ... "abcdef".
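To make the tokenization concrete, here is a minimal Python sketch that mimics what the autocomplete analyzer above does to a value: split it into words, lowercase them, then emit every leading prefix from min_gram to max_gram characters. It is an approximation for illustration only: `edge_ngrams` is a hypothetical helper, not an Elasticsearch API, and the regex split only roughly stands in for the standard tokenizer.

```python
import re

def edge_ngrams(text, min_gram=1, max_gram=50):
    """Rough emulation of the "autocomplete" analyzer: tokenize, lowercase,
    then emit the leading n-grams of each token (the edgeNGram filter)."""
    tokens = []
    # Approximates the standard tokenizer by taking runs of word characters.
    for word in re.findall(r"\w+", text.lower()):
        # Prefixes of length min_gram..max_gram, capped at the word's length.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens

print(edge_ngrams("abcdef"))
# ['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef']
```

Each of "abc", "abcd" and "abcde" produces the token "abc", which is why a search for "abc" matches all three documents.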

If I have documents with values "abc", "abcd", "abcde", they can all be found.

However, I want to score my results so that if I search for "abc", the document with the source value "abc" ranks higher than "abcd" and "abcde". Right now, all else being equal, all three results get the same score.

Is there a way to structure my index analyzer or search analyzer so that I keep the benefit of autocomplete but can still influence how the results are ranked?

nik9000 · July 9, 2015, 1:57pm

There are lots of ways! First, I should point out that you might want to look at the completion suggester for autocomplete. I don't know much about it other than that it was built for autocomplete. I use edgeNGram for it like you do.

OK, with that out of the way, here are a few things you could do:

1. Do a bool query where the exact match and the ngram match are both in should clauses. Boost the exact match.
2. Add a function_score query that multiplies the score by some value derived from the length of the field value. Something like 1/length or 1/(length + 1). The longer matches would get sorted to the end. Mostly.
3. Use some external popularity metric.
4. Switch from sort: relevance to sort: some_custom_value.
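A toy model shows why the exact-match boost and the length multiplier separate the three documents. It assumes, as in the question, that the n-gram match gives every matching document the same base score, then adds a boosted clause that only fires on an exact match and multiplies by 1/(length + 1). The boost of 2.0 and the arithmetic are illustrative assumptions; this is not how Lucene actually computes scores.

```python
def rank(query, values, exact_boost=2.0, use_length=True):
    """Toy re-ranking of prefix matches; returns matching values best-first."""
    scored = []
    for value in values:
        if not value.startswith(query):
            continue  # not an edge n-gram match, so it is not returned at all
        score = 1.0  # same base score for every match (assumption from the question)
        if value == query:
            score += exact_boost  # boosted exact-match should clause
        if use_length:
            score *= 1.0 / (len(value) + 1)  # shorter values get a larger factor
        scored.append((score, value))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [value for _, value in scored]

print(rank("abc", ["abcde", "abcd", "abc"]))
# ['abc', 'abcd', 'abcde']
```

With the exact-match boost alone, "abcd" and "abcde" still tie behind "abc"; adding the length factor orders the remaining matches shortest-first.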

We use 1 and 3. You can see that by going to en.wikipedia.org and typing "a" into the search box in the upper right. "A", the exact match, is first. "Australia" is second because it has the most incoming links. It's nowhere near perfect, but it gets the job done.
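A minimal sketch of combining the exact-match boost with a popularity metric, written as a Python dict in Elasticsearch query DSL. Several things here are assumptions, not from the thread: a hypothetical subfield `name.exact` that indexes the whole lowercased value as a single token, a hypothetical numeric field `incoming_links` that holds the popularity signal, a boost of 10, and the current query DSL layout (older 1.x releases may differ).

```python
import json

def autocomplete_query(prefix, exact_boost=10.0):
    """Edge n-gram match plus a boosted exact match, scaled by popularity.
    The field names "name.exact" and "incoming_links" are hypothetical."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [
                            # Matches any document whose name starts with the prefix.
                            {"match": {"name": prefix}},
                            # Only fires when the whole (lowercased) value equals the input.
                            {"term": {"name.exact": {"value": prefix.lower(), "boost": exact_boost}}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # Factor = log10(2 + incoming_links); log2p keeps it above zero
                # for documents with no links, so they are not zeroed out.
                "field_value_factor": {
                    "field": "incoming_links",
                    "modifier": "log2p",
                    "missing": 0,
                },
                # Final score = relevance score * popularity factor.
                "boost_mode": "multiply",
            }
        }
    }

print(json.dumps(autocomplete_query("abc"), indent=2))
```

Multiplying by a logarithm keeps popularity as a tiebreaker-like nudge; a very popular document can still outrank an exact match if the boost is too small, so the boost value needs tuning against real data.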

miloconway · July 9, 2015, 4:44pm

I was thinking of trying approach 2, but it seemed to introduce a lot of extra overhead (enabling dynamic scripting, etc.).

1 seems like a much better approach; I'll try that.

Thanks!

nik9000 · July 9, 2015, 4:58pm

If you are on a reasonably modern version of Elasticsearch, you can use the Lucene expression language: Scripting | Elasticsearch Guide [8.11] | Elastic

It's fully sandboxed and enabled by default.
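A sketch of approach 2 using a Lucene expression, again as a Python dict in query DSL. Expressions can only read numeric fields, not string lengths, so this assumes a hypothetical numeric field `name_length` that the indexing application fills with the character length of the name. It uses the current script layout (`source`); older releases used different keys.

```python
import json

def length_penalized_query(prefix):
    """function_score that multiplies relevance by 1 / (name_length + 1) via a
    Lucene expression. "name_length" is a hypothetical, client-populated field."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"name": prefix}},
                "script_score": {
                    "script": {
                        "lang": "expression",
                        # Expressions evaluate in doubles, so this is not integer division.
                        # Shorter names get a larger factor, so "abc" outranks "abcde".
                        "source": "1 / (doc['name_length'].value + 1)",
                    }
                },
                # Final score = query score * script score.
                "boost_mode": "multiply",
            }
        }
    }

print(json.dumps(length_penalized_query("abc"), indent=2))
```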

miloconway · July 10, 2015, 12:12am

Thanks, will take a look!
