Scoring for a full text search with ngram filter


(Jérôme Avoustin) #1

Hi there !

I'm currently working on a full-text search (and I'm quite new to it).
I'd like to show the user the results that best match the query string, and update them as the user types.
I've created a multi match query like this:

GET /streams/_search
{
  "query": {
    "multi_match": {
        "query":  "event",
        "fields": [ "*_keywords", "name^2" ]
    }
  }
} 

I've also created a custom autocomplete analyzer, exactly like the one described here (except that I used an ngram filter instead of an edge ngram filter):
https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html

I have declared the analyzer in the index, and used it in the mapping of my document type, as described in the article.
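
For readers who don't want to follow the link, a setup along these lines might look like the sketch below. It is not the exact configuration used here: the names autocomplete and autocomplete_filter, the gram sizes, and the "string" field type (pre-5.x; newer versions use "text") are all assumptions modelled on the guide page.

```
# Sketch only: analyzer/filter names, gram sizes and field type are assumptions
PUT /streams
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings": {
    "stream": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
```

Using the standard analyzer at search time means the query string itself is not split into ngrams; only the indexed text is.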
When I run the search shown above, I get the results I'm looking for, but I'd like the scoring to be more accurate.
Let's say I create these documents:

PUT /streams/stream/123
{
  "name": "Event test"
}

PUT /streams/stream/456
{
  "name": "Eventually consistent"
}

PUT /streams/stream/789
{
  "name": "Another Event"
}

If I then search for the term "event", all three documents get the same score (with either the best_fields or the most_fields query type). I can understand why.
But I would like to improve the scoring so that the document named "Eventually consistent" scores lower than the others. Previously, I was using only the default analyzer (with no autocomplete), and the score varied depending on how much weight the term had in the field.

So my question is: how can I get more accurate scores in this situation?
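
One way to see why all three names match equally: an ngram filter emits the substrings of each word that fall within the configured gram sizes, so "Eventually" produces the token "event" just as the whole word "Event" does. Assuming the index defines an analyzer called autocomplete (a name borrowed from the linked guide, not confirmed here) whose gram sizes cover five characters, the _analyze API should list that token:

```
# Sketch only: the analyzer name "autocomplete" is an assumption
GET /streams/_analyze
{
  "analyzer": "autocomplete",
  "text": "Eventually"
}
```

If "event" shows up among the returned tokens, the query term matches this document through a partial word exactly as it matches the other two through a whole word.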

I'm quite new to ES, so I might have missed something important...

Thanks !


(David Pilato) #2

I'd probably index the same field using different strategies (using multi fields):

  • standard (where it produces exact terms)
  • ngrams
  • whatever...

Then I'd use a bool query with two should clauses. The first one would target the "standard" field with a boost of, for example, 3.0. The second would target the ngram field with no boost, or with a lower boost such as 0.5. That way an exact term match outweighs a partial (ngram) match.
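
As a rough sketch of that query: "name" is assumed to be the field analyzed with the standard analyzer and "name.autocomplete" a hypothetical ngram sub-field, and the boosts are just the example values mentioned above.

```
# Sketch only: "name.autocomplete" is a hypothetical sub-field name
GET /streams/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": { "query": "event", "boost": 3.0 } } },
        { "match": { "name.autocomplete": { "query": "event", "boost": 0.5 } } }
      ]
    }
  }
}
```

A document matching both clauses ("Event test") adds both scores, while one matching only through ngrams ("Eventually consistent") gets just the lower-boosted part.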

Makes sense?


(Jérôme Avoustin) #3

Yes, that might help !
I actually didn't know we could write such a query (I'm really a newbie :))
I'll try that!

Thanks David!


(Jérôme Avoustin) #4

So I created the multi-fields using standard and ngram strategies.
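
A multi-field mapping of that kind could look roughly like the sketch below. It assumes an analyzer named autocomplete already exists in the index settings and uses the pre-5.x "string" type (newer versions use "text"); only the sub-field name autocomplete is taken from the query that follows.

```
# Sketch only: assumes an "autocomplete" analyzer is already defined on the index
PUT /streams/_mapping/stream
{
  "properties": {
    "name": {
      "type": "string",
      "fields": {
        "autocomplete": {
          "type": "string",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
```

The main field keeps the default standard analysis, and name.autocomplete holds the ngram version of the same text. Documents indexed before the sub-field was added would need to be reindexed to populate it.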
I didn't use a bool query, and kept a multi match query like this:

GET /streams/_search
{
  "query": {
    "multi_match": {
        "query":  "event",
        "fields": [ "*_keywords^3", "name^6", "*_keywords.autocomplete", "name.autocomplete^2" ]
    }
  }
}

This gave a very interesting result, with more accurate scores.
I'll keep using it for a while and see how it behaves.

Thanks !

