Getting _score 0 with Function Score Query, with explain


(Thiago Locatelli) #1

Hi All,

I am trying to understand why Elasticsearch would return a hit with _score 0 even though my search matched the document. I also have a custom score function that should never produce a _score of 0 unless the original _score is 0.

My query is below:

`{
  "size": 250,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "title": {
                  "query": "Amy Schumer",
                  "type": "phrase_prefix"
                }
              }
            },
            {
              "match": {
                "episodeTitle": {
                  "query": "Amy Schumer",
                  "type": "phrase_prefix"
                }
              }
            },
            {
              "match": {
                "actors": {
                  "query": "Amy Schumer",
                  "type": "phrase_prefix"
                }
              }
            },
            {
              "match": {
                "directors": {
                  "query": "Amy Schumer",
                  "type": "phrase_prefix"
                }
              }
            },
            {
              "match": {
                "genres": {
                  "query": "Amy Schumer",
                  "type": "phrase_prefix"
                }
              }
            }
          ]
        }
      },
      "script_score": {
        "script": "_score * (_source.type == 'PEOPLE' ? 30 : 1) * (_source.type == 'SERIES' ? 80 : 1) * (_source.type == 'GENRE' ? 40 : 1) * (_source.type == 'LINEAR' ? _source.distance : 1) * (_source.type == 'MOVIE' ? 2.5 : 1) * (_source.type == 'MOVIE' && _source.newMovie ? 10 : 1)"
      }
    }
  },
  "highlight": {
    "order": "score",
    "fields": {
      "title": {"type": "plain"},
      "episodeTitle": {"type": "plain"},
      "genres": {"type": "plain"},
      "actors": {"type": "plain"},
      "directors": {"type": "plain"}
    }
  }
}`

So my function score works pretty much like this: genres, actors and TV series should get a high score. If I search for "Amy Schumer", the actor should come first and then the movie Trainwreck, since she acted in it. If I search for "Drama", the genre Drama comes first, followed by movies, programs and TV series with Drama as a genre. If I search for "Jim Parsons", I should get the actor first, then the TV series, and then everything else that matches my query.

The problem is that when I search for "Amy Schumer", Trainwreck gets a _score of zero. It shouldn't: it is a new movie, so the original score should be multiplied by 2.5 (because it's a movie) and by 10 (because it's a new movie).

This is the pastebin for my explain result for Trainwreck: http://pastebin.com/Hb9SCFpr . I would post it here, but only 5000 characters are allowed.

Thank you
Thiago


(Christoph) #2

Hi Thiago,

I might not completely understand your script, but the factor (_source.type == 'LINEAR' ? _source.distance : 1) looks like it could be 0 to me, depending of course on what your documents look like. I'm afraid the explain output is not really helpful for debugging what happens when the script is evaluated. All I can see is that the original query "_score" goes in and 0 comes out.

Having said that, why don't you simply use different boost factors on the different queries? I think you should be able to do the same as you do with the script now, and it's potentially much faster.
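
As a rough illustration of per-clause boosts, here is a sketch showing only two of the five clauses. The boost values 30 and 2.5 are placeholders borrowed from the multipliers in the script and are not tuned. Note that a boost like this applies per queried field, not per document type.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "actors": {
              "query": "Amy Schumer",
              "type": "phrase_prefix",
              "boost": 30
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "Amy Schumer",
              "type": "phrase_prefix",
              "boost": 2.5
            }
          }
        }
      ]
    }
  }
}
```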


(Thiago Locatelli) #3

Hi @cbuescher,

First of all, thank you for your reply.

I am running a search against an alias that points to different indices: TV Guide, Movies Database, TV Series, Genres and Celebrities. LINEAR means the document belongs to the TV Guide index, and _source.distance is an inverted distance: the bigger the value, the sooner the event will happen. So if a TV program starts an hour from now, its score is multiplied by the distance value, which is a number between 20 and 1 (never zero).

My function needs to change the _score based on a type attribute I have on each of my documents (since _type is not available for custom functions). I am not sure how I can use boost and _source.type at the same time, since the boost is applied to each of the queries.
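
One possible way to express per-type multipliers without a script is the `functions` array of `function_score`, where each function carries its own filter. This is only a sketch under assumptions not confirmed in this thread: that `type` is indexed as an exact, non-analyzed value so that a `term` filter on 'PEOPLE' matches, and that the cluster supports `weight` (older 1.x releases use `boost_factor` instead). The LINEAR distance factor and the new-movie factor are left out, and only one of the five match clauses is shown.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "actors": {
            "query": "Amy Schumer",
            "type": "phrase_prefix"
          }
        }
      },
      "functions": [
        { "filter": { "term": { "type": "PEOPLE" } }, "weight": 30 },
        { "filter": { "term": { "type": "SERIES" } }, "weight": 80 },
        { "filter": { "term": { "type": "GENRE" } }, "weight": 40 },
        { "filter": { "term": { "type": "MOVIE" } }, "weight": 2.5 }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}
```

With `score_mode` set to `multiply`, the weights of all matching functions are multiplied together, and `boost_mode` `multiply` then multiplies that result with the query's own _score, which mirrors the product in the script.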

Thank you
Thiago


(Christoph) #4

I see, have you ever tried Index Boost for that? Might save you some trouble with scripting. Just a thought.
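
For illustration, a search body with index boosts might look roughly like the sketch below. The index names (`example_celebrities`, `example_series`, `example_genres`, `example_movies`) are hypothetical placeholders, not the real names behind the alias. The values simply reuse the multipliers from the script, and the object form of `indices_boost` shown here is the one accepted by older Elasticsearch releases.

```json
{
  "indices_boost": {
    "example_celebrities": 30,
    "example_series": 80,
    "example_genres": 40,
    "example_movies": 2.5
  },
  "query": {
    "match": {
      "actors": {
        "query": "Amy Schumer",
        "type": "phrase_prefix"
      }
    }
  }
}
```

This only covers the constant per-index factors. The distance factor for LINEAR documents and the extra factor for new movies depend on document fields, so they would still need some other mechanism.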


(Thiago Locatelli) #5

Hi @cbuescher,

Thank you for that. I will take a look. I am using spring-data-elasticsearch; hopefully Spring has added support for this.

I am still dealing with the problem that a search for "amy trainwreck" does not return anything, even though "amy" matches the actor and "train" should partially match the title "trainwreck". I am having a hard time finding the best search type for my case. I am using phrase_prefix, but I think I am on the wrong path.

Thank you
Thiago


(Thiago Locatelli) #6

Hi @cbuescher,

I was looking at spring-data-elasticsearch and it seems it's just a wrapper for the Elasticsearch Java library, so the support should be in there, but it seems it is not. I was able to use indices_boost in Marvel and got some pretty good results, but doing the same from Java does not seem to be supported.

Do you have any idea?
Thank you
Thiago

EDIT 1
The Elasticsearch Java client supports indices_boost but spring-data-elasticsearch doesn't, so I created a feature request to add it.

https://jira.spring.io/browse/DATAES-216


(Christoph) #7

Hi,

As I'm not so familiar with spring-data-elasticsearch, I can't tell if they provide this functionality anywhere else. Good that you tried it with Marvel though; creating that issue seems the right way forward to me.


(Thiago Locatelli) #8

@cbuescher Just FYI, my function was returning zero because _score was failing. I replaced _score with _score.score() and now I don't get a 0 _score.
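
For anyone landing here later, the `script_score` part of the original query with that replacement applied would look roughly like this. It is reconstructed from the script in post #1, with only `_score` changed to `_score.score()`.

```json
{
  "script_score": {
    "script": "_score.score() * (_source.type == 'PEOPLE' ? 30 : 1) * (_source.type == 'SERIES' ? 80 : 1) * (_source.type == 'GENRE' ? 40 : 1) * (_source.type == 'LINEAR' ? _source.distance : 1) * (_source.type == 'MOVIE' ? 2.5 : 1) * (_source.type == 'MOVIE' && _source.newMovie ? 10 : 1)"
  }
}
```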

