queryNorm is defeating the purpose of scoring


(bruun) #1

I'm not sure whether I'm trying to make elasticsearch do something it was
not designed to do, but here we go.

We have an index (example_index) of documents (example) of the
following prototype (I've excluded unimportant fields):

{
    "tag": ["apple", "fruit", "red", ..], 
    "popularity": 96,
    "average_rating": 3.4,
    ..
}

We're currently porting the search functionality from a set of SQL database
queries, and in a very simplified form, a search for the query string
"apple fruit" results in a query like the following (there's actually also
some down-scoring of duplicate terms, but this example leaves it out to keep
complexity low):

{
    "query": {
        "custom_score": {
            "query": {
                "bool": {
                    "should": [{
                        "constant_score": {
                            "query": {
                                "term": {
                                    "tag": "apple fruit"
                                }
                            },
                            "boost": "16"
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "term": {
                                    "tag": "apple"
                                }
                            },
                            "boost": "8"
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "term": {
                                    "tag": "fruit"
                                }
                            },
                            "boost": "8"
                        }
                    }]
                }
            },
            "script": "_score * 4.0 + doc['popularity'].value * 5.0 + doc['average_rating'].value * 2.0"
        }
    }
}

As you can probably tell, the boost factor is being used both to score
different levels of matches and to bring the match score into the same range
as the additional scoring variables (the popularity and average_rating fields).

However, this does not work as expected: the boost factor for each tag
term is effectively negated by the query normalization factor, as can be
seen from the following explained hit:

{
    "_explanation": {
        "description": "custom score, product of:",
        "details": [{
            "description": "script score function: product of:",
            "details": [{
                "description": "sum of:",
                "details": [{
                    "description": "ConstantScore(tag:apple)^95.0, product of:",
                    "details": [{
                        "description": "boost",
                        "value": 95.0
                    }, {
                        "description": "queryNorm",
                        "value": 0.007443229
                    }],
                    "value": 0.70710677
                }, {
                    "description": "ConstantScore(tag:fruit)^95.0, product of:",
                    "details": [{
                        "description": "boost",
                        "value": 95.0
                    }, {
                        "description": "queryNorm",
                        "value": 0.007443229
                    }],
                    "value": 0.70710677
                }],
                "value": 1.4142135
            }],
            "value": 3406.5686
        }, {
            "description": "queryBoost",
            "value": 1.0
        }],
        "value": 3406.5686
    },
    "_id": "45550",
    "_index": "example_index",
    "_node": "_TRFral5Q7SUf0myvkL73g",
    "_score": 3406.5686,
    "_shard": 1,
    "_source": {
        "average_rating": 5.0,
        "popularity": 65,
        "tag": ["fruit", "food", "apple"],
        ..
    },
    "_type": "example"
}
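The numbers in this explanation can be reproduced by hand. The sketch below assumes Lucene 3.x's DefaultSimilarity, where queryNorm = 1 / sqrt(sum of squared clause weights) and a constant_score clause's weight is simply its boost. It also assumes the explained query had only the two clauses shown, both boosted to 95, since 1 / sqrt(95² + 95²) gives the 0.007443229 above. The `disable_query_norm` flag is hypothetical. It only mimics a similarity whose queryNorm always returns 1.0 and does not describe any particular provider.

```python
import math


def query_norm(boosts):
    """Lucene 3.x DefaultSimilarity: 1 / sqrt(sum of squared clause weights).

    For a constant_score clause the weight is just its boost.
    """
    return 1.0 / math.sqrt(sum(b * b for b in boosts))


def clause_scores(boosts, disable_query_norm=False):
    """Score each constant_score clause contributes if it matches: boost * queryNorm.

    disable_query_norm is a hypothetical switch mimicking a similarity
    whose queryNorm always returns 1.0.
    """
    norm = 1.0 if disable_query_norm else query_norm(boosts)
    return [b * norm for b in boosts]


# The explained hit: two clauses, both boosted to 95.
print(query_norm([95, 95]))     # ~0.007443229
print(clause_scores([95, 95]))  # each ~0.70710677, i.e. 1/sqrt(2)

# Any uniform boost gives the same scores, so the boost seems to vanish.
print(clause_scores([1, 1]))

# Unequal boosts keep their ratios (2:1:1) but not their absolute size.
print(clause_scores([16, 8, 8]))

# Without queryNorm, each clause scores exactly its boost.
print(clause_scores([16, 8, 8], disable_query_norm=True))  # [16.0, 8.0, 8.0]
```

In other words, queryNorm preserves the relative weighting between clauses but rescales their absolute size. That scale is what matters once _score is added to popularity and average_rating in a script.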

Naturally, I'd expect there to be a way to avoid the query normalization
(and quite possibly also a more clever way of doing what I'm attempting to
do here), but my digging through the Lucene and elasticsearch documentation
hasn't turned up any obvious solutions.

Can anyone provide a pointer as to what I should be doing either in terms
of changing my approach or fixing the scoring problem?

Best regards,
Nick Bruun

--


(bruun) #2

It seems that part of the solution can be found in tlrx's great example
on GitHub: https://github.com/tlrx/elasticsearch-custom-similarity-provider

I'm still very open to input.

On Friday, September 21, 2012 10:05:14 PM UTC+2, Nick Bruun wrote:


--


(Clinton Gormley) #3

Hi Nick

> We have an index (example_index) of documents (example) of the
> following prototype (I've excluded unimportant fields):
>
> {
>     "tag": ["apple", "fruit", "red", ..],
>     "popularity": 96,
>     "average_rating": 3.4,
>     ..
> }
>
> We're currently porting the search functionality from a set of SQL
> database queries, and in a very simplified form, a search for the
> query string "apple fruit" results in a query like the following
> (there's actually also some down-scoring of duplicate terms, but this
> example leaves it out to keep complexity low):

Let's simplify your example query to see what is actually going on. I'm
removing the custom score and the extra boosts, so I'm just running this
query:

{
    "query" : {
        "bool" : {
            "should" : [
                {
                    "constant_score" : {
                        "query" : {
                            "term" : {
                                "tag" : "apple fruit"
                            }
                        }
                    }
                },
                {
                    "constant_score" : {
                        "query" : {
                            "term" : {
                                "tag" : "apple"
                            }
                        }
                    }
                },
                {
                    "constant_score" : {
                        "query" : {
                            "term" : {
                                "tag" : "fruit"
                            }
                        }
                    }
                }
            ]
        }
    },
    "explain" : 1
}

The first thing to note is that a term query on 'apple fruit' cannot
match, because the terms you have stored are individual tags such as
'apple', 'fruit' and 'red'.

So only 2 of your 3 boolean clauses ('apple' and 'fruit', but not
'apple fruit') can match.

Now, let's look at the explanation of the above query:

"_explanation" : {
    "description" : "product of:",
    "value" : 0.76980036,
    "details" : [
        {
            "description" : "sum of:",
            "value" : 1.1547005,
            "details" : [
                {
                    "description" : "ConstantScore(tag:apple), product of:",
                    "value" : 0.57735026,
                    "details" : [
                        {
                            "value" : 1,
                            "description" : "boost"
                        },
                        {
                            "value" : 0.57735026,
                            "description" : "queryNorm"
                        }
                    ]
                },
                {
                    "description" : "ConstantScore(tag:fruit), product of:",
                    "value" : 0.57735026,
                    "details" : [
                        {
                            "value" : 1,
                            "description" : "boost"
                        },
                        {
                            "value" : 0.57735026,
                            "description" : "queryNorm"
                        }
                    ]
                }
            ]
        },
        {
            "value" : 0.6666667,
            "description" : "coord(2/3)"
        }
    ]
}

You shouldn't be bothered about queryNorm. It is a way of normalising
the score across the multiple parts of the query, and it has the same
value for all results, so on its own it does not change the order of
the hits.

You wrapped each of those term queries in a constant score query, which
means that each matching clause returns the same score (a boost of 1,
since I removed the boosting, multiplied by queryNorm). The clause scores
are summed and then multiplied by the coord factor you can see at the end:

      "value" : 0.6666667,
      "description" : "coord(2/3)"

2 of your 3 clauses matched, giving you a coord factor of 2/3.
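The same arithmetic reproduces the final value in this explanation. This is a sketch assuming Lucene 3.x defaults: three unboosted clauses give queryNorm = 1/sqrt(3), the two matching clause scores are summed, and the sum is multiplied by coord = matched clauses / total clauses. The helper name is hypothetical. Note that queryNorm is computed from the query's clauses only, never from the document.

```python
import math


def bool_constant_score(boosts, matched):
    """Score of a bool/should of constant_score clauses under Lucene 3.x defaults.

    boosts:  boost of every clause in the query (matching or not)
    matched: indices of the clauses this document matches
    """
    # queryNorm uses all clauses, so it is identical for every document.
    query_norm = 1.0 / math.sqrt(sum(b * b for b in boosts))
    summed = sum(boosts[i] * query_norm for i in matched)
    # coord rewards documents that match more of the clauses.
    coord = len(matched) / len(boosts)
    return summed * coord


# Clauses: 'apple fruit' (never matches), 'apple', 'fruit'; all with boost 1.
print(bool_constant_score([1, 1, 1], matched=[1, 2]))  # ~0.76980036
```

A document matching only one of the tags would get 0.57735026 × 1/3, so matching both tags ranks higher through both the sum and the coord factor.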

So what were you actually trying to achieve?

Did you want documents with both 'apple' and 'fruit' to rank higher than
documents with only one of the terms?

Should your 'tag' field ignore field norms and term frequencies
completely?

If these assumptions are correct, then I would do the following:

  1. Set the mapping for the tag field to disable norms and term frequencies:

    {
        "type": "string",
        "index": "not_analyzed",
        "omit_norms": true,
        "omit_term_freq_and_positions": true
    }

  2. Use a simple terms query:

    {
        "query" : {
            "terms" : {
                "tag" : [ "apple", "fruit" ]
            }
        }
    }

The more matching terms a document has, the higher it will score. The
'terms' query gets rewritten to a 'bool' query, so if you want more
control over the boosting, you can rewrite it yourself.

http://www.elasticsearch.org/guide/reference/query-dsl/terms-query.html
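As a sketch of doing that rewrite by hand, the hypothetical helper below builds a bool query with one should clause per tag, so each term can carry its own boost. It assumes the long form of the term query (`{"value": ..., "boost": ...}`) from the query DSL of that era, and it uses the poster's `tag` field. The boost values are placeholders.

```python
import json


def tags_bool_query(tag_boosts, field="tag"):
    """Build a bool query with one boosted term clause per tag (hypothetical helper).

    tag_boosts maps each tag to the boost its clause should carry.
    """
    should = [
        {"term": {field: {"value": tag, "boost": boost}}}
        for tag, boost in tag_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


# Placeholder boosts: weight 'apple' matches twice as heavily as 'fruit'.
query = tags_bool_query({"apple": 2.0, "fruit": 1.0})
print(json.dumps(query, indent=2))
```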

You may also want to look at the custom_filters_score query, which gives
you more direct control over the final scores of your docs, using
filters.

http://www.elasticsearch.org/guide/reference/query-dsl/custom-filters-score-query.html
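A minimal sketch of such a query body, assuming the `custom_filters_score` syntax from the guide page linked above: a wrapped `query`, an ordered list of `filters` each with a `boost`, and `score_mode`, which defaults to `first` and uses the first filter that matches. Placing the "all tags" filter first lets documents tagged with every term get the larger boost. The helper name and boost values are hypothetical. How a filter's boost combines with the wrapped query's score should be checked against that page.

```python
import json


def tag_filters_score_query(tags, field="tag", all_boost=3.0, single_boost=1.0):
    """Build a custom_filters_score body (hypothetical helper, placeholder boosts).

    With score_mode "first", only the first matching filter's boost is used,
    so the filter requiring every tag must come before the single-tag filters.
    """
    term_filters = [{"term": {field: t}} for t in tags]
    filters = [{"filter": {"bool": {"must": term_filters}}, "boost": all_boost}]
    filters += [{"filter": f, "boost": single_boost} for f in term_filters]
    return {
        "query": {
            "custom_filters_score": {
                "query": {"terms": {field: tags}},
                "filters": filters,
                "score_mode": "first",
            }
        }
    }


print(json.dumps(tag_filters_score_query(["apple", "fruit"]), indent=2))
```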

Then there's the popularity and average_rating script. You can use it as
you are doing, but it is not terribly efficient. The upside of using a
script is that you can change the formula at query time, but the
results of your script are not cached, so es has to do a lot of work on
every query.

Once you settle on the "correct" formula, I suggest storing that value
in a _boost field, which will be factored into the scoring automatically
(and efficiently!).

http://www.elasticsearch.org/guide/reference/mapping/boost-field.html
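The popularity and average_rating parts of the script depend only on the document, so they can be computed once before indexing. The sketch below reuses the weights from the original script (5.0 and 2.0). The field name `static_rank` is hypothetical: it stands for whatever field a `_boost` mapping would point at. There are two caveats. First, `_boost` multiplies the score instead of adding to it, so rankings will not match the additive script exactly. Second, in Lucene 3.x an index-time boost is stored in the field norms, so it may have no effect on a field mapped with omit_norms. That is worth checking with explain.

```python
def static_rank(doc, popularity_weight=5.0, rating_weight=2.0):
    """Query-independent part of the original script, computed at index time."""
    return doc["popularity"] * popularity_weight + doc["average_rating"] * rating_weight


def with_static_rank(doc, field="static_rank"):
    """Return a copy of the document with the precomputed value added.

    'static_rank' is a hypothetical field name for a _boost mapping to point at.
    """
    enriched = dict(doc)
    enriched[field] = static_rank(doc)
    return enriched


# The document from the explained hit in the first post (_id 45550).
doc = {"average_rating": 5.0, "popularity": 65, "tag": ["fruit", "food", "apple"]}
print(with_static_rank(doc)["static_rank"])  # 335.0
```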

> Naturally, I'd expect there to be a way to avoid the query
> normalization (and quite possibly also a more clever way of doing what
> I'm attempting to do here), but my digging through the Lucene and
> elasticsearch documentation hasn't turned up any obvious solutions.
>
> Can anyone provide a pointer as to what I should be doing either in
> terms of changing my approach or fixing the scoring problem?

Elasticsearch and Lucene are powerful tools, and most requirements have
already been thought of. Things that you would do manually in a
relational DB are often provided out of the box in es. So before
looking at writing custom similarity providers, get to know the query
DSL; you'll probably find an easy way to do it.

And if you need help with it, email the list explaining what it is you
want to achieve ("my data looks like this, I want it ordered like this")
and we'll be happy to help.

clint

--


(bruun) #4

Hi Clint,

Absolutely brilliant response. I think you solved five more problems than I
initially thought I had, including an oddity in my own approach to scoring.

I appreciate the effort and will definitely be back if I get stuck again.

Thank you ever so much,
Nick

--

