Question about _boost result ordering

Kellan · December 20, 2011, 8:50pm

I have documents that look something like:

{ authorId: "9c76e24a8586f3389b2e9758", _boost: 2.54631, keywords:
[suspense, mystery], <other values ...>}

I have noticed that when I search by authorId, the results are roughly
ordered by the boost value, but something else is contributing to the
final _score for sorting. The documents all have only 1 author, so the
match is exact and there isn't anything else in the author field to
skew the result ordering. In one case, it seems that documents with
fewer keywords are getting a small boost. Any ideas on why this might
be happening? The mapping for keywords is:

        keywords: {
            type: "string",
            store: "no",
            index: "analyzed",
            analyzer: "snowball"
        },

while all other fields are defaulted. My query is:

query: {
    term: {
        authorId: "9c76e24a8586f3389b2e9758"
    }
}

Kellan

Clinton_Gormley · December 21, 2011, 9:25am

Hi Kellan

You should probably set your authorId to {"index": "not_analyzed"}.
Because it is a fixed value, you don't want it to be analyzed at all.
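As a rough sketch of what that mapping change could look like, here are the field properties written out as a Python dict and printed as JSON. The keywords entry is copied from the original post; the helper name build_properties is made up for illustration, and the surrounding index and type names are left out because the thread does not give them.

```python
import json


def build_properties():
    """Return the field definitions discussed in this thread (illustrative only)."""
    return {
        # Fixed identifier: index the exact value, with no analysis.
        "authorId": {"type": "string", "index": "not_analyzed"},
        # Unchanged from the original post.
        "keywords": {
            "type": "string",
            "store": "no",
            "index": "analyzed",
            "analyzer": "snowball",
        },
    }


print(json.dumps({"properties": build_properties()}, indent=2))
```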

The score is calculated from a number of values, including:

the boost that you specified

how frequently your term appears in all your docs (eg
'smith' appears very frequently, and so is less important
than 'gormley')

how frequently the term appears in the field

what percentage of the field consists of your term
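
To make those factors concrete, here is a minimal sketch of how they might combine for a single-term query. It is modeled on the classic Lucene TF-IDF similarity and is deliberately simplified: query normalization, the coord factor, and the lossy storage of norms are all ignored. The function name sketch_score and its parameters are made up for illustration.

```python
import math


def sketch_score(term_freq, doc_freq, num_docs, field_length, boost=1.0):
    """Simplified single-term score: tf * idf * length norm * boost."""
    tf = math.sqrt(term_freq)                          # how often the term occurs in the field
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))    # rarer terms count for more
    length_norm = 1.0 / math.sqrt(field_length)        # shorter fields count for more
    return tf * idf * length_norm * boost


# Same term statistics, different boosts and field lengths.
print(sketch_score(1, 10, 1000, field_length=1, boost=2.0))
print(sketch_score(1, 10, 1000, field_length=1, boost=1.0))
print(sketch_score(1, 10, 1000, field_length=4, boost=1.0))
```

With everything else held equal, a larger boost raises the score and a longer field lowers it, which is why boost alone does not decide the ordering.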

Two options here:

You could use a filter to search for authorId (all authorId values
would have _score = 1).
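
A minimal sketch of the filter option, built as a Python dict and printed as JSON. It assumes the constant_score query wrapping a term filter; the helper name author_filter_query is made up for illustration.

```python
import json


def author_filter_query(author_id):
    """Build a request body that matches on authorId without scoring the match."""
    return {
        "query": {
            "constant_score": {
                # A filter only decides match / no match, so every hit scores the same.
                "filter": {"term": {"authorId": author_id}}
            }
        }
    }


print(json.dumps(author_filter_query("9c76e24a8586f3389b2e9758"), indent=2))
```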

But you're specifying a custom boost per doc, so presumably you're
wanting some authors to be more important than others.

In this case, you should set the authorId field to {omit_norms: true}.

You can read more about norms here:

clint

Kellan · December 21, 2011, 3:56pm

Clint,

Thanks for the suggestion of using "not_analyzed".

I tried the "omit_norms" suggestion, but this led to even more
confusing behavior: the 10 search results all had a score of
either 8.836764 or 8.300338, and the scores seemed to have nothing
to do with the _boost value.

I'm not trying to boost some authors more than others. Rather, I'm
trying to boost some documents more than others (even by the same
author). I guess if I search for a single author, it seems like the
results should be sorted purely by the boost value as there is nothing
else to make the search prefer one document over another.

One thing is very peculiar ... often documents with different boost
values have exactly the same _score (at least to 5 decimal places).
This seems to happen much more often than coincidence would suggest.

Kellan
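
One possible explanation for the identical scores, offered as an assumption and not confirmed anywhere in this thread: Lucene versions of this era fold the index-time document boost into each field's norm and store that norm in a single byte, leaving only about 3 bits of mantissa. Nearby boost values would then collapse to the same stored value, and omitting norms would discard the boost for that field entirely. The sketch below only imitates that precision loss by truncating a float to 3 mantissa bits; it is not Lucene's actual encoder, it ignores the limited exponent range, and the function name is made up.

```python
import math


def truncate_to_3_mantissa_bits(value):
    """Keep only 3 mantissa bits of a positive float, truncating the rest."""
    if value <= 0:
        return 0.0
    mantissa, exponent = math.frexp(value)      # value == mantissa * 2**exponent, 0.5 <= mantissa < 1
    scaled = mantissa * 2.0                     # normalized to [1, 2)
    kept = math.floor((scaled - 1.0) * 8) / 8   # keep 3 fractional bits, drop the rest
    return math.ldexp(1.0 + kept, exponent - 1)


for boost in (2.54631, 2.7, 3.1):
    print(boost, "->", truncate_to_3_mantissa_bits(boost))
```

Under this sketch, 2.54631 and 2.7 both come out as 2.5, while 3.1 comes out as 3.0, so documents with different boosts can end up with exactly the same score.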
