I have noticed that when I search by authorId, the results are roughly
ordered by the boost value, but something else is contributing to the
final _score for sorting. The documents all have only 1 author, so the
match is exact and there isn't anything else in the author field to
skew the result ordering. In one case, it seems that documents with
fewer keywords are getting a small boost. Any ideas on why this might
be happening? The mapping for keywords is:
Thanks for the suggestion of using "not_analyzed".
I tried the "omit_norms" suggestion, but this led to even more
confusing behavior: the 10 search results all had a score of
either 8.836764 or 8.300338, which seemed to have nothing to do with
the _boost value.
The score is calculated from a number of values, including:
- the boost that you specified
- how frequently your term appears in all your docs (e.g.
  'smith' appears very frequently, and so is less important
  than 'gormley')
- how frequently the term appears in the field
- what percentage of the field consists of your term
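
To make the factors above concrete, here is a rough sketch of how they might combine for a single-term query. It is modelled loosely on Lucene's classic TF-IDF similarity as I understand it, and it leaves out query normalization, the coordination factor, and the way norms are stored, so the numbers will not match a real _score. The function names and the sample counts are made up for illustration.

```python
import math

def tf(freq):
    # Term frequency: how often the term appears in the field.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rare terms weigh more than common ones.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # Field-length factor: a match in a short field counts for more.
    return 1.0 / math.sqrt(num_terms)

def score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    # Simplified single-term score (assumption: no queryNorm, no coord).
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * length_norm(num_terms) * boost

# Same term, same field length: ordering follows the boost.
print(score(1, 9, 1000, 4, boost=2.0), score(1, 9, 1000, 4, boost=1.0))

# Different field lengths: the shorter field wins here despite the lower boost.
print(score(1, 9, 1000, 1, boost=1.0), score(1, 9, 1000, 4, boost=1.5))
```

The second pair shows that, in this simplified model, the boost is only one multiplier among several, so anything that changes the length factor can reorder results that would otherwise sort purely by boost.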
I'm not trying to boost some authors more than others. Rather, I'm
trying to boost some documents more than others (even by the same
author). I guess if I search for a single author, it seems like the
results should be sorted purely by the boost value as there is nothing
else to make the search prefer one document over another.
One thing is very peculiar ... often documents with different boost
values have exactly the same _score (at least to 5 decimal places).
This seems to happen much more often than coincidence would suggest.