I am experiencing a very annoying behaviour in the Elasticsearch scoring
algorithm: the field-length norm fails to distinguish between fields that
contain 3 and 4 words. It always returns the same score for both.
Example:
LANCA HOTEL EXTREME and MASSIVE AMAZING HOTEL GROUP
both come back with the same field length and receive the same
field-length norm score.
I tried using BM25 similarity instead of the default one and tuning its
parameters, but the output was always the same.
Does anybody have any idea why this happens? It is extremely annoying,
as most fields in each document contain about 3-4 words.
The field norm is computed at index time and is stored in a single byte,
which leads to a loss of precision: field lengths that are close together,
such as 3 and 4 terms, can end up with the same stored value. This behavior
might have changed with newer versions of Lucene, but probably not.
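
To make the precision loss concrete, here is a rough Python sketch of the one-byte encoding. It assumes the classic length norm 1/sqrt(numTerms) and a port of Lucene's SmallFloat.floatToByte315 / byte315ToFloat written from memory; the Python function names are made up for illustration and none of this is an official API. Newer Lucene versions may encode norms differently, so treat it as a model of the behaviour described above rather than of any specific release.

```python
import math
import struct

# Assumed port of Lucene's SmallFloat.floatToByte315 (written from memory).
def float_to_byte315(f):
    # Reinterpret the float32 bit pattern as a signed 32-bit integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21            # keep only the top bits of the float
    fzero = (63 - 15) << 3        # offset of the smallest representable exponent
    if small <= fzero:
        return 0 if bits <= 0 else 1   # zero/negative -> 0, underflow -> 1
    if small >= fzero + 0x100:
        return 255                     # overflow -> largest byte value
    return small - fzero

# Assumed port of Lucene's SmallFloat.byte315ToFloat.
def byte315_to_float(b):
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def stored_norm(num_terms):
    """Length norm as it would be read back after the one-byte round trip."""
    exact = 1.0 / math.sqrt(num_terms)   # assumed classic formula, boost = 1
    return byte315_to_float(float_to_byte315(exact))

for n in range(1, 9):
    print(n, round(1.0 / math.sqrt(n), 4), stored_norm(n))
# Under these assumptions 3 and 4 terms both read back as 0.5,
# while 5 terms reads back as 0.4375.
```

If this model is right, changing similarity parameters at query time cannot separate 3-term and 4-term fields, because the difference is already gone from the stored byte.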