Field-length norm fails on fields with 3 and 4 words

Fil_ES · April 30, 2015, 4:42pm

Hello,

I am seeing very annoying behaviour in Elasticsearch's score calculation:
the field-length norm fails to distinguish between fields that contain
3 and 4 words. It always returns the same score for both.
Example:

LANCA HOTEL EXTREME and MASSIVE AMAZING HOTEL GROUP

would come back with the same field length and get the same value for the
field-length norm.

I tried using BM25 similarity instead of the default one and tweaking its
parameters, but the output was always the same.

Does anybody have any idea why this is happening? It is extremely annoying,
as most fields in each document contain about 3-4 words.

Thank you,
Fil

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a007c1fc-a5c4-45f5-9f83-7f414831170b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ivan · April 30, 2015, 8:54pm

The field norm is computed at index time and is stored in a single byte,
which can lead to a loss in precision. This behavior might have changed
with newer versions of Lucene, but probably not.
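
To make the precision loss concrete, here is a rough Python sketch of the one-byte encoding. It assumes the classic TF/IDF similarity from Lucene 4.x/5.x, where (as far as I recall) the norm is 1/sqrt(number of terms) squeezed through SmallFloat.floatToByte315; the thread does not state the version, so treat that as an assumption. The Python function names are my own, not Lucene's API.

```python
import math
import struct


def float_to_byte315(f):
    """Port of Lucene's SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15)."""
    # Reinterpret the value as the raw bits of a 32-bit float.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    if small <= ((63 - 15) << 3):
        return 0 if bits <= 0 else 1   # underflow: zero or smallest positive
    if small >= ((63 - 15) << 3) + 0x100:
        return 255                      # overflow: largest representable byte
    return small - ((63 - 15) << 3)


def byte315_to_float(b):
    """Port of Lucene's SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms):
    """Norm as it would be read back at search time (assumes no index-time boost)."""
    raw = 1.0 / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, 1.0 / math.sqrt(n), field_norm(n))
```

Under these assumptions, 3 terms (raw norm about 0.577) and 4 terms (raw norm 0.5) both decode to 0.5, because only three mantissa bits survive the encoding. If BM25 in that Lucene version stores the field length through the same byte, which I believe it does, that would also explain why switching similarity made no difference.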

Ivan
