fieldNorm value calculation seems to be wrong

Hi,

I indexed the following document in ES.

{
    "content_id": 1,
    "options": ["CD14", "CD2", "CD235a", "CD20", "CD11b", "CD19", "CD7",
                "CD56", "CD10", "CD8", "CD3", "CD4"]
}

Then I did a simple search using the query_string query.

{
    "explain": true,
    "query": {
        "query_string": {
            "fields": [ "options" ],
            "query": "cd4",
            "default_operator": "and",
            "use_dis_max": false
        }
    }
}

I got the result with score = 0.076713204 and fieldNorm = 0.25.

Then I changed the boost of the "options" field to 2 and tried again.
This time I got score = 314.2173 and fieldNorm = 1024.

When the boost is 3, I got score = 40219.812 and fieldNorm = 131072.

When the boost is 4, I got score = 1287034 and fieldNorm = 4194304.

The tf and idf values were the same in all cases, with tf = 1 and
idf = 0.30685282.
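
As a quick sanity check on the figures reported above, the short Python sketch below assumes that each score is simply tf × idf × fieldNorm. That formula is an assumption inferred from the numbers; the full explain output is not shown in the post. The names `reported` and `modelled_score` are hypothetical.

```python
import math

# Values copied from the post: tf and idf were identical in every run.
TF = 1.0
IDF = 0.30685282

# (boost, reported fieldNorm, reported score)
reported = [
    (1, 0.25, 0.076713204),
    (2, 1024.0, 314.2173),
    (3, 131072.0, 40219.812),
    (4, 4194304.0, 1287034.0),
]

def modelled_score(field_norm, tf=TF, idf=IDF):
    """Assumed relationship: score = tf * idf * fieldNorm."""
    return tf * idf * field_norm

for boost, field_norm, score in reported:
    # Compare the modelled score with the reported one, allowing for rounding.
    matches = math.isclose(modelled_score(field_norm), score, rel_tol=1e-6)
    print(f"boost={boost} fieldNorm={field_norm} reported={score} matches={matches}")
```

If the assumption holds, the whole jump in score comes from fieldNorm, so that is the only quantity that needs explaining.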

Here the fieldNorm value increases exponentially, giving me very high scores.
My understanding is that the fieldNorm should be linearly proportional to
the boost value.
Can somebody please explain to me how the above numbers for fieldNorm are
calculated?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It appears that each instance of the options field is being boosted. Can
you try an example document with only one option? I have never used field
boosting on a multi-valued field in Elasticsearch, but I have seen this
issue in Lucene.

--
Ivan

On Wed, Apr 10, 2013 at 12:19 AM, dilshan <dilshan@calcey.com> wrote:

Here the fieldNorm value increases exponentially, giving me very high
scores.
My understanding is that the fieldNorm should be linearly proportional to
the boost value.
Can somebody please explain to me how the above numbers for fieldNorm are
calculated?

Yes. It seems that each item in the options field is getting boosted, even
though the search matches only one item in the field.
Is there a way to override this behavior (a way to boost only the items
that match the search)?
I tried example documents with 1, 2 and 3 items in the options field.
Below are my findings.

A document with only one item in options field ("options": ["CD4"])
When boost is 1, fieldNorm = 1
When boost is 2, fieldNorm = 2
When boost is 3, fieldNorm = 3
When boost is 4, fieldNorm = 4
When boost is 5, fieldNorm = 5

A document with two items in options field ("options": ["CD4", "CD3"])
When boost is 1, fieldNorm = 0.625
When boost is 2, fieldNorm = 2.5
When boost is 3, fieldNorm = 6
When boost is 4, fieldNorm = 10
When boost is 5, fieldNorm = 16

A document with three items in options field ("options": ["CD4", "CD3",
"CD2"])
When boost is 1, fieldNorm = 0.5
When boost is 2, fieldNorm = 4
When boost is 3, fieldNorm = 14
When boost is 4, fieldNorm = 32
When boost is 5, fieldNorm = 64

I can't see any logic in the above values. Can somebody please explain how
these values are calculated for fieldNorm?

Thanks.
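
One model that reproduces every fieldNorm value quoted in this thread is sketched below in Python. It rests on three assumptions, none of which is confirmed by the posts: the index-time boost is multiplied in once per value of the field, the length norm is 1/sqrt(number of values) since each value here is a single term, and the result is truncated to a coarse float with four representable values per power of two (0.5, 0.625, 0.75, 0.875, 1, 1.25, ...). The names `quantize` and `modelled_field_norm` are hypothetical, and this is not Lucene's actual code.

```python
import math

def quantize(x):
    """Truncate x to four representable values per power of two.

    Assumed stand-in for the lossy single-byte storage of norms.
    """
    if x <= 0:
        return 0.0
    # frexp gives x = m * 2**e with 0.5 <= m < 1, so 2**(e - 1) <= x.
    _, e = math.frexp(x)
    step = math.ldexp(1.0, e - 3)  # a quarter of the enclosing power of two
    return math.floor(x / step) * step

def modelled_field_norm(boost, num_values):
    """Assumed model: boost applied once per value, times 1/sqrt(num_values)."""
    raw = boost ** num_values / math.sqrt(num_values)
    return quantize(raw)

# Print the modelled norm for the document shapes tried in this thread.
for num_values in (1, 2, 3, 12):
    norms = [modelled_field_norm(b, num_values) for b in range(1, 6)]
    print(num_values, norms)
```

Under this model the norm grows as boost^n for a field with n values, which would explain why it is linear in the boost only for the single-value document.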

Hiya

On Wed, 2013-04-10 at 22:00 -0700, dilshan wrote:

Yes. It seems that each item in the options field is getting boosted, even
though the search matches only one item in the field.
Is there a way to override this behavior (a way to boost only the
items that match the search)?
I tried example documents with 1, 2 and 3 items in the options field.
Below are my findings.

Yes - this is an issue in Lucene. My advice: don't use index-time
boosting. It is inflexible (you have to reindex if you want to change
it) and suffers from complexities like the above.

Rather, just use search-time boosting.

clint
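
For illustration, a search-time boost can be expressed in the query itself with the `field^boost` syntax of query_string. The Python sketch below only builds the request body; it assumes the "options" field is mapped without an index-time boost, and both the helper name `build_query` and the boost value of 2 are hypothetical.

```python
import json

def build_query(term, field="options", boost=2):
    """Build a query_string request body that boosts `field` at search time."""
    return {
        "explain": True,
        "query": {
            "query_string": {
                # "options^2" applies the boost when the query runs,
                # so changing it does not require reindexing.
                "fields": [f"{field}^{boost}"],
                "query": term,
                "default_operator": "and",
                "use_dis_max": False,
            }
        },
    }

# Print the JSON body that would be sent to the search endpoint.
print(json.dumps(build_query("cd4"), indent=2))
```

Because the boost lives in the query, it is applied once to the matching clause rather than being folded into the stored norm for every value of the field.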
