Index-time boost in multi-valued fields

Matthew_A_Brown · April 6, 2012, 7:37pm

I came across some pretty crazy scoring behavior recently, where
certain matches on a field boosted at index-time had enormously high
field norms. After some illuminating discussion on the #lucene
channel, I tracked it down to this little nugget:

"The boost is multiplied by Document.getBoost() of the document
containing this field. If a document has multiple fields with the same
name, all such values are multiplied together. This product is then
used to compute the norm factor for the field."
(source: http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/document/AbstractField.html#setBoost(float))

So basically the index-time boost you specify is taken to the power of
the number of values in the field!
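
To make the compounding concrete, here is a minimal sketch of the arithmetic the quoted Javadoc describes. The function name and parameters are made up for illustration and are not Lucene API. It also assumes every value carries the same boost, and it ignores length normalization and the lossy single-byte norm encoding, both of which also affect the stored norm.

```python
def effective_field_boost(per_value_boost, num_values, doc_boost=1.0):
    """Hypothetical helper: multiply the document boost by one boost per value."""
    product = doc_boost
    for _ in range(num_values):
        # Each value of the multi-valued field contributes its boost again.
        product *= per_value_boost
    return product


# The same index-time boost of 2.0 on fields with 1, 2, 3 and 5 values.
for n in (1, 2, 3, 5):
    print(n, effective_field_boost(2.0, n))
# prints:
# 1 2.0
# 2 4.0
# 3 8.0
# 5 32.0
```

So a boost of 2.0 that was meant to double the weight of a field ends up as a factor of 32 on a document with five values in that field.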

Since the whole concept of multi-valued fields is more or less just
sugar in Lucene, might it make more sense for ES to concatenate the
values of a multi-valued field and pass them to Lucene as a single
value? The index-time boost would then be applied once instead of once
per value, and I don't really see a downside.

Just a thought!

Mat

kimchy · April 7, 2012, 3:27pm

There are downsides to it, for example when the field is stored
explicitly, when one uses nested mappings, when faceting on the field,
or when the field is not analyzed. In any case, there are no plans to
automatically concatenate the values of multi-valued fields into a
single one.


Matthew_Schulkind · June 21, 2012, 10:50pm

Is there any way to work around this? I'd like to set a boost that
applies when any one of the values matches, without it depending on
the number of values in the field.

