I came across some pretty crazy scoring behavior recently, where
certain matches on a field boosted at index time had enormously high
field norms. After some illuminating discussion on the #lucene
channel, I tracked it down to this little nugget:
"The boost is multiplied by Document.getBoost() of the document
containing this field. If a document has multiple fields with the same
name, all such values are multiplied together. This product is then
used to compute the norm factor for the field."
(source: AbstractField (Lucene 3.5.0 API))
So basically the index-time boost you specify is raised to the power of
the number of values in the field!
Since the whole concept of a multi-valued field is more or less just
sugar in Lucene, might it make more sense for ES to take care of
concatenating the values in multi-valued fields and passing them as a
single value to Lucene? The index-time boost would then be applied once
per field instead of once per value, and I don't really see a downside.
There are downsides to it: for example, when the field is stored explicitly,
when one uses nested mappings, when faceting on the field, or when it is
not analyzed. In any case, there are no plans to automatically concatenate
the values of multi-valued fields into a single one.
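The multiplication described in the Javadoc quoted above can be sketched as plain arithmetic. This is a minimal model, assuming each value of the multi-valued field is added as a separate field instance carrying the same index-time boost. It covers only the boost product that feeds the norm, not length normalization or norm encoding, and `effective_norm_boost` is a hypothetical name, not a Lucene method.

```python
from math import prod
from typing import Sequence


def effective_norm_boost(doc_boost: float, value_boosts: Sequence[float]) -> float:
    """Hypothetical helper modelling the quoted Javadoc: the document boost
    times the boost of every same-named field instance in the document."""
    return doc_boost * prod(value_boosts)


# One field with three values, each added with an index-time boost of 2.0.
multi_valued = effective_norm_boost(1.0, [2.0, 2.0, 2.0])  # 2.0 ** 3 == 8.0
# The same three values concatenated into a single instance boosted 2.0.
concatenated = effective_norm_boost(1.0, [2.0])
print(multi_valued, concatenated)
```

With three values, a boost of 2.0 turns into 8.0, whereas a single concatenated value keeps it at 2.0. That gap is the behavior the original post describes.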
Replying to kimchy's message of Saturday, April 7, 2012 11:27:31 AM UTC-4:
Is there any way to work around this? I'd like to be able to set a boost
that applies if any one of the values matches, without having it depend
on the number of values in the field.
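One arithmetic workaround can be sketched under two assumptions that the thread does not confirm: your indexing code can attach a boost to each individual value, and it knows how many values the field has. Because Lucene multiplies the per-value boosts together, giving each of the n values the n-th root of the desired boost makes the product come out to that boost. `per_value_boost` is a hypothetical helper, not an Elasticsearch or Lucene API.

```python
from math import isclose, prod


def per_value_boost(target_boost: float, num_values: int) -> float:
    """Hypothetical helper: the boost to put on each of `num_values` field
    instances so that their product equals `target_boost`."""
    if num_values < 1:
        raise ValueError("a field needs at least one value")
    if target_boost < 0:
        raise ValueError("boost must be non-negative to take a real root")
    return target_boost ** (1.0 / num_values)


values = ["red", "green", "blue"]       # a three-valued field
b = per_value_boost(2.0, len(values))   # about 1.26 per value
combined = prod([b] * len(values))      # the product Lucene would compute
print(round(b, 4), isclose(combined, 2.0))
```

Floating-point rounding makes the product only approximately equal to the target. Lucene 3.x also encodes each norm into a single byte, so differences this small are lost anyway.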