Scoring not behaving as expected

Andrew_Soep_2 · October 5, 2013, 1:44am

I'm getting some odd results (or so it seems to me) from my queries.
I'm doing a multi-match query against 4 fields, 1
of which is a geographic location (US/Japan/Saudi Arabia etc) and the 3
others are a mix of strings/arrays. The issue seems to be that the more
hits on a document, the lower the document scores. So for example, if I
have 10 documents with a similar geography (Saudi Arabia), and I do a query
that contains Saudi Arabia and ice cream, the documents that simply have
Saudi Arabia as the geography and no instance of ice cream in any of the
other fields score higher than documents that have Saudi Arabia in the
geography and at least one instance of ice cream in the other fields.

I'm confused why this would be. I would think that if a document had Saudi
Arabia and Ice Cream, it would score much higher, since there are multiple
instances of words. Am I off base here? Can anyone give me some insight?
Should I be using a different query type? I tried boosting the
non-geography fields, and it actually made the results even more confusing.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

spinscale · October 6, 2013, 5:29pm

Hey,

Can you share some information about your queries (and also your mapping)?
What kind of query are you executing? Do you use bool queries with
must/should clauses, or are you simply querying the _all field? Are filters
used as well? Some samples would be great (including your boosting
attempts)!

--Alex

Andrew_Soep_2 · October 7, 2013, 8:38pm

Hi Alexander,

Here's my query:

{"query":{"multi_match":{"query":"ice cream in saudi arabia","fields":["question","categories","responses","geography"]}},"size":50,"explain":true}

Here's mapping:

{
  "test" : {
    "properties" : {
      "categories" : {
        "type" : "string"
      },
      "geography" : {
        "type" : "string"
      },
      "question" : {
        "type" : "string"
      },
      "question_id" : {
        "type" : "string"
      },
      "responses" : {
        "type" : "string"
      }
    }
  }
}

Britta_Weber · October 16, 2013, 12:41pm

Hi Andrew,

There are two things in multi match that might cause the scoring
behavior you see.
First, I think multi match by default computes one score per field
and then uses the maximum of these scores as the final score. So more
hits do not necessarily mean a higher score with multi match using the
default setting. If you set "use_dis_max":false, the scores are summed.
Second, when scoring each field, the field length is also taken
into account.
Could you provide some example documents so we can check whether this is
the reason for the scores you get?
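
To make the two points above concrete, here is a rough Python sketch. It does not call Elasticsearch; it only mimics how per-field scores might be combined. The per-field scores are made-up numbers, the function names are hypothetical, and the length normalisation shown is the classic Lucene 1/sqrt(number of terms) formula, which may not match exactly what your version stores. The "use_dis_max" parameter name is taken from the description above, so check it against the documentation for your version.

```python
import json
import math

# Made-up per-field scores for two documents (illustration only).
# The first document matches only on geography; the second matches on
# geography (with a slightly lower score, e.g. a longer field) and on
# "ice cream" in two other fields.
doc_geo_only = {"geography": 1.25, "question": 0.0, "categories": 0.0, "responses": 0.0}
doc_geo_and_ice_cream = {"geography": 1.0, "question": 0.5, "categories": 0.0, "responses": 0.25}


def combine_max(field_scores):
    """Best single field wins: the default behavior described above."""
    return max(field_scores.values())


def combine_sum(field_scores):
    """Every matching field contributes: the summed behavior described above."""
    return sum(field_scores.values())


def length_norm(num_terms):
    """Classic Lucene-style length normalisation: shorter fields score higher."""
    return 1.0 / math.sqrt(num_terms)


# Maximum: the geography-only document ranks first.
print(combine_max(doc_geo_only), combine_max(doc_geo_and_ice_cream))  # 1.25 1.0

# Sum: the document with more matching fields ranks first.
print(combine_sum(doc_geo_only), combine_sum(doc_geo_and_ice_cream))  # 1.25 1.75

# A 2-term field gets a larger norm than an 8-term field.
print(length_norm(2) > length_norm(8))  # True

# The original query with the summing option switched on.
query_body = {
    "query": {
        "multi_match": {
            "query": "ice cream in saudi arabia",
            "fields": ["question", "categories", "responses", "geography"],
            "use_dis_max": False,
        }
    },
    "size": 50,
    "explain": True,
}
print(json.dumps(query_body))
```

With "explain":true in the request, the per-field scores in the response can be compared against this kind of calculation to see which field is deciding the ranking.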

Cheers,
Britta
