Scoring not behaving as expected


(Andrew Soep-2) #1

I'm getting some odd results (or so it seems to me) from my queries.
I'm doing a multi-match query against 4 fields, 1
of which is a geographic location (US/Japan/Saudi Arabia etc) and the 3
others are a mix of strings/arrays. The issue seems to be that the more
hits on a document, the lower the document scores. So for example, if I
have 10 documents with a similar geography (Saudi Arabia), and I do a query
that contains Saudi Arabia and ice cream, the documents that simply have
Saudi Arabia as the geography and no instance of ice cream in any of the
other fields score higher than documents that have Saudi Arabia in the
geography and at least one instance of ice cream in the other fields.

I'm confused why this would be. I would think that if a document had Saudi
Arabia and Ice Cream, it would score much higher, since there are multiple
instances of words. Am I off base here? Can anyone give me some insight?
Should I be using a different query type? I tried boosting the
non-geography fields, and it actually made the results even more confusing.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

can you share some information about your queries (and also your mapping)?
What kind of query are you executing, do you use bool queries with
must/should clauses or are you simply querying the _all field, are filters
used as well? Some samples would be great (also including your boosting
tries)!

--Alex



(Andrew Soep-2) #3

Hi Alexander,

Here's my query:

{"query":{"multi_match":{"query":"ice cream in saudi arabia","fields":["question","categories","responses","geography"]}},"size":50,"explain":true}

Here's mapping:

{
  "test" : {
    "properties" : {
      "categories" : {
        "type" : "string"
      },
      "geography" : {
        "type" : "string"
      },
      "question" : {
        "type" : "string"
      },
      "question_id" : {
        "type" : "string"
      },
      "responses" : {
        "type" : "string"
      }
    }
  }
}



(Britta Weber) #4

Hi Andrew,

There are two things in multi match that might cause the scoring
behavior you see:
First, I think multi match by default computes one score per field
and then uses the maximum of these scores as the final score. So more
hits do not necessarily mean a higher score with multi match using the
default setting. If you set "use_dis_max":false, the scores are summed.
Second, when scoring each field, the field length is also taken
into account.
Could you provide some example documents to check whether this is the
reason for the scores you get?

Cheers,
Britta
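
To make the two points above concrete, here is a rough Python sketch. It is not Elasticsearch's actual scoring code, only a toy model: the per-field scores are invented numbers, and the helper names (combine, length_norm, build_request) are hypothetical. The length normalization shown is the classic Lucene TF-IDF formula, 1 / sqrt(number of terms in the field); the real value is stored with reduced precision and multiplied with other factors, so treat the numbers as illustrative only. The "use_dis_max" flag is the one mentioned in the reply above.

```python
import json
import math


def combine(field_scores, use_dis_max=True):
    """Combine per-field scores: take the maximum (dis_max style) or the sum."""
    scores = list(field_scores.values())
    if not scores:
        return 0.0
    return max(scores) if use_dis_max else sum(scores)


def length_norm(num_terms):
    """Classic Lucene TF-IDF length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


def build_request(text, fields, use_dis_max=True):
    """Build a search body like the one in this thread, with the flag explicit."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "use_dis_max": use_dis_max,
            }
        },
        "size": 50,
        "explain": True,
    }


# Hypothetical per-field scores, invented for illustration only.
geo_only = {"geography": 1.0}
geo_and_topic = {"geography": 1.0, "question": 0.5, "responses": 0.25}

# With the maximum, the extra "ice cream" matches add nothing to the score.
print("max:", combine(geo_only), combine(geo_and_topic))

# With the sum, the document matching in more fields comes out ahead.
print("sum:", combine(geo_only, use_dis_max=False),
      combine(geo_and_topic, use_dis_max=False))

# A short field (e.g. a two-word geography) gets a larger norm than a long one.
print("norms:", length_norm(2), length_norm(50))

body = build_request(
    "ice cream in saudi arabia",
    ["question", "categories", "responses", "geography"],
    use_dis_max=False,
)
print(json.dumps(body))
```

Under these made-up numbers the two documents tie when the maximum is used, so any small difference in field length can push the geography-only document ahead. Summing the per-field scores rewards the document that matches in several fields. Running the real query with "explain":true shows which of these effects actually applies to a given document.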


