Top hits aggregation default sort


(Dan Tuffery) #1

I am using the top hits aggregation with a has_child query. The top_hits
aggregation documentation says 'By default the hits are sorted by the
score of the main query', but I'm not seeing that in the results for my
query:

{
  "from": 0,
  "size": 3,
  "query": {
    "has_child": {
      "score_mode": "max",
      "type": "child_type",
      "query": {
        "match": {
          "myField": {
            "query": "some text"
          }
        }
      }
    }
  },
  "aggs": {
    "replies": {
      "terms": {
        "field": "parent_type_id",
        "size": 3
      },
      "aggs": {
        "topChildren": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

The has_child query returns three parent results with the following scores:

  • doc 1 = 0.83619833
  • doc 2 = 0.7210085
  • doc 3 = 0.7210085

The scores for the top hits aggregations are:

  • first top hit aggregation = 0.29160267
  • second top hit aggregation = 0.83619833
  • third top hit aggregation = 0.58320534

So shouldn't the 'second top hit aggregation' be returned first, followed by
the aggregations with the score 0.7210085?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0b6849ad-4308-4afe-a76b-80153620f74b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Martijn Van Groningen) #2

Hi Dan,

The buckets of the replies terms agg are sorted by default by their doc
count, but the hits inside the topChildren agg are sorted by default by the
score of the query.
I think if you sort the replies buckets by highest score, you get what you
want. To do this, you need to define a max metric agg that keeps track of
the score, and let the replies terms agg sort its buckets by that,
similar to what is done in this example:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example
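
To make the distinction concrete, here is a small Python sketch (not Elasticsearch code) that mimics how the order of the replies buckets changes once they are sorted by a max-score sub-aggregation instead of by doc count. The three scores are the top-hit scores reported in the first post; the bucket keys and doc counts are made up for illustration.

```python
# Simulates bucket ordering only; this does not call Elasticsearch.
# Scores come from the thread; keys and doc_count values are hypothetical.
buckets = [
    {"key": "parent_a", "doc_count": 5, "top_score": 0.29160267},
    {"key": "parent_b", "doc_count": 3, "top_score": 0.83619833},
    {"key": "parent_c", "doc_count": 2, "top_score": 0.58320534},
]


def order_by_doc_count(items):
    """Default terms ordering: largest doc_count first."""
    return sorted(items, key=lambda b: b["doc_count"], reverse=True)


def order_by_top_score(items):
    """Ordering by a max-score sub-aggregation: highest score first."""
    return sorted(items, key=lambda b: b["top_score"], reverse=True)


default_order = [b["key"] for b in order_by_doc_count(buckets)]
score_order = [b["key"] for b in order_by_top_score(buckets)]

print(default_order)  # ['parent_a', 'parent_b', 'parent_c']
print(score_order)    # ['parent_b', 'parent_c', 'parent_a']
```

With the default ordering, the bucket holding the best-scoring hit can end up anywhere; ordering by the max score puts it first.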

Martijn

--
Kind regards,

Martijn van Groningen



(Dan Tuffery) #3

Hi Martijn,

Thanks for your response, that seems like the right approach. The problem I
am seeing now is that the metrics aggregation always returns 0. Even
if I remove top_hits from the query, the metrics aggregation still returns
0, so something is not quite right. Here is a simplified example:

{
  "query": {
    "match": {
      "r_message": "some text"
    }
  },
  "aggs": {
    "replies": {
      "terms": {
        "field": "myField",
        "order": {
          "top_score": "desc"
        }
      },
      "aggs": {
        "top_score": {
          "max": {
            "lang": "groovy",
            "script": "_score"
          }
        }
      }
    }
  }
}

The result is:

"aggregations": {
  "replies": {
    "buckets": [
      {
        "key": 5643,
        "doc_count": 1,
        "top_score": {
          "value": 0
        }
      }
    ]
  }
}



(Dan Tuffery) #4

I should have looked at the documentation first :) The script should be 'doc.score', not '_score'.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_score
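
Putting the two replies together, the corrected request might be assembled as in the Python sketch below. It only builds and prints the JSON body and does not contact a cluster. The helper name build_request is made up here, the field names come from the examples in this thread, and the "lang" setting from the earlier example is left out. Whether 'doc.score' or '_score' is the right script depends on the Elasticsearch version and script language, so the default is an assumption taken from this post.

```python
import json


def build_request(score_script="doc.score"):
    """Build a search body whose terms buckets are ordered by max hit score.

    build_request is a hypothetical helper; field names are taken from the
    examples in this thread. The score script syntax is an assumption that
    varies with Elasticsearch version and script language.
    """
    return {
        "query": {"match": {"r_message": "some text"}},
        "aggs": {
            "replies": {
                "terms": {
                    "field": "myField",
                    # Sort buckets by the sub-aggregation defined below.
                    "order": {"top_score": "desc"},
                },
                "aggs": {
                    # Highest score among the documents in each bucket.
                    "top_score": {"max": {"script": score_script}},
                    # Best-scoring document of each bucket.
                    "topChildren": {"top_hits": {"size": 1}},
                },
            }
        },
    }


body = build_request()
print(json.dumps(body, indent=2))
```

Passing a different expression, for example build_request("_score"), swaps the script without touching the rest of the body.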



(Martijn Van Groningen) #5

:) Great that it worked out.


