Get top 1 row of each group

In my sport index, I have the following documents indexed as the
football_team type:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-index-js

Here, each football team has a name and some strength values. Besides that,
there is a player_ids collection for each team. The team strength was
calculated during the ETL process by taking the average of the players'
strengths. You can also see that there are multiple football teams with the
same name here, but their player_ids collections differ.

When we run the following query:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-search_query-js

we get the following result:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-search_result-js

This is expected. However, what I would like to get is the top 1 row of
each group (grouped by team name). The result I would like to get for
the above query is this:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-expected_result-js

Any ideas?

Also, here is the whole question as a gist:
https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-football_teams-md

Tugberk

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/73eefd0e-c555-4f70-9e67-5da05b04f32b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I have been looking at the top hits aggregation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-aggregations-metrics-top-hits-aggregation.htm.
I think this is what I want. I will try it out now.


OK, I think I was able to get what I needed, though not 100%, because
aggregations lack paging support. I also learned how powerful aggregations
are in Elasticsearch.

I changed the document structure a little (added the primaryId field):

POST sport/football_team
{
  "primaryId": "541afe09532aec0f305c5f2b",
  "name": "Real Madrid",
  "defense_strength": 88.2,
  "middle_strength": 92.34,
  "forward_strength": 97.45,
  "player_ids": [
    "1", "2", "3", "4", "21", "6", "7", "8", "9", "10", "11"
  ]
}

This is what I ended up with:

POST sport/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "defense_strength": {
                  "lte": 83.43
                }
              }
            },
            {
              "range": {
                "forward_strength": {
                  "gte": 91
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "top_teams": {
      "terms": {
        "field": "primaryId"
      },
      "aggs": {
        "top_team_hits": {
          "top_hits": {
            "sort": [
              {
                "forward_strength": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "include": [
                "name"
              ]
            },
            "from": 0,
            "size": 1
          }
        }
      }
    }
  }
}

The result is what I expected:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "top_teams": {
      "buckets": [
        {
          "key": "541afdfc532aec0f305c2c48",
          "doc_count": 2,
          "top_team_hits": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "sport",
                  "_type": "football_team",
                  "_id": "y6jZ31xoQMCXaK23rPQgjA",
                  "_score": null,
                  "_source": {
                    "name": "Barcelona"
                  },
                  "sort": [
                    98.32
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "541afe08532aec0f305c5f28",
          "doc_count": 2,
          "top_team_hits": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "sport",
                  "_type": "football_team",
                  "_id": "hewWI0ZpTki4OgOeneLn1Q",
                  "_score": null,
                  "_source": {
                    "name": "Arsenal"
                  },
                  "sort": [
                    94.3
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "541afe09532aec0f305c5f2b",
          "doc_count": 1,
          "top_team_hits": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "sport",
                  "_type": "football_team",
                  "_id": "x-_YBX5jSba8qsEuB8guTQ",
                  "_score": null,
                  "_source": {
                    "name": "Real Madrid"
                  },
                  "sort": [
                    91.34
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
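
For anyone consuming this response in application code, the nested
aggregation output can be flattened into one row per group on the client.
Below is a minimal Python sketch, assuming the response has already been
parsed into a dict. The function name top_row_per_bucket and the "sort_value"
key are made up for illustration; the aggregation names are the ones from the
query in this message, and the sample data is a trimmed copy of the response
shown here.

```python
def top_row_per_bucket(response, terms_agg="top_teams", hits_agg="top_team_hits"):
    """Return one flat row per terms bucket, built from its first top_hits entry."""
    rows = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        if not hits:
            # Defensive: skip a bucket that carries no hits.
            continue
        top = hits[0]
        row = {
            "key": bucket["key"],
            "doc_count": bucket["doc_count"],
            "_id": top["_id"],
        }
        # Copy the returned source fields (only "name" in the query above).
        row.update(top["_source"])
        # "sort" holds the value the hit was sorted by (forward_strength here).
        row["sort_value"] = top["sort"][0]
        rows.append(row)
    return rows


# Trimmed copy of the response above (two of the three buckets, fewer fields).
sample = {
    "aggregations": {
        "top_teams": {
            "buckets": [
                {
                    "key": "541afdfc532aec0f305c2c48",
                    "doc_count": 2,
                    "top_team_hits": {"hits": {"total": 2, "hits": [
                        {"_id": "y6jZ31xoQMCXaK23rPQgjA",
                         "_source": {"name": "Barcelona"},
                         "sort": [98.32]},
                    ]}},
                },
                {
                    "key": "541afe09532aec0f305c5f2b",
                    "doc_count": 1,
                    "top_team_hits": {"hits": {"total": 1, "hits": [
                        {"_id": "x-_YBX5jSba8qsEuB8guTQ",
                         "_source": {"name": "Real Madrid"},
                         "sort": [91.34]},
                    ]}},
                },
            ]
        }
    }
}

rows = top_row_per_bucket(sample)
for row in rows:
    print(row["name"], row["sort_value"])
```

Each row then looks like an ordinary search hit with the group key attached,
which is close to the "top 1 row of each group" shape asked for originally.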

All good, but what I need now is the ability to get the first 2 aggregation
results (buckets) in one request and the next 2 (in this case, only 1) in
another request.
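
Until there is a way to page buckets on the server, one possible workaround
is to request enough buckets in a single call and slice them on the client.
This is only a sketch under that assumption, not a substitute for real
paging: page_buckets is a hypothetical helper, and it relies on the bucket
order being the same every time the query runs. The bucket keys are the ones
from the response in this message.

```python
def page_buckets(buckets, page, per_page):
    """Return the buckets for a 1-based page number, or [] past the last page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    start = (page - 1) * per_page
    return buckets[start:start + per_page]


# Buckets in the order they were returned (other fields omitted).
buckets = [
    {"key": "541afdfc532aec0f305c2c48", "doc_count": 2},
    {"key": "541afe08532aec0f305c5f28", "doc_count": 2},
    {"key": "541afe09532aec0f305c5f2b", "doc_count": 1},
]

first_page = page_buckets(buckets, page=1, per_page=2)
second_page = page_buckets(buckets, page=2, per_page=2)
print([b["key"] for b in first_page])
print([b["key"] for b in second_page])
```

The obvious cost is that every request still computes and transfers all the
buckets, so this only makes sense while the number of groups stays small.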
