Get top 1 row of each group

Tugberk_Ugurlu · September 24, 2014, 10:18am

In my sport index, I have the following documents indexed as football_team
type:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-index-js

index.js

POST sport/football_team
{
  "name": "Real Madrid",
  "defense_strength": 87.4,
  "middle_strength": 90.34,
  "forward_strength": 98.34,
  "player_ids": [
      "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"
  ]
}

(File truncated; the full file is in the gist linked above.)

Here, each football team has a name and some strength values. Besides that,
there is a player_ids collection for each team. The team strength has been
calculated by taking the average of the players' strengths during the ETL
process. You can also see that there are multiple football teams with the
same name here, but their player_ids collections are different.

When we run the following query:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-search_query-js

We will get the following result:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-search_result-js

Which is expected. However, what I would like to get here is the top 1 row of
each group (grouped by the team name). The result I would like to get for
the above query is this:

https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-expected_result-js

expected_result.js

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,

(File truncated; the full expected result is in the gist linked above.)

Any idea?
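
To make the intent concrete, here is a minimal client-side sketch of "top 1 row of each group" in Python. The sample documents are hypothetical (only the field names come from the post), and sorting by forward_strength is an assumption made for illustration; the point is only to show the grouping semantics, not how Elasticsearch would do it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample documents; only the field names come from the post.
teams = [
    {"name": "Real Madrid", "forward_strength": 98.34},
    {"name": "Real Madrid", "forward_strength": 97.45},
    {"name": "Barcelona", "forward_strength": 98.32},
    {"name": "Barcelona", "forward_strength": 95.1},
    {"name": "Arsenal", "forward_strength": 94.3},
]


def top_one_per_group(docs, group_key, sort_key):
    """Return the document with the highest sort_key for each group_key value."""
    # groupby only merges adjacent items, so sort by the group key first.
    ordered = sorted(docs, key=itemgetter(group_key))
    return [
        max(group, key=itemgetter(sort_key))
        for _, group in groupby(ordered, key=itemgetter(group_key))
    ]


best = top_one_per_group(teams, "name", "forward_strength")
for doc in best:
    print(doc["name"], doc["forward_strength"])
```

Each team name appears once in the output, represented by its strongest entry.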

Also, here is the whole question as a
gist: https://gist.github.com/tugberkugurlu/4fd750a5ada3ee5de17a#file-football_teams-md

Tugberk

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/73eefd0e-c555-4f70-9e67-5da05b04f32b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tugberk_Ugurlu · September 24, 2014, 1:58pm

I have been looking at the top hits aggregation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-aggregations-metrics-top-hits-aggregation.htm.
I guess this is what I want. I will try this out now.
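
As a rough sketch of what such a request could look like, the Python helper below builds a search body with a terms aggregation (one bucket per group) and a nested top_hits aggregation limited to one hit. The function name, the aggregation names "groups" and "top_group_hits", and the default of 10 buckets are all hypothetical. It also assumes the grouping field is not analyzed; on an analyzed string field the terms aggregation would bucket by individual tokens instead of whole values.

```python
import json


def top_hit_per_group_body(group_field, sort_field, source_fields, max_groups=10):
    """Build a search body: one bucket per group_field value, one top hit each."""
    return {
        "size": 0,  # skip the regular hits; only the aggregation is needed
        "aggs": {
            "groups": {  # hypothetical aggregation name
                "terms": {"field": group_field, "size": max_groups},
                "aggs": {
                    "top_group_hits": {  # hypothetical aggregation name
                        "top_hits": {
                            "sort": [{sort_field: {"order": "desc"}}],
                            "_source": {"include": source_fields},
                            "size": 1,  # top 1 row of each group
                        }
                    }
                },
            }
        },
    }


body = top_hit_per_group_body("name", "forward_strength", ["name"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a POST to sport/_search.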

Tugberk_Ugurlu · September 24, 2014, 4:58pm

OK, I think I was able to get what I needed, but I am still not 100% there
because aggregations lack paging support. I also learned how powerful
aggregations are in Elasticsearch.

I changed the document structure a little (added the primaryId field):

POST sport/football_team
{
  "primaryId": "541afe09532aec0f305c5f2b",
  "name": "Real Madrid",
  "defense_strength": 88.2,
  "middle_strength": 92.34,
  "forward_strength": 97.45,
  "player_ids": [
    "1", "2", "3", "4", "21", "6", "7", "8", "9", "10", "11"
  ]
}

This is what I ended up with:

POST sport/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "defense_strength": {
                  "lte": 83.43
                }
              }
            },
            {
              "range": {
                "forward_strength": {
                  "gte": 91
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "top_teams": {
      "terms": {
        "field": "primaryId"
      },
      "aggs": {
        "top_team_hits": {
          "top_hits": {
            "sort": [
              {
                "forward_strength": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "include": [
                "name"
              ]
            },
            "from": 0,
            "size": 1
          }
        }
      }
    }
  }
}

The result is what I expected:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "top_teams": {
      "buckets": [
        {
          "key": "541afdfc532aec0f305c2c48",
          "doc_count": 2,
          "top_team_hits": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "sport",
                  "_type": "football_team",
                  "_id": "y6jZ31xoQMCXaK23rPQgjA",
                  "_score": null,
                  "_source": {
                    "name": "Barcelona"
                  },
                  "sort": [
                    98.32
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "541afe08532aec0f305c5f28",
          "doc_count": 2,
          "top_team_hits": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "sport",
                  "_type": "football_team",
                  "_id": "hewWI0ZpTki4OgOeneLn1Q",
                  "_score": null,
                  "_source": {
                    "name": "Arsenal"
                  },
                  "sort": [
                    94.3
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "541afe09532aec0f305c5f2b",
          "doc_count": 1,
          "top_team_hits": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "sport",
                  "_type": "football_team",
                  "_id": "x-_YBX5jSba8qsEuB8guTQ",
                  "_score": null,
                  "_source": {
                    "name": "Real Madrid"
                  },
                  "sort": [
                    91.34
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
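
For reading such a response on the client, a small Python sketch like the one below could pull the single top hit out of each bucket. The helper name is hypothetical; the aggregation names "top_teams" and "top_team_hits" and the sample values are taken from the query and response above, with the response trimmed to the fields the helper touches.

```python
def top_hits_by_bucket(response, terms_agg="top_teams", hits_agg="top_team_hits"):
    """Map each bucket key to the _source of its first top hit."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: bucket[hits_agg]["hits"]["hits"][0]["_source"]
        for bucket in buckets
        if bucket[hits_agg]["hits"]["hits"]  # skip buckets without hits
    }


def make_bucket(key, name):
    """Build a trimmed bucket in the shape of the response above."""
    return {
        "key": key,
        "top_team_hits": {"hits": {"hits": [{"_source": {"name": name}}]}},
    }


# Trimmed copy of the response above.
response = {
    "aggregations": {
        "top_teams": {
            "buckets": [
                make_bucket("541afdfc532aec0f305c2c48", "Barcelona"),
                make_bucket("541afe08532aec0f305c5f28", "Arsenal"),
                make_bucket("541afe09532aec0f305c5f2b", "Real Madrid"),
            ]
        }
    }
}

teams_by_id = top_hits_by_bucket(response)
for primary_id, source in teams_by_id.items():
    print(primary_id, source["name"])
```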

All good, but what I need now is the ability to get the first 2 aggregation
results in one request and the next 2 (in this case, only 1) in another request.
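
One possible workaround, sketched in Python below, is to page on the client: ask the terms aggregation for offset + limit buckets through its "size" setting and then slice off the page. Both helper names are hypothetical, and the approach assumes the bucket order is the same from one request to the next; it also re-fetches all earlier buckets on every page, so it only suits a modest number of groups.

```python
def terms_size_for_page(offset, limit):
    """Number of buckets to request so that the wanted page is included."""
    return offset + limit


def slice_page(buckets, offset, limit):
    """Return the buckets that belong to the requested page."""
    return buckets[offset:offset + limit]


# Bucket keys from the result above, in the order they were returned.
all_buckets = [
    {"key": "541afdfc532aec0f305c2c48"},
    {"key": "541afe08532aec0f305c5f28"},
    {"key": "541afe09532aec0f305c5f2b"},
]

# Page 1: request 2 buckets and keep both.
page_one = slice_page(all_buckets[:terms_size_for_page(0, 2)], 0, 2)

# Page 2: request 4 buckets (only 3 exist) and keep those after the first 2.
page_two = slice_page(all_buckets[:terms_size_for_page(2, 2)], 2, 2)

print([b["key"] for b in page_one])
print([b["key"] for b in page_two])
```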
