Aggregations within aggregations

I have an index that contains documents about movies and the movies are tagged with their genre and main actors. Like this:

{
	"title": "The fast and the furious",
	"tags": [
		{
			"groupName": "Genre",
			"tags": [ "Action", "Racing"]
		},
		{
			"groupName": "Actors",
			"tags": ["Vin Diesel", "Paul Walker"]
		}
	]
},
{
	"title": "2 fast 2 furious",
	"tags": [
		{
			"groupName": "Genre",
			"tags": [ "Action"]
		},
		{
			"groupName": "Actors",
			"tags": ["Paul Walker", "Ludacris"]
		}
	]
}

Now I want to use aggregation to aggregate first on groupName then on the tags-list, in order to create a list like this:

Genre

  • Action (2)
  • Racing (1)

Actors

  • Paul Walker (2)
  • Vin Diesel (1)
  • Ludacris (1)

I have tried to do the following aggregation:

POST mymovies/_search
 {
   "query": {
     "match_all": {}
   },
   "aggs": {
     "movie_groupName": {
       "terms": {
         "field": "tags.groupName",
         "size": 20
       },
       "aggs": {
         "movie_tags": {
           "terms": {
             "field": "tags.tags",
             "size": 20
           }
         }
       }
     }
   }
 }

This is the result I expected:

"aggregations": {
	"movie_groupName": {
		"doc_count_error_upper_bound": 0,
		"sum_other_doc_count": 0,
		"buckets": [
			{
				"key": "Genre",
				"doc_count": 2,
				"movice_tags": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": 
					[
						{
							"key": "Action",
							"doc_count": 2
						},
						{
							"key": "Racing",
							"doc_count": 1
						}
					]
				}
			},
			{
				"key": "Actors",
				"doc_count": 2,
				"movice_tags": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": 
					[
						{
							"key": "Paul Walker",
							"doc_count": 2
						},
						{
							"key": "Vin Diesel",
							"doc_count": 1
						},
						{
							"key": "Ludacris",
							"doc_count": 1
						}
					]
				}
			}
		]
	}
}

But this is the result I get back:

"aggregations": {
	"movie_groupName": {
		"doc_count_error_upper_bound": 0,
		"sum_other_doc_count": 0,
		"buckets": [
			{
				"key": "Genre",
				"doc_count": 2,
				"movice_tags": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": 
					[
						{
							"key": "Action",
							"doc_count": 2
						},				
						{
							"key": "Paul Walker",
							"doc_count": 2
						},
						{
							"key": "Racing",
							"doc_count": 1
						},
						{
							"key": "Vin Diesel",
							"doc_count": 1
						},
						{
							"key": "Ludacris",
							"doc_count": 1
						}
					]
				}
			},
			{
				"key": "Actors",
				"doc_count": 2,
				"movice_tags": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": 
					[
						{
							"key": "Action",
							"doc_count": 2
						},				
						{
							"key": "Paul Walker",
							"doc_count": 2
						},
						{
							"key": "Racing",
							"doc_count": 1
						},
						{
							"key": "Vin Diesel",
							"doc_count": 1
						},
						{
							"key": "Ludacris",
							"doc_count": 1
						}
					]
				}
			}
		]
	}
}

Giving me a list that will look like this (which clearly is not what I want):

Genre

  • Paul Walker (2)
  • Action (2)
  • Racing (1)
  • Vin Diesel (1)
  • Ludacris (1)

Actors

  • Paul Walker (2)
  • Action (2)
  • Racing (1)
  • Vin Diesel (1)
  • Ludacris (1)

What am I doing wrong?

Life is probably made a lot easier if your actors are listed in a field called "actors" and your genres in a field called "genres". e.g.

{
    "title": "2 fast 2 furious",
    "actors": ["Paul Walker", "Ludacris"],
    "genres": ["Action"]
}

Yeah, I know how to do it like that. But that is not my question. I want the list to be generic.

Probably because your field tags is not a nested field.
But I deeply agree with @Mark_Harwood 's comment!

Prepare yourself for pain then :slight_smile:

  • Your queries, aggs and docs will all need to adopt nested syntax to overcome the cross-matching problem [1] inherent to Lucene.
  • Kibana will be unable to use these fields.
  • Your index will bloat in size.

[1] Proposal for nested document support in Lucene | PPT

@dadoonet How do I make it a nested field?
The problem with @Mark_Harwood s suggestion is that I do not know what the groupNames might be. A list of movies is just an example to simplify my problem. I want to be able to create a list showing the groupnames and number of documents pr tag without having to do any programming every time a groupName is added or removed.

Well this doesn't sound very good :see_no_evil:
I want to group something within a group... Do you have any other suggestions as how to solve this problem?

People use elasticsearch to do a lot of of grouping groups within groups and then subgroups etc

However, normally the meaning of the data is expressed on the left-hand side of "fieldname": "value" pairs. Life is more complicated if the meaning of "value" depends on a neighbouring field/value pair buried in the same object. We then need you to be explicit about which neighbouring item contains the context etc. using nested. Messy.

How many unique "groupnames" do you expect to accumulate over time? You might be able to use dynamic mapping to add new fields automatically but this shouldn't be used for too many unique field names (where "too many" is probably a mapping with thousands of fields).

Not that many. Maybe 10-20 unique "groupNames" at max

Totally manageable then.
Just throw JSON docs with the new fields at elasticsearch and by default it will automatically expand the internal mapping definition used to manage these fields. For example, given this previously-unseen content:

{
   "genre" : ["Action", "Science Fiction"],
    "actors" : ["Vin Diesel", "Ludacris"]
}

..elasticsearch will automatcally add to the internal mapping definition as follows:

{
  "test": {
	"mappings": {
	  "doc": {
		"properties": {
		  "actors": {
			"type": "text",
			"fields": {
			  "keyword": {
				"type": "keyword",
				"ignore_above": 256
			  }
			}
		  },
		  "genre": {
			"type": "text",
			"fields": {
			  "keyword": {
				"type": "keyword",
				"ignore_above": 256
			  }
			}
 ...

You can query on the ".text" indexed versions of fields e.g. actors.text so that you would match a search for vin regardless of case but would use the keyword versions of fields in any aggregation e.g. actor.keyword so that when returning popular actors you'd get a single keyword Vin Diesel rather than the words vin and diesel as separate items.
You can nest terms aggregations arbitrarily e.g. break down by genre.keyword and then the most popular actor.keyword values under that.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.