How to aggregate values for embedded documents

Keith_Garrod · September 7, 2015, 12:49pm

I have a document 'user' with embedded 'review' documents. Each review document has a 'rating' field.

I want to run a query which returns a 'range' type field for each 'user' that is returned from a query.

I am able to aggregate all 'rating's for the selected 'users's, but I don't see how to return a bucket for each 'user'.

The only way I can see of achieving it is by adding a 'script_field' - is that the right approach?

I'm a confused newbie - please be kind

a2tirb · September 7, 2015, 5:20pm

You can nest aggregations, for example have a terms aggregation that creates a bucket per id (which would then be per user) and then beneath that nest an aggregation that aggregates on the values in the document. Below is an example that gives you the min and max value per user. Is that what you are looking for?
The downside of this is that if you create one bucket per user this might potentially be many buckets but if you only do that for selected users this might be ok. If not then a script field would actually be the better approach I think.

PUT testidx
{
  "mappings": {
    "user": {
      "_id": {
        "index": "not_analyzed"
      }
    }
  }
}

POST testidx/user
{
  "name": "user1",
  "ratings": [
    {
      "rating": 1
    },
    {
      "rating": 2
    },
    {
      "rating": 3
    }
  ]
}

# index some more....
...

# search and aggregate

POST testidx/user/_search
{
  "size": 0,
  "aggs": {
    "doc_id": {
      "terms": {
        "field": "_id",
        "size": 10
      },
      "aggs": {
        "min": {
          "min": {
            "field": "ratings.rating"
          }
        },
        "max": {
          "max": {
            "field": "ratings.rating"
          }
        }
      }
    }
  }
}

would result in (depending what you indexed):

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "doc_id": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "AU-oxK008HuUQps-las2",
               "doc_count": 1,
               "min": {
                  "value": 1
               },
               "max": {
                  "value": 5
               }
            },
            {
               "key": "AU-oxME18HuUQps-las3",
               "doc_count": 1,
               "min": {
                  "value": 0
               },
               "max": {
                  "value": 5
               }
            },
            {
               "key": "AU-oxNli8HuUQps-latQ",
               "doc_count": 1,
               "min": {
                  "value": 1
               },
               "max": {
                  "value": 3
               }
            }
         ]
      }
   }
}

Keith_Garrod · September 8, 2015, 7:42am

@a2tirb Thanks for the help, that was what I needed.

in my case, the 'review' documents are nested in the user document, so I had to add the aggs for the nested documents:

"aggs": {
    "doc_id": {
      "terms": {
        "field": "_id",
        "size": 10
      },
      "aggs": {
            "reviews": {
              "nested": {
                "path": "reviews"
              }, 
             "min": {
                "min": {
                  "field": "reviews.rating"
             }
        }

Topic		Replies	Views
Per bucket scoring in aggregations Elasticsearch	6	2533	July 5, 2017
Aggregate data per document Elasticsearch	3	172	July 11, 2023
Aggregate elasticsearch index by nested filed values Elasticsearch	1	380	January 13, 2017
Aggregations on nested objects Elasticsearch	8	1143	July 5, 2017
Can aggregations values be used in script score calculations? Elasticsearch	2	490	September 9, 2020

How to aggregate values for embedded documents

Related topics