How to aggregate values for embedded documents


(Keith Garrod) #1

I have a document 'user' with embedded 'review' documents. Each review document has a 'rating' field.

I want to run a query which returns a 'range' type field for each 'user' that is returned from a query.

I am able to aggregate all 'rating's for the selected 'users's, but I don't see how to return a bucket for each 'user'.

The only way I can see of achieving it is by adding a 'script_field' - is that the right approach?

I'm a confused newbie - please be kind :smile:


(Britta Weber) #2

You can nest aggregations, for example have a terms aggregation that creates a bucket per id (which would then be per user) and then beneath that nest an aggregation that aggregates on the values in the document. Below is an example that gives you the min and max value per user. Is that what you are looking for?
The downside of this is that if you create one bucket per user this might potentially be many buckets but if you only do that for selected users this might be ok. If not then a script field would actually be the better approach I think.

PUT testidx
{
  "mappings": {
    "user": {
      "_id": {
        "index": "not_analyzed"
      }
    }
  }
}

POST testidx/user
{
  "name": "user1",
  "ratings": [
    {
      "rating": 1
    },
    {
      "rating": 2
    },
    {
      "rating": 3
    }
  ]
}

# index some more....
...

# search and aggregate

POST testidx/user/_search
{
  "size": 0,
  "aggs": {
    "doc_id": {
      "terms": {
        "field": "_id",
        "size": 10
      },
      "aggs": {
        "min": {
          "min": {
            "field": "ratings.rating"
          }
        },
        "max": {
          "max": {
            "field": "ratings.rating"
          }
        }
      }
    }
  }
}

would result in (depending what you indexed):

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "doc_id": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "AU-oxK008HuUQps-las2",
               "doc_count": 1,
               "min": {
                  "value": 1
               },
               "max": {
                  "value": 5
               }
            },
            {
               "key": "AU-oxME18HuUQps-las3",
               "doc_count": 1,
               "min": {
                  "value": 0
               },
               "max": {
                  "value": 5
               }
            },
            {
               "key": "AU-oxNli8HuUQps-latQ",
               "doc_count": 1,
               "min": {
                  "value": 1
               },
               "max": {
                  "value": 3
               }
            }
         ]
      }
   }
}

(Keith Garrod) #3

@a2tirb Thanks for the help, that was what I needed.

in my case, the 'review' documents are nested in the user document, so I had to add the aggs for the nested documents:

"aggs": {
    "doc_id": {
      "terms": {
        "field": "_id",
        "size": 10
      },
      "aggs": {
            "reviews": {
              "nested": {
                "path": "reviews"
              }, 
             "min": {
                "min": {
                  "field": "reviews.rating"
             }
        }

(system) #4