Return _source fields within aggregation

Hi all,

In my ES indices I have an order type, which, as you can guess, stores various data about orders.
I need a query that splits the data into buckets using a scripted field, sums the order values within each bucket, and also returns information such as the currency, the account id and the channel the order came from, all of which is in _source.

I can build this query through a Kibana visualisation, but it ends up with a lot of unnecessary aggregations. I tried to use top_hits, but with no success. Are there limitations on which types of aggregations top_hits can be used with, or is it most likely just me writing the query incorrectly?

Thanks,
Kornelia

Can you share the aggregation that you tried? The top_hits aggregation can be used as a root aggregation or under a multi-bucket aggregation (terms, for instance).
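
To illustrate the placement described above, here is a minimal sketch that builds a search body with top_hits nested under a terms aggregation. It is written in Python only to keep it self-contained. The helper name `build_top_hits_body` and the aggregation names `by_bucket` and `sample_docs` are made up for this example, the field names are borrowed from this thread, and the exact `_source` filtering syntax may differ between Elasticsearch versions.

```python
import json


def build_top_hits_body(bucket_field, source_fields, hits_per_bucket=1):
    """Build a search body: a terms aggregation with a top_hits sub-aggregation."""
    return {
        "size": 0,  # skip the regular hits, only the aggregations are wanted
        "aggs": {
            "by_bucket": {  # hypothetical aggregation name
                "terms": {"field": bucket_field},
                "aggs": {
                    "sample_docs": {  # hypothetical aggregation name
                        "top_hits": {
                            # number of documents returned per bucket
                            "size": hits_per_bucket,
                            # restrict the returned _source to these fields
                            "_source": {"includes": source_fields},
                        }
                    }
                },
            }
        },
    }


body = build_top_hits_body("account_id", ["base_currency", "channel_description"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Each bucket in the response would then carry a `sample_docs.hits.hits` array holding the selected `_source` fields.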

It might be easier to show the query that Kibana generates, which is:

      {
        "query": {
          "bool": {
            "must": [
              {
                "query_string": {
                  "analyze_wildcard": true,
                  "query": "some query"
                }
              },
              {
                "range": {
                  "date": {
                    "gte": 1447058731436,
                    "lte": 1510217131436,
                    "format": "epoch_millis"
                  }
                }
              }
            ]
          }
        },
        "size": 0,
        "aggs": {
          "id": {
            "terms": {
              "script": {
                "inline": "some scripted field",
                "lang": "painless"
              },
              "size": 500000,
              "order": {
                "_count": "desc"
              },
              "value_type": "string"
            },
            "aggs": {
              "account_id": {
                "terms": {
                  "field": "account_id",
                  "size": 1,
                  "order": {
                    "_count": "desc"
                  }
                },
                "aggs": {
                  "channel": {
                    "terms": {
                      "field": "channel_description",
                      "size": 1,
                      "order": {
                        "_count": "desc"
                      }
                    },
                    "aggs": {
                      "base_currency": {
                        "terms": {
                          "field": "base_currency",
                          "size": 1,
                          "order": {
                            "_count": "desc"
                          }
                        },
                        "aggs": {
                          "gmv": {
                            "sum": {
                              "field": "value_base"
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }

And the output is:

      {
        "aggregations": {
          "id": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "acc-channel-10-2017",
                "doc_count": 2670,
                "account_id": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "some account",
                      "doc_count": 2670,
                      "channel": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                          {
                            "key": "some channel",
                            "doc_count": 2670,
                            "base_currency": {
                              "doc_count_error_upper_bound": 0,
                              "sum_other_doc_count": 0,
                              "buckets": [
                                {
                                  "key": "some currency",
                                  "doc_count": 2670,
                                  "gmv": {
                                    "value": some gmv value
                                  }
                                }
                              ]
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      }

So there are a lot of sub-buckets. What I wondered is whether, after performing just two aggregations (to retrieve id and gmv), I can somehow access other values such as account_id, channel and base_currency without having to do all those sub-aggregations.

The aggregation named "gmv" in your example is a sum aggregation. It is a metric aggregation which, in your case, computes the sum of the field value_base for every id.account_id.channel.base_currency bucket created by the tree of terms aggregations above it. It is a single value computed from the documents contained in the bucket. The values for account_id, channel and base_currency are already in the response: they are the keys of the parent buckets above each gmv result.
The top_hits aggregation returns the top N documents per bucket. It is not a single-valued metric like the sum aggregation, and it is not needed in your case.
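
As a rough client-side illustration of reading those parent keys, the sketch below walks a response shaped like the one posted above and emits one flat row per innermost bucket. The function name `flatten_buckets` is made up for this example, the sample response is trimmed, and the gmv number 1234.5 is a placeholder, since the real value was not shown in the thread.

```python
def flatten_buckets(agg, levels, metric, path=()):
    """Yield one flat row per innermost bucket of nested terms aggregations."""
    name, rest = levels[0], levels[1:]
    for bucket in agg[name]["buckets"]:
        # remember the key of this level under the aggregation's name
        keys = path + ((name, bucket["key"]),)
        if rest:
            # descend into the next terms aggregation
            yield from flatten_buckets(bucket, rest, metric, keys)
        else:
            # innermost level: attach the metric value to the collected keys
            row = dict(keys)
            row[metric] = bucket[metric]["value"]
            yield row


# Trimmed copy of the response above; 1234.5 is a placeholder value.
response = {
    "aggregations": {
        "id": {"buckets": [{
            "key": "acc-channel-10-2017",
            "account_id": {"buckets": [{
                "key": "some account",
                "channel": {"buckets": [{
                    "key": "some channel",
                    "base_currency": {"buckets": [{
                        "key": "some currency",
                        "gmv": {"value": 1234.5},
                    }]},
                }]},
            }]},
        }]}
    }
}

levels = ["id", "account_id", "channel", "base_currency"]
rows = list(flatten_buckets(response["aggregations"], levels, "gmv"))
print(rows)
```

Each row is a plain dict with the four bucket keys plus gmv, so the nesting in the response costs nothing once it is flattened this way.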
