Elasticsearch Aggregation - Return an array of values from the documents found

Hi All,

I am querying time series data using aggregations. The data is categorical.
I use a date histogram to create buckets first. From these buckets, I want to extract the actual values from the matching documents as an array.

A possible workaround might be to include the hits object inside the aggs object, but I don't know how to do that either.

The query:

GET elastiflow-*/_search
{
  "size": 10000,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": {
    "includes": ["time", "data"]
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-2d/d",
            "lte": "now"
          }
        }
      }
    }
  },
  "aggs": {
    "30secbuckets": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "30s"
      },
      "aggs": {
        "average": {
          "terms": {
            "field": "data"
          }
        }
      }
    }
  }
}

Thanks!

I believe a top_hits aggregation within the buckets would work. If you need every single result, you might need a terms aggregation over the _id field, with a top_hits agg nested under that _id terms agg. This can use a lot of memory depending on your data, so use at your own risk.

Hey,
Thanks for the reply.
So what you suggest is a date histogram, followed by a terms aggregation over the _id field, and then a top_hits aggregation.
However, the date histogram does not generate an _id key. It simply returns doc_count and key.

I am a beginner with Elasticsearch. Is there a code example I can refer to?

No problem, I'll try to add more detail :slightly_smiling_face:

In the query you posted, date_histogram and terms are both bucket aggregations. The doc_count you mentioned is the number of docs (hits) that matched the query and filters and ended up in the bucket in question. When you run aggs under other aggs (such as your average agg), the result is computed per bucket, using only the docs in that bucket.
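
If it helps to picture the per-bucket behaviour, here is a rough plain-Python imitation of a 30s date histogram with a terms sub-agg. It is only an illustration, not how Elasticsearch works internally, and the documents and the `bucketize` helper are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical documents: "time" is epoch milliseconds, "data" is a category.
docs = [
    {"time": 0, "data": "a"},
    {"time": 10_000, "data": "b"},
    {"time": 29_999, "data": "a"},
    {"time": 30_000, "data": "c"},
]

INTERVAL_MS = 30_000  # mirrors "fixed_interval": "30s"


def bucketize(docs, interval_ms=INTERVAL_MS):
    """Group docs by the start of their interval, then count terms per group."""
    groups = defaultdict(list)
    for doc in docs:
        key = doc["time"] - doc["time"] % interval_ms  # start of the bucket
        groups[key].append(doc)
    return [
        {
            "key": key,
            "doc_count": len(members),
            # The sub-aggregation only sees the docs of this bucket.
            "terms": dict(Counter(d["data"] for d in members)),
        }
        for key, members in sorted(groups.items())
    ]


buckets = bucketize(docs)
```

The first three documents land in the bucket starting at 0 and the last one in the bucket starting at 30000, and each bucket's term counts are computed from its own documents only.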

I've included an example query here (based on your original query) which includes both methods I mentioned employing Top Hits Aggregations. I think I should reiterate that using a terms agg on the _id field could have disastrous consequences in terms of memory usage depending on your data and I don't think it's the right approach for that reason. But I've included it in the example for illustration purposes.

I also noticed that the size specified in the original query was very high. Retrieving that many hits in a single request will likely cause performance issues. I'd recommend using pagination of some kind, with options outlined here.
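
One of the pagination options is search_after, where each request passes the sort values of the last hit from the previous page. A minimal Python sketch of building the next request body is below. The `next_page_body` helper, the page size of 1000, and the sample hit are assumptions for illustration, and it leaves out a tiebreaker sort field, which you would probably want if many documents share the same @timestamp.

```python
import copy


def next_page_body(body, last_hit):
    """Return a copy of `body` that asks for the page after `last_hit`."""
    if "sort" not in body:
        raise ValueError("search_after needs an explicit sort")
    page = copy.deepcopy(body)
    page.pop("from", None)  # search_after is not combined with an offset
    page["search_after"] = last_hit["sort"]  # sort values of the last hit
    return page


first_page = {
    "size": 1000,
    "sort": [{"@timestamp": {"order": "desc"}}],
    "query": {"range": {"@timestamp": {"gte": "now-2d/d", "lte": "now"}}},
}

# Hypothetical last hit of the first page; hits carry a "sort" array
# whenever the request specifies a sort.
last_hit = {"_id": "abc", "sort": [1700000000000]}
second_page = next_page_body(first_page, last_hit)
```

You would send `first_page`, read the last hit of the response, and keep calling `next_page_body` until a page comes back empty.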

Example query:

{
    "size": 0,
    "sort": [
        {
            "@timestamp": {
                "order": "desc",
                "unmapped_type": "boolean"
            }
        }
    ],
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "@timestamp": {
                        "gte": "now-2d/d",
                        "lte": "now"
                    }
                }
            }
        }
    },
    "aggs": {
        "30secbuckets": {
            "date_histogram": {
                "field": "time",
                "fixed_interval": "30s"
            },
            "aggs": {
                "my_top_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "@timestamp": {
                                    "order": "desc"
                                }
                            }
                        ],
                        "_source": {
                            "includes": [
                                "data",
                                "time"
                            ]
                        },
                        "size": 10
                    }
                },
                "docs_by_id": {
                    "terms": {
                        "field": "_id"
                    },
                    "aggs": {
                        "all_top_hits_DANGER_OOM": {
                            "top_hits": {
                                "_source": {
                                    "includes": [
                                        "data",
                                        "time"
                                    ]
                                },
                                "size": 1
                            }
                        }
                    }
                }
            }
        }
    }
}
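
To get from the response of that query to one array per bucket on the client side, something like the following Python sketch should work. The `values_per_bucket` helper is hypothetical, and the sample response is hand-written to match the shape of a search response with these agg names. It is not real output.

```python
def values_per_bucket(response, hist_name="30secbuckets",
                      hits_name="my_top_hits", field="data"):
    """Map each histogram bucket to the list of `field` values in its top hits."""
    result = {}
    for bucket in response["aggregations"][hist_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        # Prefer the formatted date key, fall back to the epoch-millis key.
        label = bucket.get("key_as_string", bucket["key"])
        result[label] = [hit["_source"][field] for hit in hits]
    return result


# Hand-written sample shaped like a search response (not real output).
sample = {
    "aggregations": {
        "30secbuckets": {
            "buckets": [
                {
                    "key_as_string": "2020-01-01T00:00:00.000Z",
                    "key": 1577836800000,
                    "doc_count": 2,
                    "my_top_hits": {
                        "hits": {
                            "hits": [
                                {"_source": {"data": "b", "time": "2020-01-01T00:00:20.000Z"}},
                                {"_source": {"data": "a", "time": "2020-01-01T00:00:05.000Z"}},
                            ]
                        }
                    },
                }
            ]
        }
    }
}

arrays = values_per_bucket(sample)
```

Each array holds at most as many values as the top_hits `size` (10 in the query above). Note also that the `docs_by_id` terms agg, as written, uses the default terms `size` of 10, so it returns at most 10 ids per bucket unless you raise it.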