Bucket-key=custom, value=array(_ids) aggregation?


I have a mapping with the fields "a string, b string, c string, t timestamp". Can I build a bucket aggregation where I specify the key myself, for example:

key=t(yymmdd):a:b (generate the key from script)

Each bucket should have as its value an array of documents, optionally including _source. If a bucket contains a lot of documents, it should return only the top(x) plus the doc count.

The buckets should be sortable by a field value (e.g. the timestamp t).

It should be possible to limit the number of buckets.

I also need to get back the min(timestamp) over the whole aggregation (in case the last bucket has too many documents to return the _source of all of them).

Is this possible? If not, can I do anything (custom Java?) to make it possible?

I think this can be done using a terms aggregation with a script to generate the initial buckets: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-script
and top-hits as a sub-aggregation to return the documents for each bucket: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

But I don't know how to get the "minimum timestamp" of the last bucket. Maybe by sorting by timestamp ascending in the top-hits sub-aggregation (so I get the top documents)?

Does that make sense?
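
For anyone reading along, here is a rough sketch of the request body this approach would produce, built as a plain Python dict so it can be inspected before sending. It is only an illustration: the function name, the aggregation names (`groups`, `bucket_min_t`, `docs`, `overall_min_t`) and the key script are made up for the example, and the script syntax (including the `source` key and how to format the date part of the key) depends on the Elasticsearch version and scripting language in use.

```python
import json


def build_grouped_query(key_script, bucket_limit=10, docs_per_bucket=3,
                        timestamp_field="t"):
    """Build a search body: scripted terms buckets, top_hits per bucket,
    and a min aggregation over all matching documents."""
    return {
        "size": 0,  # only aggregation results are wanted, no top-level hits
        "aggs": {
            "groups": {
                "terms": {
                    # the script computes the bucket key for each document
                    "script": {"source": key_script},
                    # limit the number of buckets returned
                    "size": bucket_limit,
                    # order buckets by their earliest timestamp
                    "order": {"bucket_min_t": "asc"},
                },
                "aggs": {
                    # single-value metric used for ordering the buckets
                    "bucket_min_t": {"min": {"field": timestamp_field}},
                    # the top documents of each bucket, oldest first
                    "docs": {
                        "top_hits": {
                            "size": docs_per_bucket,
                            "sort": [{timestamp_field: {"order": "asc"}}],
                            "_source": True,
                        }
                    },
                },
            },
            # minimum timestamp across all matching documents
            "overall_min_t": {"min": {"field": timestamp_field}},
        },
    }


# Hypothetical key script joining two fields; the date part of the key
# is left out because its syntax varies between versions.
KEY_SCRIPT = "doc['a'].value + ':' + doc['b'].value"

body = build_grouped_query(KEY_SCRIPT, bucket_limit=5, docs_per_bucket=2)
print(json.dumps(body, indent=2))
```

Each terms bucket already carries its `doc_count`, so the "top(x) + doc count" part comes for free. Whether the top-level min or the per-bucket min is the right "minimum timestamp" depends on what the client does with it.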


shameless bumping

Can you reformat your OP? It's hard to see what is happening. Wrap it in code tags :slight_smile:

Hope it's clearer now.

It sounds like you are on the right track regarding scripting, but I can't help there as I don't know much about it.

However, a better solution might be to craft fields with these sorts of values during ingestion; that way it'll be much simpler (and easier on your resources).
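
As a minimal sketch of what that could look like on the client side before indexing: the helper below adds a precomputed key in the t(yymmdd):a:b shape to each document. The function name and the `group_key` field name are hypothetical, and it assumes `t` arrives as epoch milliseconds in UTC, which may not match the actual mapping.

```python
from datetime import datetime, timezone


def with_group_key(doc, fields=("a", "b"), timestamp_field="t"):
    """Return a copy of doc with a precomputed 'group_key' field."""
    # assumption: the timestamp is stored as epoch milliseconds
    ts = datetime.fromtimestamp(doc[timestamp_field] / 1000, tz=timezone.utc)
    # key layout follows the example in the question: yymmdd:a:b
    parts = [ts.strftime("%y%m%d")] + [str(doc[f]) for f in fields]
    enriched = dict(doc)  # leave the caller's document untouched
    enriched["group_key"] = ":".join(parts)
    return enriched


doc = {"a": "x", "b": "y", "c": "z", "t": 1700000000000}
print(with_group_key(doc)["group_key"])  # 231114:x:y
```

A plain terms aggregation on such a field avoids running a script per document at query time, but it only works when the grouping is known in advance.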

The "what/how to group on" is dynamic (it comes from the client side), so I can't do that. I just wanted to know if that's the right way, and it looks like it is.