I'm migrating my application from Elasticsearch 1.7 to 5.6, but I'm stuck on the following aggregation, which previously relied on "size": 0 (removed in 5.x).
Basically, I want to retrieve all "events" (which are, in fact, reservations for an event) in a date range, grouped by date, and get the total of their ca field. Afterwards, in JavaScript, I sorted all buckets by total.value and kept only the top 10.
In other words, I want to know the top 10 events with the highest total ca.
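For readers following along, here is a minimal sketch of the client-side step described above: sort the buckets by total.value and keep the first 10. The original code was JavaScript and is not shown in the thread; Python is used here only for illustration, and the bucket values are made up.

```python
def top_buckets(buckets, n=10):
    """Return the n buckets with the largest total.value, highest first."""
    return sorted(buckets, key=lambda b: b["total"]["value"], reverse=True)[:n]


# Made-up buckets, shaped like the "buckets" array of an aggregation response.
buckets = [
    {"key": "2017-01-01", "total": {"value": 120.0}},
    {"key": "2017-01-02", "total": {"value": 450.5}},
    {"key": "2017-01-03", "total": {"value": 80.0}},
]
top2 = top_buckets(buckets, n=2)
```

This only gives the right answer if the response contains every bucket, which is exactly what the rest of the thread is about.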
But now that "size": 0 has been removed, I have no idea how to achieve this. For example, if I use "size": 2147483647, it returns incorrect values because some reservations are ignored (yes, I have a huge database).
Maybe there is a way to ask Elasticsearch to give me only the top 10 results... I was looking at the top_hits aggregation, but I don't understand whether it applies to my problem.
Well, I think we're back to the initial problem here. For example, by_event_id only returns 10 buckets (and none of them is in the top 10). If I add "size": 2147483647 to the terms aggregation, I get more results, but not all of them...
Yes, by default ES returns 10 buckets in a terms aggregation, and with the size parameter you can indeed increase the number of returned buckets. But this supposes you already know how many values to expect.
It's only in ES 6 that you can page through all the aggregation results, using the composite aggregation.
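The snippet that originally followed is not available, so here is a rough sketch of how paging with the composite aggregation could look. The field names eventId and ca, the aggregation names all_events and total, and the `search` callable are assumptions for illustration; `search` stands in for whatever client call sends a request body and returns the parsed response.

```python
def iter_composite_buckets(search, field="eventId", page_size=1000):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the parsed JSON response (dict).
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"by_event_id": {"terms": {"field": field}}}],
        }
        if after is not None:
            # Resume right after the last bucket key of the previous page.
            composite["after"] = after
        body = {
            "size": 0,  # top-level size: return no hits, only aggregations
            "aggs": {
                "all_events": {
                    "composite": composite,
                    # Sum of the (assumed) ca field inside each bucket.
                    "aggs": {"total": {"sum": {"field": "ca"}}},
                }
            },
        }
        buckets = search(body)["aggregations"]["all_events"]["buckets"]
        if not buckets:
            return
        yield from buckets
        after = buckets[-1]["key"]
```

Once all buckets have been collected this way, they can be sorted by total.value on the client side as before.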
But I don't understand why you aggregate on a query that includes the date range. Do you want your aggregations to cover all the data?
So maybe you want to aggregate only on events of type "booking", then filter your results with the date range?
In that case, move the date range clause into a post_filter.
No problem!
Order of the steps:
1 - the query returns all events of type "booking"
2 - aggregate per eventId on the results of the query (1), with the sum of the CA
3 - filter the hits of the query (1) by the date range (a post_filter does not affect the aggregations in (2))
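The steps above can be sketched as the following request body. The original snippet is not available, so the field names type, eventId, ca and date are assumptions, as is the helper name `booking_query`; by_event_id and sum_CA are the aggregation names used elsewhere in this thread.

```python
def booking_query(start, end):
    """Build a request body following the three steps above."""
    return {
        # Step 1: match every event of type "booking" (assumed field name).
        "query": {"term": {"type": "booking"}},
        # Step 2: group the matches by eventId and sum ca per group.
        "aggs": {
            "by_event_id": {
                "terms": {"field": "eventId"},
                "aggs": {"sum_CA": {"sum": {"field": "ca"}}},
            }
        },
        # Step 3: narrow the returned hits to the date range.
        # post_filter runs after the aggregations have been computed.
        "post_filter": {"range": {"date": {"gte": start, "lte": end}}},
    }


body = booking_query("2017-01-01", "2017-12-31")
```

Note that with this layout the sums are computed over all bookings, not only those in the date range; if the totals should be limited to the range, the range clause belongs in the query instead.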
Maybe I'm missing something but I can't see any significant difference between this query and the very first one I used: only 10 results are returned in the terms aggregation... and they may not be the top 10.
Yes, unless you define the size parameter in the aggregation, you will get 10 results (the ES default)...
What are the top 10 events? The ones (eventId) that have the most reservations (documents)?
I ask because the terms aggregation on eventId, which orders buckets by document count by default, should give you the 10 best events in that sense.
So can we build this query step by step, starting by removing the post_filter clause?
You should then already get the 10 best eventId values in the "aggregations" object of your response.
To be precise, what I need is the 10 best groups of events (grouped by eventId | date | eventTheme | name | subType | age, but only one of them at a time).
"Best" is determined by the sum of the ca field within a group. For example, I need the 10 groups by age whose sum of ca is the highest among all groups by age.
Yes perfectly clear!
You're right! So I tried with the top_hits aggregation, and it should fulfill the need.
Could you try this one, and check the result?
You should get a list of 10 eventId values, sorted by "sum_CA", which is a sum aggregation over the events' CA.
The "top_events" aggregation lists only one event per bucket (the one with the highest CA), but you can increase its size to get more events (reservations) for each eventId.
From what I understand, the first aggregation is your top-level grouping, since you want to group by eventId. The top_hits aggregation then controls which documents are returned inside each group. In your example you could also set "top_hits": {} if the sub-documents don't matter.
The interesting part is the nested sum aggregation of the CA per eventId, which you can reference in the order of the top-level aggregation to sort the eventId buckets.
This is just my understanding of this aggregation, and I don't know if it's clear, but you should play with it and try the different options available.
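Since the query itself is missing from the thread, here is a minimal sketch of what is described above: a terms aggregation ordered by a nested sum, plus a top_hits sub-aggregation. sum_CA and top_events are the names used above; the by_group name, the ca field, and the `top_groups_query` helper are assumptions. The grouping field is a parameter because the need is to group by one of eventId, date, eventTheme, name, subType or age at a time.

```python
def top_groups_query(group_field, n=10):
    """Build a request for the n groups with the highest sum of ca."""
    return {
        "size": 0,  # top-level size: return no hits, only aggregations
        "aggs": {
            "by_group": {
                "terms": {
                    "field": group_field,
                    "size": n,
                    # Sort buckets by the nested sum instead of doc count.
                    "order": {"sum_CA": "desc"},
                },
                "aggs": {
                    "sum_CA": {"sum": {"field": "ca"}},
                    # One sample document per bucket: the highest ca.
                    "top_events": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"ca": {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


by_age = top_groups_query("age")
```

On an index with several shards, ordering terms buckets by a sub-aggregation can be approximate, so it is worth checking the result against a known total before relying on it.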
Thank you for your use case, it's very interesting!