Query on aggregation and scroll


I need to retrieve all records grouped by the value of a particular column: all records that share the same value must appear together in one cluster. The index could hold billions of records.

Should I be using scroll with aggregation? I read somewhere that aggregation is not the best solution for this.

The other approach would be to scroll over the records and sort on that particular column. For this approach, I want to know whether the sort is applied only to the 10000 records being returned, or whether all matching records are sorted first and then 10000 of them are returned.

Please suggest.


Can you explain the requirement in more detail?

Do you need to scroll in order to extract that data?


Yes, I need to scroll to extract that data.


You can run a filtered query on the particular column and then scroll through the results, but it can be slow and heavy, depending on the number of "selected documents".
Note that the sort is applied to all matching records, not just to the page being returned.
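
To illustrate the scroll-and-sort approach: because the sort covers all matching records, documents with the same value arrive consecutively, so they can be grouped on the client as they stream in. Below is a minimal sketch in Python, not a definitive implementation. The function names, the field names, and the shape of the hits (dicts with a `_source` key, as returned by a search) are assumptions for illustration. In practice the hits would come from a scroll, for example via the `elasticsearch.helpers.scan` helper with `preserve_order=True`.

```python
from itertools import groupby


def build_sorted_body(sort_field, term_filter=None, page_size=10000):
    """Build a search body that optionally filters and sorts on sort_field.

    term_filter is a hypothetical {"field": "value"} dict; when omitted,
    all documents are matched.
    """
    query = {"match_all": {}}
    if term_filter is not None:
        # A bool/filter clause filters without scoring.
        query = {"bool": {"filter": {"term": term_filter}}}
    return {"size": page_size, "query": query, "sort": [{sort_field: "asc"}]}


def clusters(hits, field):
    """Yield (value, records) for each run of consecutive hits sharing `field`.

    This relies on the hits already being sorted on `field`; otherwise the
    same value could show up in more than one cluster.
    """
    for value, run in groupby(hits, key=lambda h: h["_source"][field]):
        yield value, [h["_source"] for h in run]
```

Since `clusters` only holds one run of equal values in memory at a time, it does not need the whole result set at once, which matters if the index really holds billions of records.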

Extra question: do you need to aggregate the selected documents or not?

Yes. I need aggregation over records matching my query.

Recently I "solved" a search dilemma with the following trick. We have complex searches with various parameters, and they may or may not return a lot of records. My trick is to first run the query with size = 0 to get the totalHits, and then either run the query with the complex aggregations or scroll and do the aggregations in our own code.
The limit is fixed at 30000 records. Above this limit I let ES do the aggregations; below it, it is much quicker to do them ourselves. Running queries with size=0 is super fast (< 100 ms), but running the same query with complex aggregations can take over 6 to 7 s!

Note that we are running ES 2.4.
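
The count-first trick described above could be sketched roughly as follows. This is only an illustration under assumptions: `search` is a hypothetical callable that takes a request body and returns the response as a dict, the helper names are made up, and `hits.total` is read as a plain integer as in ES 2.x (newer versions return an object instead).

```python
from collections import Counter

LIMIT = 30000  # threshold quoted in the post


def count_body(query):
    """Same query with size=0: returns only the hit count, no documents."""
    return {"size": 0, "query": query}


def plan_request(search, query, aggs, limit=LIMIT):
    """Return (strategy, body) based on how many documents match.

    `search` is a hypothetical callable: body dict in, response dict out.
    """
    total = search(count_body(query))["hits"]["total"]  # integer in ES 2.x
    if total > limit:
        # Large result set: let Elasticsearch compute the aggregations.
        return "server", {"size": 0, "query": query, "aggs": aggs}
    # Small result set: scroll the documents and aggregate in our own code.
    # Sorting on _doc is the cheapest order for a scroll.
    return "client", {"query": query, "sort": ["_doc"]}


def count_by_field(hits, field):
    """Client-side stand-in for a terms aggregation over scrolled hits."""
    return Counter(h["_source"][field] for h in hits)
```

The 30000 threshold is the value from the post and would need tuning for other data and query shapes.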
