Processing Aggregation Results in the Client-Side

panda2004 · July 20, 2017, 10:36am

Hello,

Elasticsearch currently has known limitations when using and manipulating aggregations.
For example:

Pipeline aggregations are run during the reduce phase after all other aggregations have already completed. For this reason, they cannot be used for ordering.

This means we can't order buckets by the result of a bucket_script aggregation, which is a useful and powerful feature in other BI engines (such as Create.io).

I want to know the best practices for doing such "processing" on the client side. Would you do the processing in the browser itself (should I be afraid of "big" JSON responses)? Or would you build a proxy server that does the processing for the browser (although it might hurt the client's flexibility)?

polyfractal · July 20, 2017, 8:46pm

Not sure there is a good answer here. Probably depends on the quantity of processing required, how big the responses are and if the results need to be shared.

Browsers are surprisingly capable when it comes to crunching large amounts of JSON. I wrote the Search Profiler UI for xpack, and have used it before to crunch/display profile responses that are several megabytes in size. So browsers can definitely handle big payloads, especially if the processing is relatively straightforward like sorting.
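
To make "relatively straightforward like sorting" concrete, here is a minimal sketch of ordering terms buckets by a derived value, roughly what a bucket_script would compute. It is written in Python for brevity; the same logic carries over to browser JavaScript. The aggregation names (`by_customer`, `total_amount`, `order_count`) and the sample numbers are made up for illustration; only the `aggregations.<name>.buckets[].<metric>.value` shape follows the usual response layout.

```python
from typing import Callable


def sort_buckets(response: dict, agg_name: str,
                 score: Callable[[dict], float],
                 descending: bool = True) -> list:
    """Return the buckets of one terms aggregation, ordered by a computed score."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sorted(buckets, key=score, reverse=descending)


def avg_amount(bucket: dict) -> float:
    """Derived metric: sum / value_count, guarding against empty buckets."""
    count = bucket["order_count"]["value"]
    return bucket["total_amount"]["value"] / count if count else 0.0


# Hypothetical response fragment (aggregation names are assumptions).
response = {
    "aggregations": {
        "by_customer": {
            "buckets": [
                {"key": "a", "doc_count": 4,
                 "total_amount": {"value": 40.0}, "order_count": {"value": 4}},
                {"key": "b", "doc_count": 2,
                 "total_amount": {"value": 50.0}, "order_count": {"value": 2}},
                {"key": "c", "doc_count": 0,
                 "total_amount": {"value": 0.0}, "order_count": {"value": 0}},
            ]
        }
    }
}

ranked = sort_buckets(response, "by_customer", avg_amount)
ranked_keys = [b["key"] for b in ranked]
print(ranked_keys)
```

Sorting a list of 20,000 small dicts this way is a single in-memory pass plus a sort, so the cost is dominated by parsing the JSON rather than by the ordering itself.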

But a dedicated "processor" might be a good idea too, because it would allow you to use a potentially more optimized language, cache results, etc.

As an aside, we may revisit pipelines at some point in the future. A lot of the technical limitations that were in place when they were first built have since been removed. So we may be able to open pipelines up a bit more to allow for things like sorting, global pathing, etc.

panda2004 · July 20, 2017, 9:14pm

@polyfractal, thanks for the information above!

For the record, our biggest query requests about 20,000 buckets from a terms aggregation, where each bucket has a sub-aggregation containing a numeric metric such as a value_count or sum aggregation. In addition, I'm making another request like this in parallel with different parameters (a different date range, for example).

By the way, I'm only running aggregations, so I always set size:0 to avoid retrieving hits / source. I believe requests that return hits would produce bigger responses. If you say that handling responses of several megabytes is fine, I'm good.
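
For reference, the kind of request described above might look roughly like the sketch below, expressed as a Python dict that would be serialized to JSON. The field names (`customer_id`, `amount`, `timestamp`) and the aggregation names are placeholders, not our real mapping; the two calls only differ in the date range.

```python
import json


def build_agg_request(group_field: str, metric_field: str,
                      gte: str, lt: str, max_buckets: int = 20000) -> dict:
    """Build an aggregation-only search body: no hits, one terms agg with a sum sub-agg."""
    return {
        "size": 0,  # aggregations only, skip hits / _source
        "query": {"range": {"timestamp": {"gte": gte, "lt": lt}}},
        "aggs": {
            "by_group": {
                "terms": {"field": group_field, "size": max_buckets},
                "aggs": {"metric_sum": {"sum": {"field": metric_field}}},
            }
        },
    }


# Two requests sent in parallel, differing only in the date range.
current = build_agg_request("customer_id", "amount", "2017-07-01", "2017-08-01")
previous = build_agg_request("customer_id", "amount", "2017-06-01", "2017-07-01")
print(json.dumps(current, indent=2))
```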

My initial plan was to make a proxy between the browser and Elasticsearch that would handle the request for the client and, when the response arrives, compress it (gzip) or even sort it for the client. What do you think?
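
As a rough sketch of the post-processing step such a proxy could do (not a full server), the function below takes the raw response bytes, sorts the buckets of one aggregation by a metric, and gzips the result using only the Python standard library. The function name `process_response` and the aggregation names are hypothetical. The proxy would also need to send a `Content-Encoding: gzip` header so the browser decompresses the body transparently.

```python
import gzip
import json


def process_response(raw: bytes, agg_name: str, metric: str) -> bytes:
    """Sort one aggregation's buckets by a metric (descending), then gzip the JSON."""
    body = json.loads(raw)
    buckets = body["aggregations"][agg_name]["buckets"]
    # Treat a missing/null metric value as 0 so the sort never compares None.
    buckets.sort(key=lambda b: b[metric]["value"] or 0.0, reverse=True)
    compact = json.dumps(body, separators=(",", ":")).encode("utf-8")
    return gzip.compress(compact)


# Hypothetical upstream response (names and numbers are made up).
raw = json.dumps({
    "aggregations": {
        "by_group": {
            "buckets": [
                {"key": "x", "doc_count": 3, "metric_sum": {"value": 12.0}},
                {"key": "y", "doc_count": 5, "metric_sum": {"value": 30.0}},
                {"key": "z", "doc_count": 1, "metric_sum": {"value": None}},
            ]
        }
    }
}).encode("utf-8")

compressed = process_response(raw, "by_group", "metric_sum")

# What the browser would see after transparent decompression.
decoded = json.loads(gzip.decompress(compressed))
sorted_keys = [b["key"] for b in decoded["aggregations"]["by_group"]["buckets"]]
print(sorted_keys)
```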

system · August 18, 2017, 11:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
