Processing Aggregation Results on the Client Side


#1

Hello,

Elasticsearch currently has known limitations around using and manipulating aggregations.
For example:

Pipeline aggregations are run during the reduce phase after all other aggregations have already completed. For this reason, they cannot be used for ordering.

This means we can't order our aggregations by a bucket_script aggregation, a capability that is quite useful and powerful in other BI engines (such as Create.io).

I want to know what the best practices are for doing this kind of "processing" on the client side. Would you do the processing in the browser itself (should I be afraid of "big" JSON responses)? Or would you build a proxy server that does the processing for the browser (although that might hurt the client's flexibility)?


(Zachary Tong) #2

Not sure there is a good answer here. Probably depends on the quantity of processing required, how big the responses are and if the results need to be shared.

Browsers are surprisingly capable when it comes to crunching large amounts of JSON. I wrote the Search Profiler UI for xpack, and have used it before to crunch/display profile responses that are several megabytes in size. So browsers can definitely handle big payloads, especially if the processing is relatively straightforward like sorting.
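To make the "sorting in the browser" case concrete, here is a minimal sketch of ordering terms buckets by a derived value, the kind of thing a bucket_script would compute. The bucket shape and the sub-aggregation names (`total_sales`, `order_count`) are assumptions for illustration, not anything taken from a real mapping.

```typescript
// Hypothetical shape of one terms bucket with two metric sub-aggregations.
interface Bucket {
  key: string;
  doc_count: number;
  total_sales: { value: number | null };
  order_count: { value: number | null };
}

type RankedBucket = Bucket & { ratio: number | null };

// Compute a derived ratio per bucket, then sort by it in descending order.
// Buckets without a usable ratio (null metric or zero divisor) go last.
function sortByRatio(buckets: Bucket[]): RankedBucket[] {
  return buckets
    .map((b): RankedBucket => {
      const num = b.total_sales.value;
      const den = b.order_count.value;
      const ratio = num !== null && den !== null && den !== 0 ? num / den : null;
      return { ...b, ratio };
    })
    .sort((a, b) => {
      if (a.ratio === null && b.ratio === null) return 0;
      if (a.ratio === null) return 1;
      if (b.ratio === null) return -1;
      return b.ratio - a.ratio;
    });
}
```

The map and the sort are each a single pass or an O(n log n) step over the buckets, so the work stays small next to parsing the JSON itself.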

But a dedicated "processor" might be a good idea too, because it would allow you to use a potentially more optimized language, cache results, etc.

As an aside, we may revisit pipelines at some point in the future. A lot of the technical limitations that were in place when they were first built have since been removed. So we may be able to open pipelines up a bit more to allow for things like sorting, global pathing, etc.


#3

@polyfractal, thanks for the information above!

For the record, our biggest query requests about 20,000 buckets from a terms aggregation, where each bucket has a sub-aggregation containing a numeric metric such as a value_count aggregation or a sum aggregation. In addition, I'm making another request like this in parallel with different parameters (a different date range, for example).

By the way, I'm only running aggregations, so I always set size:0 to avoid retrieving hits / source. I believe responses that include hits would be larger. If you say that handling responses of several megabytes is fine, I'm good.
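For reference, a rough sketch of what such an aggregation-only request body could look like. The field names (`@timestamp`, `customer_id`, `amount`, `order_id`) and the aggregation names are placeholders I made up for illustration; only `size`, `range`, `terms`, `sum` and `value_count` are actual query DSL keys.

```typescript
// Builds an aggregation-only search body for one date range.
// All field and aggregation names below are hypothetical.
function buildAggRequest(from: string, to: string, maxBuckets: number) {
  return {
    size: 0, // return no hits, only aggregation results
    query: { range: { "@timestamp": { gte: from, lt: to } } },
    aggs: {
      by_customer: {
        terms: { field: "customer_id", size: maxBuckets },
        aggs: {
          total_sales: { sum: { field: "amount" } },
          order_count: { value_count: { field: "order_id" } },
        },
      },
    },
  };
}

// The two parallel requests differ only in their date range.
const currentPeriod = buildAggRequest("2018-02-01", "2018-03-01", 20000);
const previousPeriod = buildAggRequest("2018-01-01", "2018-02-01", 20000);
```

Both bodies can then be sent at the same time (for example with `Promise.all`) and their buckets joined by key on the client.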

My initial plan was to make a proxy between the browser and Elasticsearch that would handle the request for the client and, when the response arrives, compress it (gzip) or even sort it for the client. What do you think?
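The compression step of that proxy idea might look roughly like the sketch below. It assumes a Node.js proxy and uses Node's built-in `zlib`; the helper name `encodeForClient` and the `ProxyPayload` shape are made up for illustration, and the Accept-Encoding check is deliberately simplistic (it ignores quality values such as `gzip;q=0`).

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical result type: headers plus the (possibly compressed) body.
interface ProxyPayload {
  headers: Record<string, string>;
  body: Buffer;
}

// Gzip the JSON body only when the client advertises gzip support.
function encodeForClient(json: unknown, acceptEncoding: string): ProxyPayload {
  const raw = Buffer.from(JSON.stringify(json), "utf8");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (!/\bgzip\b/i.test(acceptEncoding)) {
    return { headers, body: raw };
  }
  headers["Content-Encoding"] = "gzip";
  return { headers, body: gzipSync(raw) };
}

// Usage: a made-up bucket list, compressed and then round-tripped.
const sample = {
  buckets: Array.from({ length: 1000 }, (_, i) => ({ key: `k${i}`, doc_count: i })),
};
const encoded = encodeForClient(sample, "gzip, deflate, br");
const decoded = JSON.parse(gunzipSync(encoded.body).toString("utf8"));
```

Bucket lists are highly repetitive JSON, so gzip tends to shrink them a lot, which is the main payoff of compressing in the proxy.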

