Computationally expensive queries

I need to run expensive queries over a couple of days' worth of indices, and the ES server gives me the error below:

    Data too large, data for [reused_arrays] would be larger than limit of [2566520832/2.3gb]
    "bytes_wanted": 2608572536,
    "bytes_limit": 2566520832

Are there any general suggestions for overcoming this? Should I increase the number of shards or use instances with more computing power? Thanks!

My query (the "cardinality" agg is very expensive):

    POST /index_name/_search?size=0
    {
      "aggs": {
        "search_terms": {
          "terms": {
            "field": "search_term.keyword",
            "size": 10,
            "order": {
              "unique_session": "desc"
            }
          },
          "aggs": {
            "unique_session": {
              "cardinality": {
                "field": "session_id.keyword"
              }
            }
          }
        }
      }
    }

Adding more nodes or resources would alleviate this.

A couple of other approaches to consider:

A. Reduce accuracy or
B. Break into multiple requests

For option A, consider lowering the precision_threshold in the cardinality agg [1]. The default value is 3,000, meaning that for each unique search term the agg will count up to 3,000 session IDs. Given the number of unique search terms there are likely to be, this is largely the root of your memory problems.
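A minimal Python sketch of option A, assuming you send request bodies from Python: it only builds the body as a dict (no client or network call is assumed) and adds precision_threshold to the cardinality agg. The value 100 and the helper names build_query and estimated_bytes are hypothetical illustrations, not recommendations. The memory helper assumes the rough rule of thumb from the cardinality agg documentation that each cardinality instance needs about precision_threshold * 8 bytes, so treat its output as an order-of-magnitude estimate only.

```python
import json


def build_query(precision_threshold: int = 3000) -> dict:
    """Original top-10 request body, with an explicit precision_threshold
    on the cardinality sub-aggregation (hypothetical helper)."""
    return {
        "size": 0,
        "aggs": {
            "search_terms": {
                "terms": {
                    "field": "search_term.keyword",
                    "size": 10,
                    "order": {"unique_session": "desc"},
                },
                "aggs": {
                    "unique_session": {
                        "cardinality": {
                            "field": "session_id.keyword",
                            # Lower values trade counting accuracy for memory.
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


def estimated_bytes(num_terms: int, precision_threshold: int) -> int:
    """Rough estimate (assumption): one cardinality sketch per search term,
    about precision_threshold * 8 bytes each."""
    return num_terms * precision_threshold * 8


if __name__ == "__main__":
    print(json.dumps(build_query(precision_threshold=100), indent=2))
    bytes_limit = 2566520832  # taken from the error message
    for threshold in (3000, 100):
        # Roughly how many search-term buckets fit under the limit.
        print(threshold, bytes_limit // (threshold * 8))
```

Under that rough rule, the 2.3gb limit covers on the order of 107,000 search-term buckets at the default of 3,000, versus about 3.2 million at 100.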

For option B there are a couple of ways of doing this. The first is simply to issue a query for the top 1,000 search_term.keyword values by document count. This should give you an approximation of the search terms used across sessions. There will be false positives (search terms that rank highly only because outlier sessions repeat the same search) but no false negatives. Take this list of search terms and use it as a terms query in your existing example agg request.
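The two-step approach above can be sketched in Python as plain request-body builders, assuming the same index and field names as the original query. The helper names and the aggregation name frequent_terms are hypothetical; the step-1 response is read using the standard aggregations/buckets/key shape of a terms agg response. Sending the requests is left to whatever client you use.

```python
def top_terms_by_doc_count_query(n: int = 1000) -> dict:
    """Step 1: cheap request for the n most frequent search terms (doc count)."""
    return {
        "size": 0,
        "aggs": {
            "frequent_terms": {  # hypothetical agg name
                "terms": {"field": "search_term.keyword", "size": n}
            }
        },
    }


def terms_from_response(response: dict) -> list:
    """Extract the bucket keys (search terms) from a step-1 response."""
    return [b["key"] for b in response["aggregations"]["frequent_terms"]["buckets"]]


def restricted_unique_session_query(terms: list) -> dict:
    """Step 2: the original agg, limited by a terms query to the candidates."""
    return {
        "size": 0,
        "query": {"terms": {"search_term.keyword": terms}},
        "aggs": {
            "search_terms": {
                "terms": {
                    "field": "search_term.keyword",
                    "size": 10,
                    "order": {"unique_session": "desc"},
                },
                "aggs": {
                    "unique_session": {
                        "cardinality": {"field": "session_id.keyword"}
                    }
                },
            }
        },
    }
```

Step 2 only builds cardinality sketches for the candidate terms, which is what keeps its memory use bounded.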
Another option, available in 5.4, is to run multiple requests like your existing one, each focusing on a subset of the search terms in your index. This can be done using the partitioning feature [2] of the include clause in your terms agg.
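A minimal Python sketch of the partitioned approach, assuming one request per partition sent by a client of your choice. It builds each request body with include partitioning and merges the per-partition results client-side. NUM_PARTITIONS = 20 and the helper names are hypothetical; pick a partition count that keeps each request under the memory limit.

```python
import heapq

NUM_PARTITIONS = 20  # hypothetical; tune so each request fits in memory


def partition_query(partition: int, num_partitions: int = NUM_PARTITIONS) -> dict:
    """The original request, restricted via include partitioning to one
    disjoint slice of the search terms."""
    return {
        "size": 0,
        "aggs": {
            "search_terms": {
                "terms": {
                    "field": "search_term.keyword",
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    "size": 10,
                    "order": {"unique_session": "desc"},
                },
                "aggs": {
                    "unique_session": {"cardinality": {"field": "session_id.keyword"}}
                },
            }
        },
    }


def merge_top(responses: list, k: int = 10) -> list:
    """Combine per-partition responses into an overall top k.

    Partitions hold disjoint sets of terms, so each term's unique-session
    count comes from exactly one response, and the union of the
    per-partition top-k lists contains the overall top k."""
    buckets = [
        b
        for r in responses
        for b in r["aggregations"]["search_terms"]["buckets"]
    ]
    top = heapq.nlargest(k, buckets, key=lambda b: b["unique_session"]["value"])
    return [(b["key"], b["unique_session"]["value"]) for b in top]
```

You would issue partition_query(p) for p in range(NUM_PARTITIONS), collect the responses, and pass them to merge_top.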

