Memory problems while querying

Hi,

I am running the query below on my cluster, and it is hitting the cluster hard: its status turns red. The cluster has 10 data nodes.

    {
      "size": 0,
      "query": {
        "bool": {
          "must_not": [],
          "must": [
            {
              "query_string": {
                "analyze_wildcard": "true",
                "query": "ID:/2[0-9]{11}/ AND NOT timestamp:[1535414400000 TO 1535435999599]"
              }
            },
            {
              "range": {
                "timestamp": {
                  "gte": 1535436000000,
                  "lte": 1535437799999,
                  "format": "epoch_millis"
                }
              }
            }
          ]
        }
      },
      "_source": {
        "excludes": []
      },
      "aggs": {
        "devices": {
          "terms": {
            "field": "ID",
            "order": {
              "_count": "desc"
            },
            "size": 26810
          },
          "aggs": {
            "timestamps": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "15m",
                "time_zone": "Europe/Berlin",
                "min_doc_count": 1
              },
              "aggs": {
                "field1": {
                  "filters": {
                    "filters": {
                      "field1_meq-33_or_leq-63": {
                        "query_string": {
                          "query": "FIELD1:<-63 OR FIELD1:>-33",
                          "analyze_wildcard": "true"
                        }
                      },
                      "field1_total": {
                        "query_string": {
                          "query": "*",
                          "analyze_wildcard": "true"
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

The index follows a daily naming pattern, which means that for the specified time range only one index is queried. It has 10 shards, each around 50GB, and no replicas.

Each data node has 64GB of RAM, with 32GB allocated to the heap.

Below you can also see a picture of the JVM heap usage on one of the data nodes during the period when the query is performed (around 11:50).

I have been reading multiple posts about these OOM problems to try to understand them. I believe the problem is the query itself, since it aggregates over many documents (this time range contains 5,235,160).

Can someone please help me troubleshoot this issue a bit further? Do you have any other ideas about why this may be happening, or a solution such as rewriting the query into a "better" one?

Thank you in advance!

The nested aggregations you are doing look strange to me, especially around the filters. Can you explain in words what the intent is there?

On the query portion at the top, there are simpler ways to go about this. It looks like you just want IDs that start with 2 within a given date range.

  1. Get rid of must_not and the _source excludes, since you aren't using them.
  2. Change must to filter, since you aren't scoring anything.
  3. Delete AND NOT timestamp..., since the range clause already makes it impossible for those documents to match.
  4. Change the remaining query_string to a simple prefix query (see the sketch after this list).
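
Putting those together, a minimal sketch of the trimmed request, assuming ID is a keyword (not analyzed) field so a prefix query applies, and with your aggs block attached unchanged below the query. Note the prefix query drops the exact 12-digit constraint of your original regex; if that matters, keep a regexp query on ID instead:

    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            { "prefix": { "ID": "2" } },
            {
              "range": {
                "timestamp": {
                  "gte": 1535436000000,
                  "lte": 1535437799999,
                  "format": "epoch_millis"
                }
              }
            }
          ]
        }
      }
    }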

Next I would delete the field1 sub-agg, re-run the query, and see how your heap is behaving.
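
To keep an eye on the heap while you re-test (assuming you have access to the cat APIs), something like this gives a quick per-node view:

    GET _cat/nodes?v&h=name,heap.percent,heap.current,heap.max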

Finally, 26810 buckets is a lot, so consider breaking the agg into partitions. That will help reduce memory pressure.
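
Terms partitioning would look roughly like this (a sketch, not your exact request; the 20 partitions and the reduced size are arbitrary choices here). You run the same request once per partition value, 0 through 19, and combine the results on your side:

    {
      "size": 0,
      "aggs": {
        "devices": {
          "terms": {
            "field": "ID",
            "include": {
              "partition": 0,
              "num_partitions": 20
            },
            "size": 1500,
            "order": { "_count": "desc" }
          }
        }
      }
    }

Each request then only has to build roughly 26810 / 20 term buckets at a time instead of all of them at once.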

Good luck!

Hi Loren,

Thank you very much for your quick reply!

We followed your suggestions and now have a much lighter query that no longer impacts the cluster!

Once again, thank you very much for your help!

Great news! Glad to be of help.
