Hi,
I am running a query (below) on my cluster, and it is heavily impacting it: the cluster status turns red. The cluster has 10 data nodes.
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [],
      "must": [
        {
          "query_string": {
            "analyze_wildcard": "true",
            "query": "ID:/2[0-9]{11}/ AND NOT timestamp:[1535414400000 TO 1535435999599]"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": 1535436000000,
              "lte": 1535437799999,
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": []
  },
  "aggs": {
    "devices": {
      "terms": {
        "field": "ID",
        "order": {
          "_count": "desc"
        },
        "size": 26810
      },
      "aggs": {
        "timestamps": {
          "date_histogram": {
            "field": "timestamp",
            "interval": "15m",
            "time_zone": "Europe/Berlin",
            "min_doc_count": 1
          },
          "aggs": {
            "field1": {
              "filters": {
                "filters": {
                  "field1_meq-33_or_leq-63": {
                    "query_string": {
                      "query": "FIELD1:<-63 OR FIELD1:>-33",
                      "analyze_wildcard": "true"
                    }
                  },
                  "field1_total": {
                    "query_string": {
                      "query": "*",
                      "analyze_wildcard": "true"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
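For reference, one variant I have been experimenting with moves both clauses into the bool filter context (so scoring is skipped and the clauses can be cached) and drops the NOT timestamp range from the query string, since the range filter already excludes that interval anyway. This is only a sketch and assumes ID is indexed as a keyword field; the aggregations would stay the same:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "regexp": { "ID": "2[0-9]{11}" } },
        {
          "range": {
            "timestamp": {
              "gte": 1535436000000,
              "lte": 1535437799999,
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
```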
The index follows a daily pattern, so only one index is queried for the specified time range. That index has 10 shards of around 50 GB each, with no replicas.
Each data node has 64 GB of RAM, with 32 GB allocated to the heap.
Below you can also see a picture of the JVM heap usage on one of the data nodes during the period the query runs (around 11:50).
I have been reading multiple posts about these OOM problems to try to understand them. I believe the problem is the query itself, since it aggregates over many documents (this time range contains 5,235,160):
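One idea I came across is paginating the per-device buckets with a composite aggregation instead of asking the terms aggregation for all 26,810 buckets at once. A sketch, assuming the same field names and an arbitrary page size of 1,000 composite buckets per request:

```json
{
  "size": 0,
  "aggs": {
    "devices": {
      "composite": {
        "size": 1000,
        "sources": [
          { "ID": { "terms": { "field": "ID" } } },
          {
            "timestamps": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "15m",
                "time_zone": "Europe/Berlin"
              }
            }
          }
        ]
      },
      "aggs": {
        "field1": {
          "filters": {
            "filters": {
              "field1_meq-33_or_leq-63": {
                "query_string": { "query": "FIELD1:<-63 OR FIELD1:>-33" }
              },
              "field1_total": {
                "query_string": { "query": "*" }
              }
            }
          }
        }
      }
    }
  }
}
```

Subsequent pages would pass the returned after_key back in an "after" parameter; the query section would stay as in my original request. I am not sure whether this actually reduces the peak heap usage, though.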
Can someone please help me troubleshoot this issue further? Any other ideas why this may be happening, or suggestions for rewriting the query into a "better" one?
Thank you in advance!