I am using Elasticsearch 6.1.1 and creating a single shard per index. The cluster settings are as follows -
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "all"
        }
      }
    },
    "indices": {
      "breaker": {
        "request": {
          "limit": "80%"
        }
      }
    }
  },
  "transient": {}
}
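For context, the 80% request breaker limit above was applied with the cluster settings API, roughly like this (from memory, so the exact call may differ slightly):

PUT _cluster/settings
{
  "persistent": {
    "indices.breaker.request.limit": "80%"
  }
}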
System RAM : 16 GB
JVM heap size : 4 GB
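If I understand the 6.x defaults correctly, the parent breaker limit (indices.breaker.total.limit) is 70% of the JVM heap, which on a ~4 GB heap works out to roughly the 2.7gb limit shown in the error below. I have been checking the breaker state with the node stats API:

GET _nodes/stats/breaker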
The reason for using one shard per index is to avoid the approximate results that Elasticsearch can return in certain cases when data is spread across multiple shards.
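As far as I know, with more than one shard a terms aggregation can report approximate counts (a non-zero doc_count_error_upper_bound), whereas with a single shard the counts are exact. A simple example of the kind of aggregation I mean (the field name here is only an illustration, not my real mapping):

GET atcc_summary_201707_5/_search
{
  "size": 0,
  "aggs": {
    "by_vehicle_class": {
      "terms": { "field": "vehicle_class", "size": 20 }
    }
  }
}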
I have an index 'atcc_summary_201707_5' containing nearly 0.14 million documents (some with nested fields), about 250 MB in size. I am trying to run an aggregation query over a subset of those documents. The query involves nested bucketing (up to 3 levels) and some metric aggregations. Every time I run the query, it throws a circuit_breaking_exception with the following reason -
"[parent] Data too large, data for [<agg [count_2wh]>] would be [2982073327/2.7gb], which is larger than the limit of [2982071500/2.7gb]"
I'm stuck here, as the whole point of using Elasticsearch was to be able to query a huge data set quickly. Please shed some light on why it is consuming so much memory. Is it because of having a single shard per index? Please advise how to get around this.