ES 2.2.0 JVM overflow

I'm using Elasticsearch 2.2.0 with the following configuration (the host special values `_local_` / `_eth0:ipv4_` lost their underscores to markdown italics, restored below):

```
cluster.name: test_cluster
node.name: test_node_1
path.data: /mnt/elasticsearch/data
path.logs: /var/log/elasticsearch
network.bind_host:
  - _local_
  - _eth0:ipv4_
network.publish_host: _eth0:ipv4_
discovery.zen.ping.unicast.hosts: ["x:9301"]
discovery.zen.ping_timeout: 30s
bootstrap.mlockall: true
```

Also I have the following configuration in /etc/default/elasticsearch:


When I use Kibana to query the cluster (with only one node), after some heavy queries I rapidly reach the point where no more queries can be processed and the cluster is extremely slow. I see "heap_used_percent": 99 and the GC time is very high:

> b]->[3.8gb]/[3.9gb], all_pools {[young] [101.1mb]->[12.5mb]/[133.1mb]}{[survivor] [16.6mb]->[0b]/[16.6mb]}{[old] [3.8gb]->[3.8gb]/[3.8gb]}
> [2016-04-12 13:08:05,445][INFO ][monitor.jvm              ] [test_node_1] [gc][old][1115][72] duration [8.9s], collections [1]/[9.9s], total [8.9s]/[4.9m], memory [3.9gb]->[3.8gb]/[3.9gb], all_pools {[young] [121.1mb]->[2.8mb]/[133.1mb]}{[survivor] [16.6mb]->[0b]/[16.6mb]}{[old] [3.8gb]->[3.8gb]/[3.8gb]}
> [2016-04-12 13:08:18,375][INFO ][monitor.jvm              ] [test_node_1] [gc][old][1119][73] duration [9.1s], collections [1]/[9.9s], total [9.1s]/[5m], memory [3.8gb]->[3.8gb]/[3.9gb], all_pools {[young] [58.4mb]->[1.3mb]/[133.1mb]}{[survivor] [14.6mb]->[0b]/[16.6mb]}{[old] [3.8gb]->[3.8gb]/[3.8gb]}
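For what it's worth, those `monitor.jvm` lines can be parsed to see how little each old-gen collection actually reclaims; a minimal sketch (the regex only assumes the bracketed format shown in the lines above):

```python
import re

# One of the [monitor.jvm] lines quoted above.
line = ("[2016-04-12 13:08:18,375][INFO ][monitor.jvm              ] "
        "[test_node_1] [gc][old][1119][73] duration [9.1s], collections [1]/[9.9s], "
        "total [9.1s]/[5m], memory [3.8gb]->[3.8gb]/[3.9gb], "
        "all_pools {[young] [58.4mb]->[1.3mb]/[133.1mb]}"
        "{[survivor] [14.6mb]->[0b]/[16.6mb]}{[old] [3.8gb]->[3.8gb]/[3.8gb]}")

# Pull out GC duration and the heap before/after/total figures.
m = re.search(r"duration \[([^\]]+)\].*memory \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]",
              line)
duration, before, after, total = m.groups()
print(duration, before, after, total)  # 9.1s 3.8gb 3.8gb 3.9gb
```

A 9.1 s collection that leaves the heap at 3.8 GB of 3.9 GB means the collector is running flat out and freeing almost nothing.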

What is the problem? I set a 4 GB heap (the machine has 8 GB in total), and I can see it is not a swapping problem because I/O is low. Why does the cluster slow down so much after a couple of queries in Kibana?
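The quoted log lines already answer part of this: each old-gen collection takes about 9 s and reclaims essentially nothing, so the node spends its time in back-to-back full GCs instead of serving searches. A quick check of the figures taken from the log lines above:

```python
# Old-gen pool before/after a collection, and its maximum (from the GC log).
old_before_gb, old_after_gb, old_max_gb = 3.8, 3.8, 3.8
reclaimed = old_before_gb - old_after_gb
print(f"old gen reclaimed: {reclaimed:.1f} GB")        # 0.0 GB

# Whole-heap occupancy from the same line: memory [3.8gb] of [3.9gb].
heap_used_gb, heap_max_gb = 3.8, 3.9
print(f"heap used: {heap_used_gb / heap_max_gb:.0%}")  # 97%
```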

What sort of queries are they?

Just regular visualizations, like `triggeredBy: "DATA_T" AND network.x: 425 AND connectivity.type:"MOBILE"`, nothing special really. The problem is that Elasticsearch (it seems) does not release the memory and keeps processing something long after the query returns; the CPU just stays high. I can see it in Grafana (using collectd on the Elasticsearch machine).
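For context, Kibana turns that search-bar text into a Lucene `query_string` query. A rough sketch of the request body it sends (the real body Kibana builds also wraps this in time-range filters and aggregations, which are not shown here):

```python
import json

# Approximate query body for the search-bar filter quoted above.
body = {
    "query": {
        "query_string": {
            "query": 'triggeredBy:"DATA_T" AND network.x:425 '
                     'AND connectivity.type:"MOBILE"',
            "analyze_wildcard": True,
        }
    }
}
print(json.dumps(body, indent=2))
```

The query itself is cheap; it is the aggregations Kibana layers on top (and their field-data usage) that tend to fill the heap.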

That looks OK; how much data do you have in the cluster?