Hi,
I am using Elasticsearch 2.4.0 in a two-node cluster (each node configured on a separate server).
I have allocated a 15 GB heap to each node (50% of the server's total RAM) and I am loading data with bulk indexing.
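In case it helps, the bulk indexing is done roughly like the sketch below with the Python client (the index name, type name and document fields are simplified placeholders, not my real mapping):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["node1_ip:9200", "node2_ip:9200"])

# Placeholder documents; the real data comes from our application.
docs = [{"field1": "value %d" % i, "field2": i} for i in range(10000)]

actions = (
    {
        "_index": "my_index",   # placeholder index name
        "_type": "my_type",     # ES 2.x still uses mapping types
        "_source": doc,
    }
    for doc in docs
)

# Each chunk is sent as one _bulk request against the cluster.
helpers.bulk(es, actions, chunk_size=1000)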
Sometimes the following sequence of errors/warnings appears in Elasticsearch:
(1) Old-generation GC kicks in:
[2017-12-14 06:01:00,533][INFO ][monitor.jvm ] [node1] [gc][old][301607][18425] duration [57.6s], collections [8]/[57.6s], total [57.6s]/[6.9h], memory [14.8gb]->[14.7gb]/[14.8gb], all_pools {[young] [865.3mb]->[856.7mb]/[865.3mb]}{[survivor] [107.4mb]->[0b]/[108.1mb]}{[old] [13.9gb]->[13.9gb]/[13.9gb]}
(2) After the GC has been running like this for several hours, a Java heap space error appears:
"engine failed, but can't find index shard. failure reason: [already closed by tragic event on the index writer]
java.lang.OutOfMemoryError: Java heap space"
(3) Then the master leaves after some time:
[2017-12-14 06:05:20,098][WARN ][transport ] [node1] Received response for a request that has timed out, sent [40543ms] ago, timed out [10543ms] ago, action [internal:discovery/zen/fd/master_ping], node [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], id [5714073]
[2017-12-14 06:07:00,126][INFO ][discovery.zen ] [node1] master_left [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2017-12-14 06:07:00,127][WARN ][discovery.zen ] [node1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{node1}{jlrw3wCoTb-C8sdJcfk-6Q}{yy.yy.yy.yy}{yy.yy.yy.yy:9300}{master=true},}
(4) After that, the following warning starts appearing repeatedly:
[WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.io.IOException: File exists
at sun.nio.ch.EPollArrayWrapper.epollCtl(Native Method)
at sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
All of this causes both search and indexing to fail. To recover, I have to kill Elasticsearch and restart the cluster.
The configuration I am using on both nodes (elasticsearch.yml):
node.master: true
node.data: true
bootstrap.memory_lock: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1_ip", "node2_ip"]
discovery.zen.minimum_master_nodes: 2
network.bind_host: 0.0.0.0
http.enabled: true
http.cors.allow-credentials: true
http.cors.enabled: true
http.cors.allow-origin: /(.*)?/
http.cors.allow-methods: OPTIONS,HEAD,GET,POST,PUT,DELETE
http.jsonp.enable: true
indices.fielddata.cache.size: 25%
index.number_of_shards: 2
index.number_of_replicas: 1
index.codec: best_compression
threadpool.search.queue_size: 5000
threadpool.bulk.queue_size: 5000
threadpool.index.queue_size: 5000
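For completeness, while the heap climbs I also poll the nodes stats API, roughly as in the sketch below (Python client, reading the standard node-stats JVM fields), and the old-gen pool shown in the GC log above stays essentially full:

from elasticsearch import Elasticsearch

es = Elasticsearch(["node1_ip:9200", "node2_ip:9200"])

# Print heap usage and the old-gen pool for every node in the cluster.
stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    mem = node["jvm"]["mem"]
    old = mem["pools"]["old"]
    print("%s: heap %d%%, old gen %d/%d bytes" % (
        node["name"],
        mem["heap_used_percent"],
        old["used_in_bytes"],
        old["max_in_bytes"],
    ))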
Is this due to my configuration, or is there some other problem?