Insertion/search failure - Elasticsearch 2.4.0

Hi,

I am using Elasticsearch 2.4.0 in a two-node cluster (the nodes are configured on separate servers).
I have allocated 15 GB of heap to each node (50% of the total RAM) and I am using bulk insertion.
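For reference, on 2.x the heap is set through the ES_HEAP_SIZE environment variable; a minimal sketch of how the 15 GB could be set (the file locations below depend on how Elasticsearch was installed, so treat them as assumptions):

# /etc/default/elasticsearch (DEB install) or /etc/sysconfig/elasticsearch (RPM install)
ES_HEAP_SIZE=15g
# needed for bootstrap.memory_lock to actually lock the heap with the packaged init scripts
MAX_LOCKED_MEMORY=unlimited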

Sometimes the following series of errors/warnings appears in Elasticsearch:

(1) GC(old) starts:

[2017-12-14 06:01:00,533][INFO ][monitor.jvm ] [node1] [gc][old][301607][18425] duration [57.6s], collections [8]/[57.6s], total [57.6s]/[6.9h], memory [14.8gb]->[14.7gb]/[14.8gb], all_pools {[young] [865.3mb]->[856.7mb]/[865.3mb]}{[survivor] [107.4mb]->[0b]/[108.1mb]}{[old] [13.9gb]->[13.9gb]/[13.9gb]}
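The log line itself shows the heap is effectively full: the old generation stays at 13.9gb out of 13.9gb even after the collection. Per-node heap and GC numbers can also be pulled over the REST API; a quick check, assuming Elasticsearch answers on the default port on localhost:

# detailed JVM heap, pool and GC statistics per node
curl -XGET 'http://localhost:9200/_nodes/stats/jvm?pretty'
# one-line heap summary per node
curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'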

(2) After GC runs like this for several hours, a Java heap space error appears:

"engine failed, but can't find index shard. failure reason: [already closed by tragic event on the index writer]
java.lang.OutOfMemoryError: Java heap space"

(3) Then, after some time, the master leaves:

[2017-12-14 06:05:20,098][WARN ][transport ] [node1] Received response for a request that has timed out, sent [40543ms] ago, timed out [10543ms] ago, action [internal:discovery/zen/fd/master_ping], node [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], id [5714073]
[2017-12-14 06:07:00,126][INFO ][discovery.zen ] [node1] master_left [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2017-12-14 06:07:00,127][WARN ][discovery.zen ] [node1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{node1}{jlrw3wCoTb-C8sdJcfk-6Q}{yy.yy.yy.yy}{yy.yy.yy.yy:9300}{master=true},}

(4) After that, the following warning starts appearing repeatedly:

[WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.io.IOException: File exists
at sun.nio.ch.EPollArrayWrapper.epollCtl(Native Method)
at sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

All of this causes both search and insertion to fail. To recover, I have to kill Elasticsearch and restart the cluster.
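When the cluster reaches this state, the basic health and node membership can be checked before restarting (again assuming the default port):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl -XGET 'http://localhost:9200/_cat/nodes?v'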

Configuration I am using for both nodes:

node.master: true
node.data: true
bootstrap.memory_lock: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1_ip", "node2_ip"]
discovery.zen.minimum_master_nodes: 2
network.bind : 0.0.0.0
http.enabled: true
http.cors.allow-credentials: true
http.cors.enabled: true
http.cors.allow-origin: /(.*)?/
http.cors.allow-methods: OPTIONS,HEAD,GET,POST,PUT,DELETE
http.jsonp.enable: true
indices.fielddata.cache.size: 25%
index.number_of_shards: 2
index.number_of_replicas: 1
index.codec: best_compression
threadpool.search.queue_size: 5000
threadpool.bulk.queue_size: 5000
threadpool.index.queue_size: 5000

Is this due to the configuration, or is it some other problem?

How much data do you have? How many indices and shards?

I have (220 * 2) GB of data; multiplied by 2 because of the replicas.
Total indices = 20.
I have kept 2 primary shards for each index, with 1 replica.
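These numbers can be double-checked with the cat APIs (default port assumed):

# one line per index: primaries, replicas, document count, store size
curl -XGET 'http://localhost:9200/_cat/indices?v'
# one line per shard, including which node it is allocated to
curl -XGET 'http://localhost:9200/_cat/shards?v'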

It looks like you are suffering from heap pressure. I would recommend reducing the queue sizes significantly, as large queues can use up a lot of memory.
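For example, in elasticsearch.yml (the values are illustrative, roughly the 2.x defaults rather than the 5000 currently configured):

threadpool.search.queue_size: 1000
threadpool.bulk.queue_size: 50
threadpool.index.queue_size: 200

With smaller queues, overload shows up as rejections instead of heap exhaustion; they can be watched with curl -XGET 'http://localhost:9200/_cat/thread_pool?v', and the bulk client can back off and retry rejected items.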

