Insertion/search failure - elasticsearch2.4.0

aditya7 · December 14, 2017, 10:21am

Hi,

I am using elasticsearch-2.4.0 in cluster having two nodes (configured on seperate servers).
I have allocated 15gb of RAM to each of the nodes. (50% of total RAM) and using bulk insertion.

Sometimes following series of errors/warnings are coming in elasticsearch:

(1) GC(old) starts:

[2017-12-14 06:01:00,533][INFO ][monitor.jvm ] [node1] [gc][old][301607][18425] duration [57.6s], collections [8]/[57.6s], total [57.6s]/[6.9h], memory [14.8gb]->[14.7gb]/[14.8gb], all_pools {[young] [865.3mb]->[856.7mb]/[865.3mb]}{[survivor] [107.4mb]->[0b]/[108.1mb]}{[old] [13.9gb]->[13.9gb]/[13.9gb]}

(2) After GC running for several hours, JavaHeap space problem comes:

"engine failed, but can't find index shard. failure reason: [already closed by tragic event on the index writer]
java.lang.OutOfMemoryError: Java heap space"

(3) Then master leaves after sometime:

[2017-12-14 06:05:20,098][WARN ][transport ] [node1] Received response for a request that has timed out, sent [40543ms] ago, timed out [10543ms] ago, action [internal:discovery/zen/fd/master_ping], node [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], id [5714073]
[2017-12-14 06:07:00,126][INFO ][discovery.zen ] [node1] master_left [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2017-12-14 06:07:00,127][WARN ][discovery.zen ] [node1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{node1}{jlrw3wCoTb-C8sdJcfk-6Q}{yy.yy.yy.yy}{yy.yy.yy.yy:9300}{master=true},}

(4) After that following warning starts coming repetitively:

[WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.io.IOException: File exists
at sun.nio.ch.EPollArrayWrapper.epollCtl(Native Method)
at sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

All these things cause failure in search as well as insertion. To overcome this error, I have to kill elasticsearch and restart the ES cluster.

Configuration I am using for both nodes:

node.master: true
node.data: true
bootstrap.memory_lock: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1_ip", "node2_ip"]
discovery.zen.minimum_master_nodes: 2
network.bind : 0.0.0.0
http.enabled: true
http.cors.allow-credentials: true
http.cors.enabled: true
http.cors.allow-origin: /(.*)?/
http.cors.allow-methods: OPTIONS,HEAD,GET,POST,PUT,DELETE
http.jsonp.enable: true
indices.fielddata.cache.size: 25%
index.number_of_shards: 2
index.number_of_replicas: 1
index.codec: best_compression
threadpool.search.queue_size: 5000
threadpool.bulk.queue_size: 5000
threadpool.index.queue_size: 5000

Is this due to configuration or any other problem?

Christian_Dahlqvist · December 14, 2017, 10:24am

How much data do you have? How many indices and shards?

aditya7 · December 14, 2017, 10:32am

I have (220*2)gb of data. Multiplied by 2 due to replica.
Total indices = 20.
Primary shards for each indices I have kept at 2 and replica as 1.

Christian_Dahlqvist · December 14, 2017, 10:35am

It looks like you are suffering from heap pressure. I would recommend reducing the queue sizes significantly as that can use up a lot of memory.

system · January 11, 2018, 10:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
OutOfMemoryError (Java heap space) during replication enabling on 90 indices Elasticsearch	6	1224	July 6, 2017
Elasticsearch cluster down due to Elasticsearch:java.lang.OutOfMemoryError: Java heap space Elasticsearch	4	374	June 25, 2019
Elasticsearch 7.1x + Java 11: Possible GC misconfiguration Elasticsearch	2	562	September 30, 2019
Searching the big index - java.lang.OutOfMemoryError: Java heap space Elasticsearch	8	495	July 6, 2017
Out of heap error on machines with 18GB heap and 6GB index Elasticsearch	2	479	July 6, 2017

Insertion/search failure - elasticsearch2.4.0

Configuration I am using for both nodes:

Related topics