Insertion/search failure - Elasticsearch 2.4.0


I am using Elasticsearch 2.4.0 in a two-node cluster (the nodes are configured on separate servers).
I have allocated 15 GB of RAM to each node (50% of total RAM) and am using bulk insertion.
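Oversized bulk requests are a common source of heap pressure, so it is worth keeping each `_bulk` body small. A minimal stdlib-only sketch of building chunked NDJSON bodies for the 2.x `_bulk` API (the index/type names and chunk size here are illustrative assumptions, not taken from the thread):

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

def chunked(docs, size):
    """Yield docs in fixed-size chunks so each _bulk request stays small."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Example: 2500 docs sent as five requests of 500 instead of one huge request.
docs = [{"id": n, "msg": "event %d" % n} for n in range(2500)]
for batch in chunked(docs, 500):
    body = build_bulk_body("my_index", "my_type", batch)
    # Each body would be POSTed to http://<node>:9200/_bulk (not done here).
```

Keeping individual bulk requests to a few MB means each in-flight request holds far less of the heap while it waits in the bulk queue.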

Sometimes the following series of errors/warnings appears in Elasticsearch:

(1) GC(old) starts:

[2017-12-14 06:01:00,533][INFO ][monitor.jvm ] [node1] [gc][old][301607][18425] duration [57.6s], collections [8]/[57.6s], total [57.6s]/[6.9h], memory [14.8gb]->[14.7gb]/[14.8gb], all_pools {[young] [865.3mb]->[856.7mb]/[865.3mb]}{[survivor] [107.4mb]->[0b]/[108.1mb]}{[old] [13.9gb]->[13.9gb]/[13.9gb]}

(2) After GC has been running like this for several hours, a Java heap space error appears:

"engine failed, but can't find index shard. failure reason: [already closed by tragic event on the index writer]
java.lang.OutOfMemoryError: Java heap space"
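The figures in the GC log line above already show the heap is effectively exhausted: the old generation is at its cap and a 57-second collection reclaims almost nothing. A quick back-of-the-envelope check using the numbers from that line:

```python
# Figures taken from the [gc][old] log line above (in GB).
heap_used_after_gc = 14.7
heap_max = 14.8
old_used = 13.9
old_max = 13.9

assert old_used == old_max  # old generation completely full, nothing reclaimed
utilization = heap_used_after_gc / heap_max
print("heap utilization after full GC: %.1f%%" % (utilization * 100))  # ~99.3%
```

When a full GC leaves the heap at ~99% utilization, the JVM spends nearly all its time collecting, and an `OutOfMemoryError` is only a matter of time.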

(3) Then the master leaves after some time:

[2017-12-14 06:05:20,098][WARN ][transport ] [node1] Received response for a request that has timed out, sent [40543ms] ago, timed out [10543ms] ago, action [internal:discovery/zen/fd/master_ping], node [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], id [5714073]
[2017-12-14 06:07:00,126][INFO ][discovery.zen ] [node1] master_left [{node2}{WTHP9NdeTaqbGsjtOV50Rg}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}{master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2017-12-14 06:07:00,127][WARN ][discovery.zen ] [node1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{node1}{jlrw3wCoTb-C8sdJcfk-6Q}{yy.yy.yy.yy}{yy.yy.yy.yy:9300}{master=true},}

(4) After that, the following warning starts appearing repeatedly:

[WARN ][] Unexpected exception in the selector loop. File exists
	at ...(Native Method)
	at org.jboss.netty.util.internal.DeadLockProofWorker$...
	at java.util.concurrent.ThreadPoolExecutor.runWorker(...)
	at java.util.concurrent.ThreadPoolExecutor$...

All of this causes failures in both search and insertion. To recover, I have to kill Elasticsearch and restart the ES cluster.

The configuration I am using on both nodes:

node.master: true
node.data: true
bootstrap.memory_lock: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1_ip", "node2_ip"]
discovery.zen.minimum_master_nodes: 2
network.bind_host:
http.enabled: true
http.cors.allow-credentials: true
http.cors.enabled: true
http.cors.allow-origin: /(.*)?/
http.cors.allow-methods: OPTIONS,HEAD,GET,POST,PUT,DELETE
http.jsonp.enable: true
indices.fielddata.cache.size: 25%
index.number_of_shards: 2
index.number_of_replicas: 1
index.codec: best_compression
threadpool.search.queue_size: 5000
threadpool.bulk.queue_size: 5000
threadpool.index.queue_size: 5000

Is this due to configuration or any other problem?

How much data do you have? How many indices and shards?

I have (220*2) GB of data — multiplied by 2 because of the replicas.
Total indices = 20.
For each index I have kept 2 primary shards and 1 replica.
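Working the shard layout out from those numbers (simple arithmetic, assuming the data is spread evenly across shards):

```python
primary_gb = 220          # primary data only; replicas double the on-disk total
indices = 20
primaries_per_index = 2
replicas = 1

primary_shards = indices * primaries_per_index           # 40 primary shards
total_shards = primary_shards * (1 + replicas)           # 80 shards cluster-wide
avg_primary_shard_gb = primary_gb / primary_shards       # ~5.5 GB per primary
print(primary_shards, total_shards, avg_primary_shard_gb)
```

So each of the two nodes hosts roughly 40 shards, which is a modest shard count; the heap pressure is more likely to come from the oversized queues and caches than from shard overhead alone.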

It looks like you are suffering from heap pressure. I would recommend reducing the queue sizes significantly, as they can use up a lot of memory.
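For example, dropping the custom queue sizes back toward the Elasticsearch 2.x defaults would cap how many in-flight requests can pile up on the heap at once; the values below are the 2.x defaults, shown for illustration rather than as a prescription:

```yaml
# Each queued bulk request holds its full payload in heap until a
# worker thread picks it up, so 5000-entry queues can pin gigabytes.
threadpool.bulk.queue_size: 50     # 2.x default
threadpool.index.queue_size: 200   # 2.x default
```

With the defaults, clients receive rejections (HTTP 429) when the cluster is saturated and can back off and retry, instead of the node accumulating queued payloads until the heap fills.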

