BulkUpdate elasticsearch issue

(Amit Sharma) #1

I am using Elasticsearch version 1.7.3.
I am running a 3-node ES cluster. Each node is a 4-core machine with 16 GB of RAM, of which 8 GB is allocated to the ES process.
I am using mongoriver to sync data from MongoDB. After some bulk updates were applied to MongoDB, I got these exceptions in Elasticsearch:
[ERROR][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] Bulk processor failed. failure in bulk execution
and org.elasticsearch.transport.NodeDisconnectedException:
After the node rejoined the cluster, GC started. It consumed 90% of the CPU and ran indefinitely. A sawtooth pattern was observed in the garbage collection memory graph.

What should we do to avoid this scenario?
I am unable to find the root cause of this issue.
What should we do to fix it? I restarted the node on which GC was running; after that, GC started on another node.

(Mark Walkom) #2

Sounds like you are running out of resources. Can you add more nodes or more RAM+CPU?

(Amit Sharma) #3

@warkolm Each node has 16 GB of RAM and 4 cores. Elasticsearch is the only application running on each node, and the request load on each node is minimal. I don't see any reason to allocate more resources.

On each ES node, all of the memory is allocated to buffers or cache. Is there any problem with freeing the buffer/cache memory on each ES node?

I am running the command below to clear the buffer memory on the ES nodes. ES runs on Linux machines.
sync; echo 3 > /proc/sys/vm/drop_caches
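
On the buffer/cache question above: Linux generally hands buffer and page-cache memory back when applications need it, so a node that shows almost no "free" memory is not necessarily short of RAM. Elasticsearch also benefits from the filesystem cache when reading index files, so dropping it is unlikely to help with heap or GC pressure. The following is a minimal sketch, not a diagnostic tool, of how to estimate reclaimable memory from /proc/meminfo-style text. The sample figures and the helper names (parse_meminfo, reclaimable_kb, effectively_free_kb) are hypothetical, and the estimate is rough because part of "Cached" (for example shared memory) cannot be reclaimed.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def reclaimable_kb(info):
    """Rough estimate of memory the kernel can reclaim (buffers + page cache)."""
    return info.get("Buffers", 0) + info.get("Cached", 0)


def effectively_free_kb(info):
    """Free memory plus the reclaimable estimate."""
    return info.get("MemFree", 0) + reclaimable_kb(info)


# Hypothetical sample, shaped like /proc/meminfo on a 16 GB node.
SAMPLE = """MemTotal:       16330000 kB
MemFree:          250000 kB
Buffers:          300000 kB
Cached:          7200000 kB
"""

if __name__ == "__main__":
    # On a real Linux node, replace SAMPLE with open("/proc/meminfo").read().
    info = parse_meminfo(SAMPLE)
    print("reclaimable kB:", reclaimable_kb(info))
    print("effectively free kB:", effectively_free_kb(info))
```

With these sample numbers, most of the memory that looks "used" is cache that the kernel can release on demand, which is separate from the 8 GB JVM heap where the GC activity occurs.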

(Amit Sharma) #4

Source: GitHub
[clintongormley] This looks like it is related to this bug in Lucene: https://issues.apache.org/jira/browse/LUCENE-6670

Basically, an OOM exception is thrown on a merge thread, which means the lock is never released. This bug is fixed in Lucene 5.3 and backported to 2.0. For 1.x, there is not much we can do except advise you to keep heap usage as low as possible.
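
One way to act on the "keep heap usage as low as possible" advice is to watch per-node heap usage and react before a node gets close to an OOM. The sketch below assumes the response shape of the node stats API (GET /_nodes/stats/jvm), where each node reports jvm.mem.heap_used_percent, heap_used_in_bytes and heap_max_in_bytes. The 75% threshold, the node names and the helper name nodes_over_heap_threshold are illustrative assumptions, not values from this thread.

```python
def nodes_over_heap_threshold(stats, threshold_percent=75):
    """Return (node_name, heap_used_percent) pairs at or above the threshold.

    `stats` is the parsed JSON body of a node stats response. The threshold
    is an assumed value; tune it for your own cluster.
    """
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_percent")
        if used is None:
            # Fall back to computing the percentage from the byte counters.
            used_bytes = mem.get("heap_used_in_bytes")
            max_bytes = mem.get("heap_max_in_bytes")
            if not used_bytes or not max_bytes:
                continue
            used = 100 * used_bytes // max_bytes
        if used >= threshold_percent:
            flagged.append((node.get("name", node_id), used))
    # Highest heap usage first.
    return sorted(flagged, key=lambda item: -item[1])


# Hypothetical stats for a 3-node cluster with 8 GB heaps.
SAMPLE_STATS = {
    "nodes": {
        "a1": {"name": "es-node-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "b2": {"name": "es-node-2", "jvm": {"mem": {"heap_used_percent": 42}}},
        "c3": {"name": "es-node-3", "jvm": {"mem": {
            "heap_used_in_bytes": 6442450944,
            "heap_max_in_bytes": 8589934592,
        }}},
    }
}

if __name__ == "__main__":
    for name, used in nodes_over_heap_threshold(SAMPLE_STATS):
        print(name, "heap used:", used, "%")
```

In practice the dict would come from parsing the HTTP response of one of the nodes. A node that stays above the threshold after garbage collection is a candidate for smaller bulk requests or less data per node.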
