I have an ES cluster of two data nodes and one non-data node (serving the
Kibana website). It receives approximately 40 million log lines a day and
normally handles that without issue.
If I stop ingesting for a short time and then start again, the queue is
drained about 50x faster than it fills.
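To put those numbers in perspective, here is a rough calculation. It assumes the ~40 million lines arrive evenly over 24 hours (real log traffic is burstier, so peak rates will be higher) and takes the "50x" drain figure at face value. The 10-minute pause is a hypothetical example, not something from the post.

```python
# Back-of-the-envelope throughput estimate from the figures in the post.
# Assumption: the ~40 million daily lines arrive evenly over 24 hours.
LINES_PER_DAY = 40_000_000
SECONDS_PER_DAY = 24 * 60 * 60
DRAIN_FACTOR = 50  # queue drains ~50x faster than it fills


def average_ingest_rate(lines_per_day: int) -> float:
    """Average lines per second needed to keep up with the daily volume."""
    return lines_per_day / SECONDS_PER_DAY


def catch_up_seconds(backlog_lines: int, ingest_rate: float, factor: float) -> float:
    """Time to clear a backlog while new lines keep arriving.

    The drain rate is factor * ingest_rate, so the backlog shrinks by
    (factor - 1) * ingest_rate lines per second.
    """
    return backlog_lines / ((factor - 1) * ingest_rate)


rate = average_ingest_rate(LINES_PER_DAY)
print(f"average ingest: {rate:.0f} lines/s")
print(f"drain capacity: {rate * DRAIN_FACTOR:.0f} lines/s")

# Hypothetical 10-minute pause in ingestion.
backlog = int(10 * 60 * rate)
print(f"10-minute backlog clears in ~{catch_up_seconds(backlog, rate, DRAIN_FACTOR):.0f} s")
```

In other words, the cluster normally has plenty of headroom (roughly 23,000 lines/s of drain capacity against a ~460 lines/s average), so a slowdown big enough to make the queue grow means indexing throughput has dropped by well over an order of magnitude.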
We've had several different issues and have adjusted nprocs and tuned
elasticsearch.yml, which helped. However, since 1.1.2 (possibly a
coincidence), ES suddenly slows down dramatically, which makes the queue
fill up. If I then stop everything and restart ES, then LS (Logstash), it
usually picks back up. Sometimes I have to do this several times.
The only thing that seems to increase in the Elasticsearch logs around the
time this happens is this message:

    [2014-06-22 20:23:02,612][WARN ][transport ] [p-elasticlog02] Received response for a request that has timed out, sent [44943ms] ago, timed out [14943ms] ago, action

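To see whether these warnings actually spike right before the slowdown, one option is to count them per minute. The sketch below is written against the single sample line quoted above; the regex and function names are hypothetical, and other ES versions may word the message differently. Note that "sent 44943ms ago, timed out 14943ms ago" implies the request's timeout was 30000 ms.

```python
import re
from collections import Counter

# Pattern built from the one sample warning in this post; it tolerates
# variable padding inside the [WARN ] and [transport ] brackets.
TIMEOUT_RE = re.compile(
    r"\[(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}\]"
    r"\[WARN\s*\]\[transport\s*\]\s*\[(?P<node>[^\]]+)\] "
    r"Received response for a request that has timed out, "
    r"sent \[(?P<sent>\d+)ms\] ago, timed out \[(?P<timed_out>\d+)ms\] ago"
)


def parse_timeout(line: str):
    """Return the fields of a transport timeout warning, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    sent = int(m.group("sent"))
    timed_out = int(m.group("timed_out"))
    return {
        "minute": m.group("minute"),
        "node": m.group("node"),
        "sent_ms": sent,
        "timed_out_ms": timed_out,
        # The request's timeout is the gap between sending and timing out.
        "timeout_ms": sent - timed_out,
    }


def timeouts_per_minute(lines) -> Counter:
    """Count timeout warnings per minute of log time."""
    counts = Counter()
    for line in lines:
        record = parse_timeout(line)
        if record is not None:
            counts[record["minute"]] += 1
    return counts
```

Feeding it the ES log file line by line (for example `timeouts_per_minute(open(path))`, where `path` is wherever your node writes its log) gives a per-minute histogram that can be lined up against the times the queue started growing.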
On the second node in the cluster (which seemed to be the cause) there
were GC messages, and I had to bring down the entire cluster to get it
running properly again (perhaps I could have just restarted the node that
was logging the GC messages).
I've set nprocs to 4096 and max open files to 65k.
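Limits set in a login shell or in limits.conf do not always carry over to a daemon started by an init script, so it may be worth confirming what the running ES process actually got. On Linux, the kernel exposes this in `/proc/<pid>/limits`. A minimal parser, assuming that file's usual column layout; the minimum values are simply the ones mentioned in this post and the function names are hypothetical.

```python
import re


def parse_proc_limits(text: str) -> dict:
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Limit names contain single spaces; columns are separated by 2+ spaces.
        m = re.match(r"(.+?)\s{2,}(\S+)\s+(\S+)", line)
        if m:
            limits[m.group(1).strip()] = (m.group(2), m.group(3))
    return limits


def check_limits(limits: dict, min_nofile: int = 65535, min_nproc: int = 4096) -> list:
    """Report soft limits below the given minimums (defaults from this post)."""
    problems = []
    for name, minimum in (("Max open files", min_nofile), ("Max processes", min_nproc)):
        soft, _hard = limits.get(name, ("0", "0"))
        if soft != "unlimited" and int(soft) < minimum:
            problems.append(f"{name}: soft limit {soft} < {minimum}")
    return problems
```

Usage would be something like `check_limits(parse_proc_limits(open(f"/proc/{pid}/limits").read()))`, with `pid` being the ES java process.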
ES is started with:

    /usr/bin/java -Xms41886M -Xmx41886M -XX:MaxDirectMemorySize=41886M -Xss256k -Djava.awt.headless=true

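One thing worth checking in that command line is the heap size: 41886M is about 40.9 GiB, which is above the roughly 32 GB point beyond which 64-bit HotSpot can no longer use compressed object pointers, and a larger heap can also mean longer GC pauses. The 32 GiB constant below is a rule of thumb (the exact cutoff is a little lower and depends on the JVM version; `java -Xmx41886M -XX:+PrintFlagsFinal -version` shows whether `UseCompressedOops` is on). The helper names are hypothetical.

```python
import re

# Approximate ceiling for compressed oops on 64-bit HotSpot with default
# 8-byte object alignment. Treat this as a rule of thumb, not an exact limit.
COMPRESSED_OOPS_LIMIT_MIB = 32 * 1024

_UNITS_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def parse_size_mib(value: str) -> float:
    """Convert a JVM size such as '41886M' or '31g' to MiB."""
    m = re.fullmatch(r"(\d+)([kKmMgG])", value)
    if m is None:
        raise ValueError(f"unsupported size: {value!r}")
    return int(m.group(1)) * _UNITS_MIB[m.group(2).lower()]


def heap_settings(cmdline: str) -> dict:
    """Extract -Xms/-Xmx (in MiB) from a java command line."""
    found = {}
    for flag, size in re.findall(r"-(Xms|Xmx)(\d+[kKmMgG])\b", cmdline):
        found[flag] = parse_size_mib(size)
    return found


def check_heap(cmdline: str) -> list:
    """Return human-readable warnings about the heap flags."""
    warnings = []
    heap = heap_settings(cmdline)
    xmx = heap.get("Xmx")
    if xmx is not None and xmx >= COMPRESSED_OOPS_LIMIT_MIB:
        warnings.append(f"-Xmx is {xmx / 1024:.1f} GiB; compressed oops are likely disabled")
    if heap.get("Xms") != xmx:
        warnings.append("-Xms and -Xmx differ; the heap may resize at runtime")
    return warnings
```

Run against the command above, this flags the ~40.9 GiB heap; a heap of, say, 30g with matching -Xms would pass.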
Any recommendations on how I can fix this problem? It happens a few times
a week.
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/70c87756-f9b8-4032-9906-9a520c28801e%40googlegroups.com.