I posted this on IRC, but obviously my GMT+11 timezone is not friendly, so
as a backup I'm posting the text here for anyone who might have experience in
I have an application using a TransportClient configured to connect to a
2-node ES cluster (I'll leave aside for now why we have to use the
TransportClient, but there is a rationale).
One of the ES nodes had a faulty backplane and died.
ES of course kept on trucking with the other node.
However, since that event the application client has burnt a hell of a lot of CPU,
which, looking at the thread dumps, appears to be in the "New I/O client worker
#1-5 daemon" style threads used by ES.
I thought that, with the one ES node dead, there might be some looping logic
trying to re-establish the connection to it.
So I waited till the Dell guys replaced the backplane and we restored that node.
Once back in green state I was hoping the CPU burn would go away, but alas, no.
Now, looking at one of our other instances running in a similar config, I
note the ES threads in the app are always RUNNABLE because of the NIO, but
looking at them they're generally in a sleep state.
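
Since a thread blocked in a native select/epoll call also shows up as RUNNABLE
in a thread dump, the state alone doesn't say which threads are really burning
CPU. A rough sketch of how one could check, using the standard
java.lang.management API to sample per-thread CPU time over a short window.
The class name ThreadCpuSampler and the 1-second window are made up for
illustration, and this assumes the code runs inside the affected JVM (or is
adapted to a remote JMX connection) on Java 8 or later:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of ES): ranks live threads by CPU time
// consumed during a sampling window.
public class ThreadCpuSampler {

    /** CPU time one thread consumed during the sampling window. */
    public static final class Usage {
        public final String name;
        public final Thread.State state;
        public final long cpuNanos;

        Usage(String name, Thread.State state, long cpuNanos) {
            this.name = name;
            this.state = state;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Samples all threads in this JVM, busiest first. */
    public static List<Usage> sample(long windowMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // First reading: CPU time per thread id at the start of the window.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        Thread.sleep(windowMillis);

        // Second reading: keep only threads that were alive at both readings.
        List<Usage> result = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id); // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id);
            if (start == null || start < 0 || end < 0 || info == null) {
                continue;
            }
            result.add(new Usage(info.getThreadName(), info.getThreadState(), end - start));
        }
        result.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        for (Usage u : sample(1000)) {
            System.out.printf("%-45s %-13s %8.1f ms%n", u.name, u.state, u.cpuNanos / 1e6);
        }
    }
}
```

A "New I/O client worker" thread that is RUNNABLE but shows close to 0 ms over
the window is just parked in select; one that shows close to the full window
is genuinely spinning.
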
Has anyone else seen this sort of problem?
I'm just gathering a known 'good' thread dump to compare this with.
Here's a gist: https://gist.github.com/1440329