Hello,
We recently moved our ES cluster from dedicated hardware to AWS instances,
which have less memory available but use SSDs for the ES data directory. We
kept the JVM (1.7.0_17) and ES (0.90.9) versions exactly the same. On the new
hardware, our cluster gets stuck after a full re-index. The re-index works
like this: we create a new index, point one alias to the new index and one
alias to the old index, send realtime updates to both aliases, and run a
script to fill up the new index.
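
To make the alias handling in this procedure concrete, here is a minimal sketch of the request body for moving an alias from the old index to the new one. It assumes the standard `_aliases` actions format; the alias and index names (`items_live`, `items_v1`, `items_v2`) and the helper function are hypothetical and not taken from our setup. The code only builds the JSON and does not talk to a cluster.

```python
import json


def switch_alias_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from old_index to new_index.

    Both actions go into a single request so the alias never points at
    nothing (or at both indices) in between.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    body = switch_alias_actions("items_live", "items_v1", "items_v2")
    print(json.dumps(body, indent=2))
```

The resulting body would be sent as a POST to the cluster's `_aliases` endpoint. Keeping the remove and add in one request is the usual way to make the switch in a single step.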
Ten minutes after the re-index finishes and we move both aliases to the new
index, ES stops answering any search or index requests. The only errors in
the logs are rejections like this one:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
rejected execution (queue capacity 1000) on
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@172018e5
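
For context on the message above: "rejected execution (queue capacity 1000)" is what a fixed-size thread pool with a bounded queue reports once every worker is occupied and the queue is full. The toy model below is plain Python, not ES code, and all names in it are made up for illustration. It shows how a few workers that never return would lead to this kind of rejection without any real work being done. Whether that is what is happening in our cluster is an assumption, not something the logs confirm.

```python
import queue


class BoundedPool:
    """Toy model of a fixed worker pool with a bounded queue.

    Workers never finish here, which models threads that are stuck.
    """

    def __init__(self, workers, capacity):
        self.free_workers = workers
        self.pending = queue.Queue(maxsize=capacity)

    def submit(self, task):
        # A free worker picks the task up immediately.
        if self.free_workers > 0:
            self.free_workers -= 1
            return "running"
        # Otherwise the task waits in the queue, if there is room.
        try:
            self.pending.put_nowait(task)
            return "queued"
        except queue.Full:
            # Same situation as the rejected execution in the log.
            return "rejected"


def simulate(workers, capacity, requests):
    """Submit `requests` tasks and count what happened to each."""
    pool = BoundedPool(workers, capacity)
    counts = {"running": 0, "queued": 0, "rejected": 0}
    for i in range(requests):
        counts[pool.submit(i)] += 1
    return counts


if __name__ == "__main__":
    # 3 stuck workers, queue capacity 1000, 1010 incoming requests.
    print(simulate(3, 1000, 1010))
```

With 3 stuck workers and a capacity of 1000, the first 3 requests occupy the workers, the next 1000 fill the queue, and everything after that is rejected.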
CPU load is low, so it doesn't look like it's doing anything expensive. A
request to hot_threads times out. I've put the output from jstack and jmap
here:
We tried upgrading to 0.90.13, since the changelog mentioned a problem with
infinite loops, but we see the same behavior. We're planning to upgrade to a
more recent version of ES soon, but it'll take a while to fully test that.
Any ideas what could be causing this?
Thanks,
Florian