Odd behavior: our 5-node cluster hums along happily, but once or
twice a day one node climbs until it hits all 1000 threads (the default
limit). That node becomes unresponsive, and our whole cluster becomes
extremely slow.
Has anyone experienced this? Is there a good way to diagnose it?
Memory and CPU appear normal, if that helps. I'm not even sure where to
start here.
Look at query rates and see if they correlate with the thread spikes. I'm
guessing they jumped, too. SPM (http://sematext.com/spm) will help with that.
Once you confirm the correlation, you can trace the source of the queries
further upstream.
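If you'd rather check the correlation by hand first, here is a rough sketch in Python. It assumes you have already sampled two series from the affected node at a fixed interval: a cumulative query counter and a thread count. The helper names (`rates`, `pearson`) and the sample numbers are made up for illustration, and how you collect the samples (node stats, a monitoring agent, etc.) is up to you.

```python
from statistics import mean


def rates(counter_samples, interval_s):
    """Convert cumulative counter samples into per-second rates.

    N samples produce N-1 rates, one per interval between samples.
    """
    return [
        (later - earlier) / interval_s
        for earlier, later in zip(counter_samples, counter_samples[1:])
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series.

    Returns 0.0 when either series is constant (correlation undefined).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


# Hypothetical samples taken every 10 seconds (not real data).
query_totals = [1000, 1500, 2000, 6000, 12000, 12500]
thread_counts = [120, 125, 130, 600, 1000, 140]

query_rates = rates(query_totals, interval_s=10)
# Drop the first thread sample so both series have one value per interval.
r = pearson(query_rates, thread_counts[1:])
print(f"correlation between query rate and thread count: {r:.2f}")
```

A value near 1 would suggest the thread spikes follow query bursts, which is the case worth tracing upstream. A value near 0 would point at something other than query load.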
On Saturday, November 22, 2014 12:51:31 AM UTC-5, Christopher Ambler wrote:
Odd behavior - our 5-node cluster hums along happily but then, once or
twice a day, one node pops to all 1000 threads (the default limit) being
hit and the node becomes unresponsive, causing our whole cluster to become
extremely slow.
Has anyone experienced this? Any good way to diagnose this?
Memory and CPU appear normal, if that helps... I'm not even sure where to
start here.